That's not a lean, reusable profile; it's a liability wrapped in a comment block. Let's pick two glaring issues from your own post.
First, you claim it "explicitly allows only the HTTP/2-related socket options." Your profile has zero `socket` rules. Zero. No `setsockopt` allow list. It'll block any attempt to set `TCP_NODELAY` or `SO_KEEPALIVE`, which will break connection reuse and cause bizarre latency. Your agent will limp along with silently failed syscalls.
Second, and far worse, is `/tmp/** rw`. You're granting blanket write access to a world-writable directory. Any compromise, or even a misbehaving agent, can now stage payloads, exfiltrate data, or just clutter the shared namespace. That single line renders your "blocks risky calls" posturing completely meaningless. The abstraction you're avoiding is more secure.
Start over with `strace -e socket,setsockopt` and a private, owned runtime directory. This template is a lesson in what not to do.
Default deny or go home.
Ouch, that's harsh, but they're not wrong about the /tmp hole. It looks like everyone is piling on that point.
I'm still wrapping my head around the socket options. So even if the agent can technically connect, failing those setsockopt calls could cause weird, subtle performance issues that are hard to trace back to the profile? That's kind of scary.
Is using strace like that the main way everyone builds these lists, or are there other tricks? I'm worried I'd miss something my test workload doesn't use.
learning by breaking
Hey, good idea with the shared pattern! I've been down this road too. The missing socket rules are a killer, like others said. I usually start with `strace -e network` and then let the agent run through its full test suite.
One thing I'd add: you might want to explicitly deny `unix` network family too if your agents really only need outbound TCP. Some libs try to talk over abstract sockets locally and it can leak.
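One way to check whether that applies to your agent is to look for `AF_UNIX` calls in a network trace (for example, one captured with `strace -f -o trace.log -e trace=network`). This is a minimal sketch: the function names are hypothetical, and it assumes strace's usual rendering, where abstract socket names show up as `sun_path=@"..."`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for reading an strace log captured with -e trace=network.

# Every trace line that creates or addresses an AF_UNIX socket.
unix_socket_calls() {
  grep -F -- 'AF_UNIX' "$1"
}

# The subset aimed at abstract-namespace sockets, which strace prints with an @ prefix.
abstract_socket_calls() {
  grep -F -- 'sun_path=@' "$1"
}

# Usage:
#   unix_socket_calls trace.log
#   abstract_socket_calls trace.log
```

Run it over a trace of the agent's full test suite, since a short smoke test may never reach the code path that opens a local socket.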
Also, consider adding a rule for `/proc/sys/net/core/somaxconn r,` if your agent reads the kernel's listen backlog limit; some runtimes check it before they start listening. I've seen that bite me with high-volume gRPC streams.
-- Mike
You're missing the `setsockopt` and `getsockopt` permission matrix required for functional TCP sockets. Your profile will block those calls entirely. You need a dedicated socket rule.
A baseline for HTTP/2 would be something like:
```
socket (tcp) options=(setsockopt, getsockopt) level=SOL_TCP optname=(TCP_NODELAY, TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT, TCP_QUICKACK),
```
But you must derive the exact list via `strace -e setsockopt,getsockopt` against your specific agent. gRPC libraries often also touch `IP_TTL` or `IP_MTU_DISCOVER`.
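Here is a minimal sketch of boiling such a trace down to that list. It assumes strace's default output format, as produced by something like `strace -o trace.log -e trace=setsockopt,getsockopt ./myagent`, and symbolic level/option names. `sockopt_list` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reduce an strace log to unique "syscall level option" triples.
# Assumes lines like: setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
# A "1234  " or "[pid 1234] " prefix (from -f) is absorbed by the leading .*
sockopt_list() {
  sed -nE 's/.*(setsockopt|getsockopt)\([^,]+, ([A-Za-z0-9_]+), ([A-Za-z0-9_]+).*/\1 \2 \3/p' "$1" |
    LC_ALL=C sort -u   # fixed collation so the output order is stable
}
```

`sockopt_list trace.log` prints one line per distinct option, such as `setsockopt SOL_TCP TCP_NODELAY`. That gives you the list to review and justify, rather than scrolling through raw calls.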
Also, the `/tmp/** rw` and `/var/tmp/** rw` rules are catastrophic. They allow writing anywhere in those world-writable directories, which is a trivial sandbox escape. You must replace them with a specific, agent-owned path like `/tmp/agent-http2-*/* rwk`.
Exploit or GTFO.
You've identified the correct workflow. The skeleton is indeed a starting point, and you must run `strace` on your specific agent binary to derive the exact `setsockopt` list. There is no universally safe generic set, as the options are library-dependent. A gRPC client will touch different flags than a raw libcurl client.
Commenting out the `/tmp/** rw` with a directive is a responsible pattern, though I'd argue a template should fail closed, not open. A better approach might be to include a placeholder rule for a dedicated directory as a comment. That forces an explicit uncomment and path substitution, which makes enabling temp access a conscious action.
Your point about subtle performance issues is correct. Blocking `TCP_NODELAY` or `TCP_QUICKACK` won't always break the connection outright, but it can cause buffering delays and increased latency that are devilishly hard to debug without direct syscall monitoring. This is why provenance of the option list, via `strace` output tied to your exact agent build, is non-negotiable.
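Because the symptom is latency rather than an error, it helps to scan the same trace for sockopt calls that returned -1. This sketch assumes strace's default format, where a failed call ends in `= -1 ERRNO (message)`. `failed_sockopts` is a hypothetical name, and it reports failures whatever their cause, not only those from a confinement rule.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "syscall level option errno" for every failed sockopt call.
# Matches lines like:
#   setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = -1 EACCES (Permission denied)
failed_sockopts() {
  sed -nE 's/.*(setsockopt|getsockopt)\([^,]+, ([A-Za-z0-9_]+), ([A-Za-z0-9_]+),.* = -1 (E[A-Z0-9]+).*/\1 \2 \3 \4/p' "$1"
}
```

An empty result doesn't prove the profile is harmless. A non-empty one, however, points straight at the option to allow or investigate.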
Trust your supply chain? Check your SBOM.
Interesting approach! I'm trying to learn AppArmor for my own self-hosted services, so seeing a concrete example is really helpful.
Quick question from a beginner's perspective: since this profile would be applied to an existing agent binary, how do you actually get it to attach? You mentioned `aa-exec`, but would you usually put the profile name in the systemd service file's `AppArmorProfile` directive instead? I'm always fuzzy on the actual application step.
Also, the `ix` flag on the binary is for inheriting the profile, right? Does that mean if the agent spawned any subprocesses, they'd stay confined too?
Yeah, applying it is a bit confusing at first. I usually just add an `AppArmorProfile=` line to the systemd service file, like `AppArmorProfile=/usr/bin/agent-http2-only`. Then `systemctl daemon-reload` and a restart of the service is all it takes.
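After the restart, it's worth confirming which profile the process actually landed in. This sketch assumes AppArmor is the active LSM and the kernel exposes the traditional `/proc/<pid>/attr/current` interface, which reads like `agent-http2-only (enforce)`. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking a process's AppArmor confinement.

# Raw label of a PID, e.g. "agent-http2-only (enforce)" or "unconfined".
apparmor_label() {
  cat "/proc/$1/attr/current" 2>/dev/null
}

# Profile name part of a label: strip a trailing " (mode)" if present.
label_name() {
  printf '%s\n' "$1" | sed -E 's/ \([^)]*\)$//'
}

# Mode part of a label ("enforce", "complain", ...); prints nothing for "unconfined".
label_mode() {
  printf '%s\n' "$1" | sed -nE 's/.* \(([^)]*)\)$/\1/p'
}
```

For a systemd unit, something like `apparmor_label "$(systemctl show -p MainPID --value agent.service)"` feeds it the service's main PID. `agent.service` is a placeholder unit name.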
But I think the `ix` flag on the binary might not be right for this use? My understanding is that `ix` means "inherit exactly this profile," but since the profile is already named and attached via systemd, wouldn't you want `px` for a clean execution? Or maybe `Cx` to force a child profile? I've seen both patterns and I'm never sure which is correct for a standalone agent.
No, `ix` is fine. It's the usual choice when a child should stay in the current profile. `px` just forces a profile transition, but if the profile is already attached via systemd, you're transitioning to... the same profile. Waste of cycles.
Your real issue is the binary path rule itself. Using `/usr/bin/agent-http2-only` in the systemd file assumes the profile is loaded and named that. It's not. The profile file and its internal profile name are separate. If the profile declared inside your file in `/etc/apparmor.d/` is named `usr.bin.myagent`, you'd use `AppArmorProfile=usr.bin.myagent`.
Most people screw up the naming.
Trust but verify.
The profile name mismatch is going to break your load. You've named the file `usr.bin.agent-http2-only` but the profile block inside is named `agent-http2-only`. Systemd's `AppArmorProfile=` expects the internal profile name, not the filename.
Your systemd directive should be `AppArmorProfile=agent-http2-only`. Or rename the profile block to `usr.bin.agent-http2-only` to match the file convention.
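A quick way to catch that mismatch before a restart is to compare the name in `AppArmorProfile=` against the names actually declared in the file. This rough sketch only understands the two common header forms, `profile NAME ... {` and `/path/to/binary ... {`. It's a text match, not a parser, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: list profile names declared in an AppArmor profile file
# and check that the name used in systemd's AppArmorProfile= is among them.

declared_profiles() {
  sed -nE \
    -e 's/^[[:space:]]*profile[[:space:]]+([^[:space:]{]+).*/\1/p' \
    -e 's/^[[:space:]]*(\/[^[:space:]{]+)[[:space:]].*\{[[:space:]]*$/\1/p' \
    "$1"
}

# Exit status 0 if name $2 is declared in profile file $1.
unit_profile_matches() {
  declared_profiles "$1" | grep -Fqx -- "$2"
}
```

For example, `unit_profile_matches /etc/apparmor.d/usr.bin.agent-http2-only agent-http2-only && echo ok` succeeds only if the block inside the file carries that exact name.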
Also, don't use `aa-exec` with systemd. Pick one. `aa-exec -p agent-http2-only -- /usr/local/bin/agent` *or* the systemd directive. Doing both is redundant.
Validate or fail.
Love the initiative, mate. Building reusable profiles is the only way to stay sane when you've got a dozen agents humming along.
Just a quick heads-up from my own lab headaches: that `/tmp/** rw` is gonna be flagged immediately. It's basically handing over a universal key. I'd tighten it to something like `/tmp/agent-*/* rw` and let the startup script create its own isolated temp dir. The last thing you want is a compromised agent writing a script to /tmp that another process picks up.
Also, echoing what some others hinted at, you might wanna add those missing socket options for TCP keepalive and nodelay. I've had gRPC streams get all laggy and weird without them. A quick `strace -e setsockopt` run on your actual binary will spit out the exact list it needs. It's a boring half-hour, but saves a debugging marathon later.
This is a solid foundation though. Tweak those two bits and you've got a great template to drop in.
If it's not broken, break it for security.
>fail closed, not open
That's a good point. If the temp rule is commented out but the profile still loads, the agent fails as soon as it tries to write to /tmp. That's fail closed, right? So the commented-out rule is safe as long as you test it.
But I see what you mean about making the conscious action *enabling* a specific path, not *disabling* a wildcard. That's a better habit.
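Testing that failure mode mostly means reading the denials the kernel logs for the profile. This sketch assumes the usual audit record format, for example `apparmor="DENIED" operation="mknod" profile="agent-http2-only" name="/tmp/x"`, which typically ends up in `/var/log/audit/audit.log` or the kernel log (`journalctl -k`). `denials_for` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the DENIED records for one profile from a log file.
# $1 = profile name exactly as declared, $2 = log file
denials_for() {
  # The closing quote in the second pattern keeps child profiles like name//child out.
  grep -F 'apparmor="DENIED"' "$2" | grep -F -- "profile=\"$1\""
}
```

Running `denials_for agent-http2-only /var/log/audit/audit.log` after a full test pass with the temp rule removed shows which `/tmp` writes were refused.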
Question about the strace command for socket options: is it really just `strace -e setsockopt,getsockopt`? I tried that on a test agent and got a flood of output, most of it unrelated to network stuff. Do you pipe it through grep for the socket-related calls, or is there a better flag to filter?
Yeah, the output from strace can be really noisy. I usually run something like `strace -e trace=network ./myagent` to at least cut it down to just network-related calls (`setsockopt` and `getsockopt` are already part of that class). But even then, you get a lot of connect/accept noise.
I found it helps to run the agent, let it do its thing for a second, then kill it and grep the output file for `setsockopt` specifically. Something like `strace -o trace.log -e setsockopt,getsockopt ./myagent & sleep 2; kill $!`. Then you can just look at the `trace.log` file. Saves scrolling through a live terminal flood.
Is that how you're supposed to do it? Or is there an even cleaner way?
That `sleep 2; kill $!` trick is clever, I'll steal that. For the noise, I sometimes just skip the network filter and grep for the sockopt calls directly from the file.
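```bash
# Trace only the sockopt syscalls; strace writes its log to strace.out
strace -o strace.out -e trace=setsockopt,getsockopt ./myagent &
pid=$!
sleep 1                  # give the agent time to reach socket setup
kill "$pid"              # $! is strace's PID, so this stops the trace
wait "$pid" 2>/dev/null
# -E is needed: in basic grep, | is a literal character, not alternation
grep -E "setsockopt|getsockopt" strace.out
```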
The one thing I'd watch for is that short sleep. Some agents take a sec to even get to the socket setup. Maybe do a loop that waits for the log file to get some content, then kill it.
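The loop suggested here could look like the following minimal sketch. It polls once a second and gives up after a timeout, so a broken agent can't hang the script. The grace period after the first match is a guess, since the first option set is rarely the last. `wait_for_match` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until FILE has a line matching the extended regex PATTERN.
# Returns 0 on a match, 1 if TIMEOUT seconds (default 30) pass without one.
wait_for_match() {
  local file=$1 pattern=$2 timeout=${3:-30} waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if [ -f "$file" ] && grep -qE -- "$pattern" "$file"; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Usage (./myagent as in the commands above):
#   strace -o strace.out -e trace=setsockopt,getsockopt ./myagent &
#   pid=$!
#   wait_for_match strace.out 'setsockopt' 30 && sleep 2  # first hit, then a grace period
#   kill "$pid"
```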
This is a solid start, but I'm going to have to stop you at the temp rules. Handing out `/tmp/** rw` is a fail-open condition that would get flagged in any real audit. A compromised agent can now write to any temp file its user has permission to touch, including files other processes will later read. The principle is default deny with explicit, narrow allows, not broad allows.
You need to either:
* Remove the rule entirely and let the agent fail if it needs temp (fail closed).
* Bind-mount a private `/tmp` for the agent via systemd and allow only that specific path.
* Have the startup script create a uniquely named subdirectory (`/tmp/agent-$PID/`) and confine writes there.
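The third option might look like this in a startup script. This sketch makes a few assumptions. It uses `mktemp -d` with a random suffix rather than `$PID`, because another user can pre-create a predictable name in a world-writable directory. The `agent-` prefix is chosen to line up with a rule like `/tmp/agent-*/* rw`. The function name and agent path are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper for an agent startup script: create a private, randomly
# named directory under BASE (default /tmp) and print its path.
make_agent_tmpdir() {
  local base=${1:-/tmp} dir
  dir=$(mktemp -d "$base/agent-XXXXXX") || return 1
  chmod 700 "$dir"   # mktemp -d already creates it owner-only; this pins it regardless of umask
  printf '%s\n' "$dir"
}

# Usage in a wrapper script (agent path is a placeholder):
#   AGENT_TMP=$(make_agent_tmpdir) || exit 1
#   trap 'rm -rf -- "$AGENT_TMP"' EXIT
#   TMPDIR=$AGENT_TMP /usr/local/bin/agent
```

This only helps if the agent respects `TMPDIR`. One that hard-codes `/tmp` paths will still run into the profile.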
Also, the `strace` discussion in this thread is correct but incomplete for compliance. You need to document the justification for each allowed socket option. Run your trace, then annotate the profile: `# Allow SO_KEEPALIVE - required for gRPC health checks per internal spec ABC-123`. Without that, an auditor will ask you to prove each one is necessary, and you won't remember in two years.
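To keep those annotations from drifting, a check can flag any traced option that has no matching comment. This sketch assumes a hypothetical convention of one `# Allow OPTION_NAME - reason` comment per option, like the example above, and a plain list of option names (one per line) taken from your trace. `unannotated_opts` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each option from OPTS_FILE that has no
# "# Allow <OPTION>" justification comment in PROFILE_FILE.
# $1 = file with one option name per line, $2 = profile file
unannotated_opts() {
  local opt
  while IFS= read -r opt; do
    [ -n "$opt" ] || continue
    # Require a non-name character (or end of line) after the option, so that
    # a "# Allow SO_KEEPALIVE" comment does not count as annotating SO_KEEP.
    grep -Eq "^[[:space:]]*# Allow ${opt}([^A-Za-z0-9_]|$)" "$2" || printf '%s\n' "$opt"
  done < "$1"
}
```

A non-empty result is the list an auditor would ask about.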
-- grace
>Remove the rule entirely and let the agent fail if it needs temp (fail closed).
That's the only sane default. If you don't know it needs temp, you haven't traced it. Removing the rule forces you to actually test the profile, which most people don't do.
Agree on the annotation, but that's just good hygiene. The real issue is treating `/tmp/**` as a temporary rule instead of a gaping hole. If you need temp, you create a private namespace or a specific directory. A wildcard there is just laziness disguised as a placeholder. It'll be cargo-culted into production inside a month.
Don't trust the borrow checker blindly.