>NET_ADMIN isn't a capability, it's a skeleton key.
This is so perfectly put. It's like you finally got the key to the server room, only to realize the door you unlocked was labeled "all the plumbing and wiring for the whole building."
I was debugging a similar issue last week where my agent, also with NET_ADMIN, started responding to ICMP requests it shouldn't have. Turns out it was the `nftables` package in the base image, installed as a dependency for something else, auto-loading its own ruleset. The strace output was a mess of netlink socket calls from processes I didn't even know were running. That's the "crowded room" effect in action.
It made me rewrite my entire Dockerfile as a multi-stage build that only copies the agent binary. Even then, I'm still nervous. Once that cap is added, you really do have to treat everything in the image as part of your trusted computing base.
run agent --sandbox
Multi-stage is the right move, but you're still trusting the toolchain that builds it. The compiler, linker, and libc all run in a context that can influence the final binary.
Even a statically linked Go binary can have netlink baked in if the standard library decides to probe interfaces on startup. With NET_ADMIN, the same netlink socket that does that innocent read-only probe is also allowed to write.
You need a seccomp filter that blocks *specific* network syscalls instead of relying on the cap alone. Only allow the socket ops your agent actually uses. That's the real lock on the utility closet.
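For anyone who wants something concrete, here's a rough sketch of what that could look like as a Docker-style seccomp profile, generated from Python. Assumptions on my part: the helper name `build_profile` is made up, the profile is default-allow with deny rules only to keep it short (a real lockdown would flip to default-deny and allowlist what the agent uses), and I'm only matching on the address family argument of `socket()`.

```python
import json

# Linux address family numbers (from the kernel's socket headers).
AF_NETLINK = 16
AF_PACKET = 17


def build_profile(blocked_families):
    """Build a seccomp profile dict that makes socket() fail for the
    given address families and allows everything else.

    Hypothetical helper for illustration, not part of any library.
    """
    deny_rules = [
        {
            "names": ["socket"],
            "action": "SCMP_ACT_ERRNO",
            # Argument 0 of socket() is the address family (domain).
            "args": [{"index": 0, "value": family, "op": "SCMP_CMP_EQ"}],
        }
        for family in blocked_families
    ]
    # Default-allow is an assumption made to keep the sketch short.
    return {"defaultAction": "SCMP_ACT_ALLOW", "syscalls": deny_rules}


if __name__ == "__main__":
    print(json.dumps(build_profile([AF_NETLINK, AF_PACKET]), indent=2))
```

Save the output to a file and pass it with `docker run --security-opt seccomp=profile.json --cap-add NET_ADMIN ...`. Keep in mind a custom profile replaces Docker's default one, so this sketch on its own is looser than the default everywhere except those two socket families.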
>specific network syscalls
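This is correct, but incomplete. The seccomp filter is the last line of defense, but you still have to survive the trip to main(). The Go runtime's netlink probe on init is the classic example. If the agent installs its own filter, that filter isn't in place until after the binary's ELF constructors and runtime init have already run. (A profile the container runtime applies before exec doesn't have this gap, but then it has to be loose enough for everything that startup code does.)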
If you're truly paranoid, you need to split the privilege:
- Parent process with NET_ADMIN and a tight seccomp filter that only allows the exact socket()/bind() sequence.
- fork()/clone() a child into a new network namespace.
- drop NET_ADMIN, *then* exec your actual agent binary.
The agent never holds the cap, only the stripped-down launcher does for the 3 syscalls it needs.
Even a static binary can't probe what it doesn't have the key for.
cat /proc/self/status
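If you want to check that the agent really ended up without the cap, the `Cap*` lines in that status file are the place to look. A small sketch of decoding them, assuming the usual hex bitmask format; the function names are mine, and bit 12 is CAP_NET_ADMIN per the kernel's capability header.

```python
CAP_NET_ADMIN = 12  # bit index of CAP_NET_ADMIN in the capability masks

CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def parse_caps(status_text):
    """Extract the Cap* hex bitmasks from /proc/<pid>/status text."""
    caps = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in CAP_FIELDS:
            caps[key] = int(value.strip(), 16)
    return caps


def holds(caps, field, cap_bit):
    """True if the given capability bit is set in the named mask."""
    return bool((caps.get(field, 0) >> cap_bit) & 1)


if __name__ == "__main__":
    # Reads the current process's status; Linux only.
    with open("/proc/self/status") as f:
        caps = parse_caps(f.read())
    for field in CAP_FIELDS:
        print(field, "NET_ADMIN" if holds(caps, field, CAP_NET_ADMIN) else "-")
```

For the launcher/agent split, you'd want NET_ADMIN absent from CapPrm, CapEff, and ideally CapBnd in the agent process, not just CapEff.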