Alright, let's talk about cutting network access for a container without relying on the container runtime's janky network flags or praying to the namespace gods. Everyone reaches for `--network none`, but what if you still need that sweet, sweet Unix domain socket, the kind backed by a local socket file? The container runtime's idea of security is often a suggestion, not a rule.
I wanted something that actually rejects the syscalls for anything that isn't a Unix domain socket at the kernel level, before any packet ever gets a chance to exist. Seccomp is the most direct way I know to get that guarantee. You can't just block `socket()` entirely, because that's how your AF_UNIX socket gets made. You have to inspect the arguments.
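Here's a seccomp profile that does exactly that. It's in the JSON format Docker uses (the runtime compiles it to BPF through `libseccomp`), so you can feed it to Docker's `--security-opt seccomp=`, or reference it from a Kubernetes pod spec via `securityContext.seccompProfile` with type `Localhost`, or port the same rule to libseccomp's C API (`seccomp_rule_add` plus `seccomp_load`) and wrap your binary. Works on my Pi 4 and my decrepit Xeon box.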
```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64",
    "SCMP_ARCH_AARCH64"
  ],
  "syscalls": [
    {
      "names": [
        "socket"
      ],
      "action": "SCMP_ACT_ALLOW",
      "args": [
        {
          "index": 0,
          "value": 1,
          "valueTwo": 0,
          "op": "SCMP_CMP_EQ"
        }
      ]
    }
  ]
}
```
The key is that single `args` block. Index 0 is the `domain` argument. AF_UNIX is 1 on both x86_64 and aarch64 (check your `/usr/include/` if you don't believe me). So this says: allow the `socket` syscall *only* if the first argument equals 1. Any other call (AF_INET (2), AF_INET6 (10), whatever) hits the default action of `SCMP_ACT_ERRNO` and fails with EPERM.
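To make the matching logic concrete, here is a toy model of how a profile like the one above picks an action. It is only a sketch: `decide` is a hypothetical helper, it handles `SCMP_CMP_EQ` and nothing else, and it says nothing about how the kernel or libseccomp actually implement the filter. It does show one thing worth noticing, which is that every syscall not named in the profile falls through to the default action too.

```python
import json

AF_UNIX, AF_INET, AF_INET6 = 1, 2, 10  # Linux address family values

# Same rule as the profile above, trimmed to the fields the toy model reads.
PROFILE = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["socket"],
      "action": "SCMP_ACT_ALLOW",
      "args": [{"index": 0, "value": 1, "valueTwo": 0, "op": "SCMP_CMP_EQ"}]
    }
  ]
}
""")


def decide(profile, name, args):
    """Return the action this toy model picks for a syscall name and its arguments."""
    for rule in profile["syscalls"]:
        if name not in rule["names"]:
            continue
        conds = rule.get("args", [])
        # Only SCMP_CMP_EQ is modelled; any other operator counts as "no match".
        if all(c["op"] == "SCMP_CMP_EQ" and args[c["index"]] == c["value"]
               for c in conds):
            return rule["action"]
    # No rule matched, so the profile-wide default applies.
    return profile["defaultAction"]


if __name__ == "__main__":
    for label, domain in [("AF_UNIX", AF_UNIX), ("AF_INET", AF_INET), ("AF_INET6", AF_INET6)]:
        print(label, decide(PROFILE, "socket", [domain, 1, 0]))
    print("connect", decide(PROFILE, "connect", [3, 0, 0]))
```

In this model `connect` is rejected as well, because nothing in the profile allows it.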
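Now, the obvious hole: `socketcall` on 32-bit x86. That's the old multiplexing syscall. If you're running a kernel with `CONFIG_COMPAT` and some ancient 32-bit binary, you'd need to block that too. Modern container workloads won't use it, but if you're paranoid, add a rule for it. Note that the same argument inspection won't work there: the domain sits behind a pointer to the argument array, and seccomp only sees the raw argument values, not the memory they point to. So just blanket deny it.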
This is more surgical than `--network none` because it still lets the container talk to a Unix socket you might mount in. Useful for sidecar-less service mesh wannabes, or locking down a database container that only needs a local admin socket. Tighter than a container runtime's promises.
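A quick way to check the behaviour from inside a container is to try creating one socket per family and see which ones the kernel refuses. The sketch below assumes the rule has been merged into a profile that otherwise lets a Python interpreter run; `can_create` and `probe` are names made up for this example.

```python
import socket


def can_create(family):
    """Try to create a stream socket of the given family; True if it succeeded."""
    try:
        s = socket.socket(family, socket.SOCK_STREAM)
    except OSError:
        # A seccomp ERRNO action surfaces here as an OSError subclass
        # (PermissionError for EPERM).
        return False
    s.close()
    return True


def probe():
    """Map each family name to whether a socket of that family could be created."""
    families = {
        "AF_UNIX": socket.AF_UNIX,
        "AF_INET": socket.AF_INET,
        "AF_INET6": socket.AF_INET6,
    }
    return {name: can_create(fam) for name, fam in families.items()}


if __name__ == "__main__":
    for name, ok in probe().items():
        print(f"{name}: {'allowed' if ok else 'blocked'}")
```

With the filter in place, the expectation is that only `AF_UNIX` reports as allowed. Without it, all three normally succeed.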
Good approach, but you've posted an incomplete JSON block. That'll break if someone copies it. Also, `socket` isn't the only way to make a network socket.
You also need to handle `socketcall` on some older arches, and `socketpair` (though on Linux it's essentially AF_UNIX-only). Might want to consider `accept` and `accept4` as well - a network socket could be passed in via fd.
The core idea is right, though. Kernel-level guarantee beats runtime flags every time.
--Priya
Oh, that's a fantastic point about `socketpair` and `accept`! I was so focused on the creation path, I totally spaced on a socket being passed in via an inherited or dup'd file descriptor. That would completely bypass the filter.
You're also right about `socketcall` - it's still a thing on 32-bit ARM (armhf) if anyone's running older Pis or containers. That arch would slip right through this filter.
For anyone trying this, you'd also need to think about `connect`, `bind`, `listen`, and `sendto`/`recvfrom` on a pre-existing network fd. It gets messy fast. Maybe the simpler guarantee is to block *all* socket-related syscalls and then make a single, explicit exception for `socket()` only when `domain == AF_UNIX`. That at least contains the blast radius.
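One way to sketch that "explicit exception" idea is a small script that takes an allowlist-style base profile, strips the socket-creation syscalls out of whatever rules mention them, and adds back a single rule gated on `domain == AF_UNIX`. Everything here is an assumption for illustration: the helper name, the choice to gate `socketpair` alongside `socket`, and the choice to drop `socketcall` entirely. It also assumes the base profile's default action is a deny. Per-fd calls like `connect` and `bind` are left to the base profile, since seccomp can't tell which family an existing fd belongs to, and denying them outright would break AF_UNIX use as well.

```python
import copy
import json

AF_UNIX = 1  # Linux value, same on x86_64 and aarch64

GATED = ["socket", "socketpair"]  # allowed only when arg 0 (domain) == AF_UNIX
DENIED = ["socketcall"]           # removed from every rule, so the default deny applies


def restrict_to_unix_sockets(base):
    """Return a copy of an allowlist-style profile with socket creation limited to AF_UNIX."""
    profile = copy.deepcopy(base)
    touched = set(GATED) | set(DENIED)
    kept = []
    for rule in profile.get("syscalls", []):
        # Strip the socket-creation names from existing rules; drop rules left empty.
        names = [n for n in rule.get("names", []) if n not in touched]
        if names:
            kept.append({**rule, "names": names})
    kept.append({
        "names": list(GATED),
        "action": "SCMP_ACT_ALLOW",
        "args": [{"index": 0, "value": AF_UNIX, "valueTwo": 0, "op": "SCMP_CMP_EQ"}],
    })
    profile["syscalls"] = kept
    return profile


if __name__ == "__main__":
    # Tiny made-up base profile, just to show the transformation.
    base = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"names": ["read", "write", "socket", "socketcall"], "action": "SCMP_ACT_ALLOW"},
        ],
    }
    print(json.dumps(restrict_to_unix_sockets(base), indent=2))
```

This still does nothing about a network socket that was inherited or passed in over an existing fd. It only narrows the creation path.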
Love the direction though. Kernel-level or bust.
lab.firstname.net