Everyone's rushing to slap `--userns-remap` on the daemon and calling it a day. "We're rootless! We're safe!" Give me a break.
User namespaces are a classic case of complexity breeding new attack surfaces. The kernel's mapping logic is a beast, and we've seen a steady stream of CVEs popping the isolation. It's not a security boundary—it's a *mitigation* that often gets treated as a magic wall. The false sense of security is worse than no security at all, because it makes teams stop looking deeper.
What are we actually trying to achieve? Isolation. So let's talk about real boundaries:
* **Seccomp-bpf filters** tailored to your actual workload (not just Docker's default).
* **No new privileges** (`no_new_privs`).
* Dropping **ALL** capabilities (`--cap-drop=ALL`) and adding back the bare minimum.
* **Read-only root filesystems** (`--read-only`) with tmpfs mounts only where writes are absolutely necessary.
Example of a somewhat stricter, but still flawed, "rootless" run:
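```bash
# Remapping is a daemon-level setting (dockerd --userns-remap=...), not a
# per-container `docker run` flag, so it is assumed to be enabled already.
docker run --read-only --tmpfs /tmp \
  --cap-drop=ALL --security-opt=no-new-privileges \
  myapp:latest
```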
Even this gives you a mapped root inside the container. Whoop-de-doo.
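If you want to see what "mapped root" means concretely, read `/proc/self/uid_map` from inside the container. A rough sketch, assuming Linux procfs; `host_uid_for_root` is a made-up helper name, and the optional file argument is only there so you can point it at a saved copy:

```shell
#!/bin/sh
# Hypothetical helper: print the host UID that in-namespace UID 0 maps to.
# Each uid_map line is: <first id inside ns> <first id outside ns> <range length>
host_uid_for_root() {
    map_file="${1:-/proc/self/uid_map}"
    # UID 0 is only covered by a range that starts at 0 inside the namespace.
    awk '$1 == 0 && $3 > 0 { print $2; found = 1; exit }
         END { if (!found) exit 1 }' "$map_file"
}
```

With no remapping the map is `0 0 4294967295`, so this prints `0`: container root is host root. With remapping on you'd see a map like `0 100000 65536` and get `100000` back (the exact range depends on your subuid setup). Either way, that process is still UID 0 as far as everything inside the box is concerned.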
If you're relying on user namespaces as your primary defense in the runtime, you've already lost. Focus on the actual security mechanisms and treat user namespaces as the shaky, bug-prone abstraction they are. Prove me wrong.
/ap
open source, open scar
Wow, that's a lot to unpack. As someone new to this, I thought user namespaces *were* the way to go. You're saying they're more like a decoy?
The examples you listed (no_new_privs, dropping caps) - do those actually work *without* the user namespace? Like, if you start a container as a regular user, can you still apply those? Sorry if that's a basic question. It sounds like the namespace is optional for that part.
So the real lesson is "don't stop at rootless," right? Got it. This stuff is deeper than the tutorials make it seem.
Yeah, they work without the namespace. That's exactly the point most people miss. The namespace just lets you be 'root' inside the box. It doesn't inherently stop that in-box root from doing things.
You can absolutely start a container as a regular user, drop all caps, set no_new_privs, and apply a tight seccomp profile. That's your real confinement. The user namespace is just a mapping trick, often adding more code to potentially exploit.
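None of those flags need a user namespace. A sketch of what I mean, assuming the same `myapp:latest` image from my first post and a hypothetical profile at `./myapp-seccomp.json` that you'd have to write for your own workload. `run_confined` and `confinement_report` are made-up names; the second one just reads `/proc/<pid>/status` so you can check from inside that the settings actually took:

```shell
#!/bin/sh
# Hypothetical wrapper: non-root user, no capabilities, no_new_privs,
# custom seccomp profile, read-only rootfs. No user namespace involved.
run_confined() {
    docker run --rm \
        --user 1000:1000 \
        --cap-drop=ALL \
        --security-opt no-new-privileges \
        --security-opt seccomp=./myapp-seccomp.json \
        --read-only --tmpfs /tmp \
        myapp:latest "$@"
}

# Print the kernel's view of a process's confinement.
# CapEff all zeros = no effective capabilities; NoNewPrivs 1 = set;
# Seccomp 2 = filter mode (0 = disabled, 1 = strict).
confinement_report() {
    status_file="${1:-/proc/self/status}"
    awk -F: '$1 == "CapEff" || $1 == "NoNewPrivs" || $1 == "Seccomp" {
                 gsub(/[ \t]+/, "", $2)   # strip the tab after the colon
                 print $1 "=" $2
             }' "$status_file"
}
```

If everything applied, running `confinement_report` inside that container should show `CapEff=0000000000000000`, `NoNewPrivs=1` and `Seccomp=2` (assuming the image ships `sh` and `awk`). If any of those come back different, your confinement isn't what you think it is, namespace or not.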
> "don't stop at rootless"
That's the spirit, but I'd go further. Rootless often *starts* at the namespace and calls it done. Skip the namespace entirely until you absolutely need it, and then treat it like the liability it is. The tutorials push it as step one because it's an easy checkbox, not because it's sound.
`rm -rf /` is an API call away.