Unpopular opinion: If you need a sandbox, your agent design is already flawed.

(@hardening_syscall)
Active Member
Joined: 1 week ago
Posts: 12
Topic starter
  [#488]

I want to challenge a prevalent assumption in our community, particularly as we discuss breakout research. The increasing complexity of our sandboxing stacks, which layer seccomp-bpf filters, multiple LSMs (AppArmor *and* SELinux), unprivileged user namespaces, cgroups v2, and pledge-like mechanisms, is often celebrated as robust defense-in-depth. I propose that it is frequently a symptom of a deeper architectural failure: the agent's threat model and privilege decomposition were never worked out from first principles.

The kernel's attack surface exposed to a sandboxed process is vast and historically brittle. Consider:
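* **Syscall filtering** relies on complete knowledge of every path to a resource. CVE-2022-0492 (cgroup v1 `release_agent`) illustrates this: the missing capability check was reachable through `unshare` and `mount`, so a container was protected only if its seccomp or LSM policy happened to block those calls. Likewise, a filter that blocks `open` but overlooks `openat` does nothing to stop `openat("/dev/mem", ...)`, a path not considered in many filters.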
* **Namespace isolation** is undermined by kernel objects shared across boundaries (e.g., `pidfd_getfd()` abuse, CVE-2021-22555's netfilter heap overflow usable from within a user namespace).
* **Capability dropping** often occurs *after* the process has already performed sensitive operations, leaving a window in which any compromise inherits the full privilege set.
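
To make the first bullet concrete, here is a toy model of filter evaluation. It is not the seccomp-bpf API, and both policies are made up for illustration; only the syscall names are real. It shows how a default-allow denylist misses sibling syscalls that reach the same resource, while a default-deny allowlist refuses them without the author having to enumerate them.

```python
# Toy model of syscall filtering; NOT the seccomp-bpf API.
# Both policies below are hypothetical examples.

DENYLIST = {"open", "ptrace", "mount"}                # default-allow: block known-bad calls
ALLOWLIST = {"read", "write", "mmap", "exit_group"}   # default-deny: permit known-needed calls


def denylist_permits(syscall: str) -> bool:
    """Default-allow: anything the policy author did not think of gets through."""
    return syscall not in DENYLIST


def allowlist_permits(syscall: str) -> bool:
    """Default-deny: anything the policy author did not think of is refused."""
    return syscall in ALLOWLIST


# Several syscalls can open a file by path.
paths_to_open_a_file = ["open", "openat", "openat2"]

missed_by_denylist = [s for s in paths_to_open_a_file if denylist_permits(s)]
missed_by_allowlist = [s for s in paths_to_open_a_file if allowlist_permits(s)]

print("denylist lets through:", missed_by_denylist)
print("allowlist lets through:", missed_by_allowlist)
```

An allowlist still has to reason about arguments (which `ioctl` commands, which `mmap` flags), so it narrows the problem rather than removing it.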

If your agent requires a Linux sandbox of this complexity to be safe, it likely means the agent itself is monolithic and over-privileged. The correct approach is to decompose the agent into distinct components with minimal, precisely defined privileges *at the process level*, communicated via simple, auditable IPC. A component that only parses untrusted data should not need filesystem write capabilities, nor should it share a memory space with credential-handling code. The sandbox then becomes a final, fail-closed enforcement layer on an already sound design, not the primary security boundary.
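
As a rough sketch of what "simple, auditable IPC" could look like: a length-prefixed JSON frame with a hard size cap and a fixed message shape. The `MAX_FRAME` value, the `op`/`payload` fields, and the function names are assumptions for illustration, not a recommendation of a specific protocol. In a real design each end would live in its own process; a `socketpair` in one process just keeps the example runnable.

```python
import json
import socket
import struct

MAX_FRAME = 64 * 1024          # hypothetical cap; reject larger frames before parsing
HEADER = struct.Struct("!I")   # 4-byte big-endian length prefix


def send_msg(sock: socket.socket, msg: dict) -> None:
    body = json.dumps(msg).encode("utf-8")
    if len(body) > MAX_FRAME:
        raise ValueError("message too large")
    sock.sendall(HEADER.pack(len(body)) + body)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or fail; never return a short frame."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> dict:
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    if length > MAX_FRAME:
        raise ValueError("oversized frame")
    msg = json.loads(_recv_exact(sock, length).decode("utf-8"))
    # Fail closed on anything that is not exactly the expected shape.
    if not isinstance(msg, dict) or set(msg) != {"op", "payload"}:
        raise ValueError("unexpected message shape")
    return msg


parser_end, supervisor_end = socket.socketpair()
send_msg(parser_end, {"op": "parsed", "payload": "hello"})
received = recv_msg(supervisor_end)
print(received)
parser_end.close()
supervisor_end.close()
```

The receiving side never trusts the sender's length field or message structure, so a compromised parser can only hand over data that passes these checks.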

I observe many designs that take a monolithic, "root-like" binary and attempt to constrain it post-facto with a sandbox policy. This is inherently fragile. The policy must account for all kernel attack vectors, and one missed codepath (e.g., a forgotten `ioctl` command on a seemingly innocuous fd) can lead to escape. A better paradigm is exemplified by minimal, single-purpose microservices:
* One process with `CAP_NET_BIND_SERVICE` but no filesystem access.
* Another with write access to only a specific `tmpfs` subdirectory, but no network capabilities.
* A third that performs complex parsing, running with a `seccomp` policy that denies all but `read`, `write`, `mmap`, and `exit`.

These components are orchestrated by a supervisor. Their individual policies are simple, and a breakout from one compartment does not automatically grant the privileges of another.
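
One way a supervisor might keep those policies honest is to declare each component's privileges as data and audit the declarations before launching anything. The sketch below is only a model: the `Component` fields, the `audit` rules, and the `/run/agent/out` path are hypothetical, and nothing here actually applies capabilities or seccomp filters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    name: str
    capabilities: frozenset = frozenset()
    writable_paths: frozenset = frozenset()
    network: bool = False
    syscalls: Optional[frozenset] = None  # None means no allowlist declared


def audit(c: Component) -> list:
    """Return reasons this component is broader than a single-purpose compartment."""
    problems = []
    if c.network and c.writable_paths:
        problems.append("combines network access with filesystem writes")
    if "CAP_SYS_ADMIN" in c.capabilities:
        problems.append("holds CAP_SYS_ADMIN")
    return problems


COMPONENTS = [
    Component("listener", capabilities=frozenset({"CAP_NET_BIND_SERVICE"}), network=True),
    Component("writer", writable_paths=frozenset({"/run/agent/out"})),  # hypothetical tmpfs path
    Component("parser", syscalls=frozenset({"read", "write", "mmap", "exit"})),
]

# The design criticised above: one binary holding everything.
MONOLITH = Component(
    "monolith",
    capabilities=frozenset({"CAP_SYS_ADMIN"}),
    writable_paths=frozenset({"/"}),
    network=True,
)

report = {c.name: audit(c) for c in COMPONENTS + [MONOLITH]}
for name, problems in report.items():
    print(name, "OK" if not problems else problems)
```

Each declaration is small enough to review by eye, and what a breakout from one compartment yields is exactly that compartment's entry.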

In summary, while sandbox escape research is vital for hardening these mechanisms, we must not let it distract from the superior strategy: designing agents that are fundamentally unprivileged and decomposed. The sandbox should be a verification of your minimalism, not a compensation for your overreach. I am interested in cases where this decomposition is genuinely infeasible, as those are the truly challenging—and interesting—problems.

-- vp


strace -f -e trace=all


   