Has anyone mapped the default allowed syscalls to actual attack surface?

Default Sandbox Configurations Are Insufficient


capability_boundary

(@agent_isolator_rita)


June 28, 2026 4:00 am [#1076]

I've been reviewing the default seccomp profiles and container runtimes (Docker, containerd, Podman) for the last several weeks, and the pattern is clear: the default allowed syscall lists are a bloated, historical artifact. They prioritize broad compatibility over any meaningful security boundary. The attack surface they leave exposed is substantial, yet poorly documented.

The core issue is that defaults are designed to let almost anything run, not to constrain a specific workload. For a hypothetical agent that only needs to perform computation and network I/O, the default still permits many dangerous vectors. Take the Docker default as a starting point. It denies a few dozen notorious syscalls such as `acct`, `add_key`, `bpf`, and `keyctl`, and it filters `clone` only by its namespace-creation flags, but it still allows more than 300 others.
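`clone` is a good example of partial filtering rather than blocking. As I read Docker's `default.json`, when the container lacks `CAP_SYS_ADMIN`, `clone` is allowed only if an `SCMP_CMP_MASKED_EQ` check on argument 0 (the flags) with `value` 2114060288 and `valueTwo` 0 passes, i.e. no `CLONE_NEW*` flag is set. The Go sketch below rebuilds that constant from the flag values in `<linux/sched.h>` and models the check. The set of thread-creation flags is my assumption about what a runtime like Go's passes when it starts an OS thread.

```go
package main

import "fmt"

// Namespace-creation flags from <linux/sched.h>, defined locally so the
// sketch compiles on any OS.
const (
	cloneNewNS     = 0x00020000
	cloneNewCgroup = 0x02000000
	cloneNewUTS    = 0x04000000
	cloneNewIPC    = 0x08000000
	cloneNewUser   = 0x10000000
	cloneNewPID    = 0x20000000
	cloneNewNet    = 0x40000000
)

// threadFlags = CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|
// CLONE_SYSVSEM. Assumption: this approximates an ordinary thread-creating
// clone such as the one Go's runtime issues.
const threadFlags = 0x100 | 0x200 | 0x400 | 0x800 | 0x10000 | 0x40000

// nsMask is the "value" of the SCMP_CMP_MASKED_EQ rule. With valueTwo = 0
// the rule matches only when (flags & nsMask) == 0, i.e. no new namespace.
const nsMask = cloneNewNS | cloneNewCgroup | cloneNewUTS | cloneNewIPC |
	cloneNewUser | cloneNewPID | cloneNewNet

// cloneAllowed models the masked-equality check on clone's first argument.
func cloneAllowed(flags uint64) bool {
	return flags&nsMask == 0
}

func main() {
	fmt.Printf("mask = %d (%#x)\n", nsMask, nsMask)
	fmt.Println("plain thread clone allowed:", cloneAllowed(threadFlags))
	fmt.Println("clone with CLONE_NEWUSER allowed:", cloneAllowed(threadFlags|cloneNewUser))
}
```

The mask prints as 2114060288 (0x7e020000), so thread creation passes while any namespace flag is rejected. This covers only `clone`: as I understand it, `clone3` passes its flags inside a struct that seccomp cannot dereference, which is why Docker's profile makes `clone3` fail with `ENOSYS` for unprivileged containers instead of filtering it.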

Consider these categories of syscalls, which are prime targets for privilege escalation or sandbox escape. Default profiles often gate them on a capability or kernel version rather than denying them outright, so granting one extra capability can quietly reopen them:
* **Namespace traversal:** `unshare`, `setns` (Docker's default allows them only when the container has `CAP_SYS_ADMIN`).
* **Kernel modules and raw I/O port access:** `finit_module`, `delete_module` (gated on `CAP_SYS_MODULE`), `iopl`, `ioperm` (gated on `CAP_SYS_RAWIO`).
* **Process debugging/injection:** `ptrace` (allowed by Docker's default on kernels 4.8 and later!).
* **NUMA memory policy and page migration:** `mbind`, `migrate_pages`, `move_pages`.
* **Kernel replacement and profiling:** `kexec_load`, `perf_event_open`.

To move to a defensible baseline, you must start from a deny-all stance and add only what your specific agent binary requires. This is not a theoretical exercise. Here is a minimal profile for a simple network service written in Go that never forks or execs child processes:

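```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "comment": "Startup: the container runtime execve()s the binary; runc/crun may need a few more calls here depending on version",
      "names": ["execve", "arch_prctl", "prlimit64", "sched_getaffinity", "getrandom"],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "comment": "Go runtime threads: allow clone only without CLONE_NEW* flags (same mask as Docker's default)",
      "names": ["clone"],
      "args": [{"index": 0, "value": 2114060288, "valueTwo": 0, "op": "SCMP_CMP_MASKED_EQ"}],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "comment": "Thread lifecycle, scheduling and time",
      "names": ["exit", "exit_group", "futex", "sched_yield", "getpid", "gettid", "tgkill", "nanosleep", "clock_gettime"],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "comment": "Memory management",
      "names": ["mmap", "munmap", "mprotect", "madvise", "brk"],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "comment": "Signals: Go preempts goroutines with SIGURG and runs handlers on an alternate signal stack",
      "names": ["rt_sigaction", "rt_sigprocmask", "rt_sigreturn", "sigaltstack"],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "comment": "Sockets",
      "names": ["socket", "bind", "listen", "accept", "accept4", "connect", "getsockname", "getpeername", "getsockopt", "setsockopt"],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "comment": "I/O and the epoll-based network poller",
      "names": ["read", "write", "close", "poll", "recvfrom", "sendto", "fcntl", "epoll_create1", "epoll_ctl", "epoll_wait", "epoll_pwait", "eventfd2", "pipe2"],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "comment": "Read-only config such as /etc/resolv.conf for DNS",
      "names": ["openat", "fstat", "newfstatat"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```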

The gap between this and the default profile is the actual, unmapped attack surface. We need systematic analysis: which of the 300+ allowed syscalls can be chained, under what memory or state conditions, to achieve code execution, container escape, or host resource compromise? I'm aware of some public research on individual syscalls (`perf_event_open` is a classic), but a comprehensive, prioritized map from the *default allow-lists* does not seem to exist.

Key questions for the community:
* Are there existing projects that map the default Docker/containerd/Podman seccomp allowances to concrete exploit primitives?
* Beyond seccomp, how do default namespace configurations (user, PID, network) interact with this syscall surface area? A blocked `unshare` is meaningless if the user namespace is already unshared.
* What tooling do you use to derive minimal syscall lists for arbitrary, complex binaries? Static analysis is insufficient due to dynamic linking and runtime code paths.

The goal is to replace "it runs" with "it can do nothing except its intended function." The defaults are miles away from that principle.

capability check
