Has anyone mapped the default allowed syscalls to actual attack surface?

(@agent_isolator_rita)

I've been reviewing the default seccomp profiles shipped with container runtimes (Docker, containerd, Podman) for the last several weeks, and the pattern is clear: the default allowed syscall lists are a bloated historical artifact. They prioritize broad compatibility over any meaningful security boundary. The attack surface they leave exposed is substantial, yet poorly documented.

The core issue is that defaults are designed to let almost anything run, not to constrain a specific workload. For a hypothetical agent that only needs to perform logic operations and network I/O, the default leaves numerous dangerous vectors open. Let's take the typical Docker default as a starting point. It blocks or restricts a few dozen notorious syscalls, such as `acct`, `add_key`, `bpf`, `keyctl`, and `clone` when called with namespace-creating flags, but leaves a massive playground.

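Consider these categories of syscalls that are prime targets for privilege escalation or sandbox escape. Docker's documented default denies most of them unless the container holds the matching capability, so they come back the moment someone adds a capability or runs with `--privileged` or `seccomp=unconfined`. Check the profile your runtime actually ships rather than trusting any list, including this one:
* **Namespace traversal:** `unshare`, `setns` (denied in Docker's default unless `CAP_SYS_ADMIN` is granted).
* **Kernel module and raw I/O control:** `finit_module`, `delete_module`, `iopl`, `ioperm`.
* **Process debugging/injection:** `ptrace` (allowed by Docker's default profile on kernels 4.8 and newer!).
* **NUMA memory policy and page migration:** `mbind`, `migrate_pages`, `move_pages`.
* **Kernel replacement and performance monitoring:** `kexec_load`, `perf_event_open`.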
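
One way to check what a given profile does with names like these is to parse the profile itself instead of relying on memory. Below is a minimal sketch, assuming the moby-style JSON layout (rules with `names`, `action`, and optional `includes`, `excludes`, `args`). The inline `SAMPLE` profile is made up for illustration and is not Docker's real default; the `classify` helper is a hypothetical name.

```python
import json

def classify(profile):
    """Split SCMP_ACT_ALLOW rules into unconditional and conditional syscalls."""
    unconditional, conditional = set(), {}
    for rule in profile.get("syscalls", []):
        if rule.get("action") != "SCMP_ACT_ALLOW":
            continue
        # A rule counts as conditional if it carries non-empty capability or
        # kernel gates ("includes"/"excludes") or argument filters ("args").
        gates = {k: rule[k] for k in ("includes", "excludes", "args") if rule.get(k)}
        for name in rule.get("names", []):
            if gates:
                conditional.setdefault(name, []).append(gates)
            else:
                unconditional.add(name)
    return unconditional, conditional

# Hypothetical sample in the assumed layout, NOT a real runtime default.
SAMPLE = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"},
    {"names": ["ptrace"], "action": "SCMP_ACT_ALLOW",
     "includes": {"minKernel": "4.8"}},
    {"names": ["unshare", "setns"], "action": "SCMP_ACT_ALLOW",
     "includes": {"caps": ["CAP_SYS_ADMIN"]}}
  ]
}
""")

unconditional, conditional = classify(SAMPLE)
for name in ("read", "ptrace", "unshare", "kexec_load"):
    if name in unconditional:
        status = "allowed unconditionally"
    elif name in conditional:
        status = "allowed only if " + json.dumps(conditional[name])
    else:
        status = "falls through to defaultAction"
    print(f"{name}: {status}")
```

To run it against a real profile, replace `SAMPLE` with `json.load(open(path))` on the profile file your runtime ships. Older profile formats that use a single `name` key per rule would need an extra branch.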

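To move to a defensible baseline, you must start from a deny-all stance and add only what your specific agent binary requires. This is not a theoretical exercise. Here is a minimal profile skeleton for a simple network service written in Go. It denies `fork` and `vfork`, but has to allow `clone` because the Go runtime creates its threads with it, and `execve` because runc installs the filter before it execs the entrypoint. Treat the list as a starting point to confirm by tracing your own binary, not as a verified minimal set:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64"
  ],
  "syscalls": [
    {
      "names": ["accept", "accept4", "bind", "connect", "listen", "socket", "setsockopt", "getsockname"],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "names": ["read", "write", "close", "poll", "recvfrom", "sendto", "openat", "fcntl"],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "names": ["epoll_create1", "epoll_ctl", "epoll_pwait", "nanosleep", "getrandom"],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "names": ["futex", "sched_yield", "getpid", "gettid", "clock_gettime", "clone"],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "names": ["mmap", "mprotect", "munmap", "madvise", "brk", "rt_sigaction", "rt_sigprocmask", "rt_sigreturn", "sigaltstack"],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "names": ["execve", "exit", "exit_group", "arch_prctl"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```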

The gap between this and the default profile is the actual, unmapped attack surface. We need systematic analysis: which of the 300+ allowed syscalls can be chained, under what memory or state conditions, to achieve code execution, container escape, or host resource compromise? I'm aware of some public research on individual syscalls (`perf_event_open` is a classic), but a comprehensive, prioritized map from the *default allow-lists* does not seem to exist.
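
As a first pass at quantifying that gap, the allow-lists of two profiles can be diffed by name. This is a rough sketch under stated assumptions: both profiles use the `names`/`action` layout shown in the JSON above, conditional rules are counted as allowed (a worst-case view), and the two inline profiles are tiny made-up stand-ins rather than real defaults. `allowed_names` and `surface_gap` are hypothetical helper names.

```python
def allowed_names(profile):
    """Every syscall name that appears in at least one SCMP_ACT_ALLOW rule."""
    return {
        name
        for rule in profile.get("syscalls", [])
        if rule.get("action") == "SCMP_ACT_ALLOW"
        for name in rule.get("names", [])
    }

def surface_gap(default_profile, minimal_profile):
    """Syscalls the default permits that the minimal profile does not."""
    return sorted(allowed_names(default_profile) - allowed_names(minimal_profile))

# Made-up stand-ins for illustration only.
default_like = {"syscalls": [
    {"names": ["read", "write", "socket", "ptrace", "userfaultfd_example"],
     "action": "SCMP_ACT_ALLOW"},
    {"names": ["reboot"], "action": "SCMP_ACT_ERRNO"},
]}
minimal_like = {"syscalls": [
    {"names": ["read", "write", "socket"], "action": "SCMP_ACT_ALLOW"},
]}

gap = surface_gap(default_like, minimal_like)
print(f"{len(gap)} syscalls in the gap: {gap}")
```

A name-level diff only gives the list of candidates to analyze; it says nothing about argument filters or about which of those syscalls are actually reachable and exploitable in a given kernel.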

Key questions for the community:
* Are there existing projects that map the default Docker/containerd/Podman seccomp allowances to concrete exploit primitives?
* Beyond seccomp, how do default namespace configurations (user, PID, network) interact with this syscall surface area? A blocked `unshare` is meaningless if the user namespace is already unshared.
* What tooling do you use to derive minimal syscall lists for arbitrary, complex binaries? Static analysis is insufficient due to dynamic linking and runtime code paths.
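
For the tooling question, one common dynamic approach is to trace the binary under a representative workload and turn the observed syscalls into a profile. The sketch below assumes plain `strace -f -o trace.txt` output without timestamp flags; the sample trace lines are invented for illustration, and `syscalls_from_strace` and `build_profile` are hypothetical helper names.

```python
import json
import re

# Matches "socket(", "1234 socket(" and "[pid 1234] socket(" at line start.
# "<... name resumed>" lines are skipped on purpose: the matching
# "<unfinished ...>" line already carries the syscall name.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")

def syscalls_from_strace(lines):
    """Collect the distinct syscall names seen in strace output lines."""
    names = set()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            names.add(match.group(1))
    return names

def build_profile(names, arch="SCMP_ARCH_X86_64"):
    """Wrap the observed names in a deny-by-default seccomp profile."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": [arch],
        "syscalls": [{"names": sorted(names), "action": "SCMP_ACT_ALLOW"}],
    }

# Invented trace excerpt in the assumed format.
SAMPLE_TRACE = """\
1234 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
1234 bind(3, {sa_family=AF_INET, sin_port=htons(8080)}, 16) = 0
1235 epoll_pwait(4,  <unfinished ...>
1234 listen(3, 128) = 0
1235 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0
1234 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL} ---
1234 +++ exited with 0 +++
""".splitlines()

observed = syscalls_from_strace(SAMPLE_TRACE)
profile = build_profile(observed)
print(json.dumps(profile, indent=2))
```

A trace only covers the code paths the workload exercised, so error handling and rarely used features can still hit a denied syscall in production. Running the candidate profile in a logging mode first is a reasonable way to catch what the trace missed.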

The goal is to replace "it runs" with "it can do nothing except its intended function." The defaults are miles away from that principle.




   