ELI5: what does each syscall restriction in a seccomp filter actually buy you?

(@pentest_gabe)
Eminent Member
Joined: 1 week ago
Posts: 16
Topic starter

Alright, let's cut through the usual "just copy this JSON from the docs" advice. You're dropping syscalls into a seccomp filter because someone told you to, but you don't know what you're actually preventing. That's a great way to either get popped or break your workload.

Think of seccomp as a syscall firewall. Each rule is saying "this process cannot make this specific request to the kernel." The real value isn't in blocking the obvious bad ones—it's in blocking the *weird* ones that become useful in a chain.

Here’s a breakdown of common restrictions and what they actually buy you:

* **`execve`, `execveat`**: Prevents spawning new binaries. The classic. But remember, a dedicated attacker might not need it if they can already manipulate your agent's logic.
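* **`open`, `openat`**: Stops new file access. Critical, but blocking it outright breaks everything, so you'll probably need to allow it for limited paths. Catch: the filter itself can't read the path string (it only sees raw argument values, so the path is just a pointer). What it can match on is the flags, e.g. permit only `O_PATH` or read-only opens. The path limits have to come from somewhere else, like a mount namespace or Landlock.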
* **`ptrace` and related**: Stops debugging/injection into other processes. Also blocks process introspection tools. Good for hindering post-exploitation.
* **`socket`**: No new sockets, so no new network connections. Huge win. Blocks callbacks, data exfiltration, or pulling down stage-two payloads.
* **`mount`, `pivot_root`, `chroot`**: Obvious container escape mitigations. If your workload doesn't need to manage filesystems, nuke these.
* **`keyctl`, `add_key`, `request_key`**: Kernel keyring access. Blocking these can hinder certain persistence mechanisms and credential theft.
* **`ioctl`**: A tricky one. It's a giant gate to device-specific operations. Often over-permitted. Restricting it to an allowlist of request codes (the second argument), optionally only on known descriptors like `stdin`/`stdout`, closes a ton of weird hardware interaction avenues.
* **`personality`**: Prevents setting weird process execution domains (like disabling ASLR). Niche, but used in some exploit chains.
* **`clone`, `fork`, `vfork`**: Limits process creation. Can be useful to restrict, but many runtimes need `clone` for threading. Know your workload.
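To make the "syscall firewall" idea concrete, here is a toy model of how an allowlist decides. It is only a sketch: real filters are BPF programs run by the kernel, and `PROFILE`, `arg_matches`, and `decide` are names made up for this example. The rule layout imitates the Docker/OCI JSON profile format, the flag constants are the Linux values, and the model assumes every condition in a rule must match, which is a simplification.

```python
# Toy model of a seccomp allowlist decision. Illustration only: the kernel
# runs a BPF program, not Python, and this mimics just the outcome.
O_ACCMODE = 0o3   # Linux value: mask for the access-mode bits in open flags
O_RDONLY = 0o0    # Linux value
O_WRONLY = 0o1    # Linux value

# Hypothetical profile: deny by default, allow a few calls, and allow
# openat only when the flags argument (index 2) requests read-only access.
PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write", "close"], "action": "SCMP_ACT_ALLOW", "args": []},
        {
            "names": ["openat"],
            "action": "SCMP_ACT_ALLOW",
            "args": [{"index": 2, "value": O_ACCMODE, "valueTwo": O_RDONLY,
                      "op": "SCMP_CMP_MASKED_EQ"}],
        },
    ],
}


def arg_matches(cond, args):
    """Check one argument condition against the raw syscall arguments."""
    actual = args[cond["index"]]
    if cond["op"] == "SCMP_CMP_EQ":
        return actual == cond["value"]
    if cond["op"] == "SCMP_CMP_MASKED_EQ":
        # (arg & mask) == expected; "value" is the mask, "valueTwo" the expected value
        return (actual & cond["value"]) == cond.get("valueTwo", 0)
    raise ValueError("operator not modelled: " + cond["op"])


def decide(profile, name, args=()):
    """Return the action the toy filter takes for one syscall."""
    for rule in profile["syscalls"]:
        if name in rule["names"] and all(arg_matches(c, args) for c in rule.get("args", [])):
            return rule["action"]
    return profile["defaultAction"]


# The second openat argument is the path *pointer*; the filter sees only the
# address, never the string, so it cannot tell /tmp/log from /etc/shadow.
print(decide(PROFILE, "openat", (-100, 0x7FFF0000, O_RDONLY, 0)))  # SCMP_ACT_ALLOW
print(decide(PROFILE, "openat", (-100, 0x7FFF0000, O_WRONLY, 0)))  # SCMP_ACT_ERRNO
print(decide(PROFILE, "socket", (2, 1, 0)))                        # SCMP_ACT_ERRNO
```

The point of the model is the shape of the decision: syscall name plus scalar arguments in, one action out. Anything that depends on what a pointer refers to is invisible at this layer.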

The real art is not just blocking the scary ones, but auditing the *allowed* list. Why does your agent need `process_vm_readv`? Or `memfd_create`? Here's a minimalist snippet for a sandboxed helper that only needs to compute, not touch the outside world:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "close", "fstat", "lseek", "mmap", "mprotect", "munmap", "brk", "rt_sigaction", "rt_sigprocmask", "exit_group"],
      "action": "SCMP_ACT_ALLOW",
      "args": []
    }
  ]
}
```

This is brutally restrictive. No network, no files, no processes. It's a starting point. You add back what you *prove* you need, not what seems convenient.
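Auditing the allowed list is easy to script. The sketch below assumes a Docker/OCI-style profile (a top-level `syscalls` array of rules with `names` and `action`) and a deny-by-default policy; `WATCHLIST`, `allowed_syscalls`, and `audit` are hypothetical names, and the watchlist is just the syscalls mentioned in this thread, not a complete list.

```python
import json

# Hypothetical watchlist seeded from syscalls named in this thread.
# Extend it for your own workload; it is not exhaustive.
WATCHLIST = {
    "execve", "execveat", "ptrace", "socket",
    "process_vm_readv", "process_vm_writev", "memfd_create",
}


def allowed_syscalls(profile):
    """Collect every syscall name that appears in an ALLOW rule."""
    allowed = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            allowed.update(rule.get("names", []))
    return allowed


def audit(profile, watchlist=WATCHLIST):
    """Return watchlisted syscalls the profile allows (assumes default deny)."""
    return sorted(allowed_syscalls(profile) & set(watchlist))


# Example profile text: a compute-only allowlist with one questionable extra.
profile = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write", "close", "brk", "exit_group", "memfd_create"],
     "action": "SCMP_ACT_ALLOW", "args": []}
  ]
}
""")

print(audit(profile))  # ['memfd_create']
```

Every name this prints should have a written justification. If nobody can say why it is there, it comes out.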

The "buy" is layered defense. A prompt injection that leads to RCE is worthless if the agent can't open a socket to exfiltrate your `.env` file. A memory corruption bug is harder to exploit if you can't call `execve`. It's about raising the attacker's cost, not building an impenetrable wall.

- Gabe


Trust me, I'm a pentester.


   
(@bob_hardcase)
Eminent Member
Joined: 1 week ago
Posts: 16
 

Yeah, the 'weird ones' point is key. I was messing with a Python agent last week and almost missed `process_vm_readv`/`process_vm_writev`. They let a process read or write another process's memory directly, given the PID and ptrace-level access to the target (typically same UID and a permissive enough `ptrace_scope`). If you've allowed `open` for logging but blocked `execve`, an attacker could still use them to pilfer data or inject code without spawning a shell.

But what about `clone3`? If you block `execve` but allow `clone3` and `open`, could an attacker still fork and use the child to do something weird with shared file descriptors? Seems like you need to consider syscalls in groups, not just individually.



   
(@dev_sec_maria)
Active Member
Joined: 1 week ago
Posts: 14
 

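Exactly. That's why you block `clone`, `clone3`, `fork`, `vfork`, and `unshare` as a set. A forked child inherits the parent's seccomp filter, so it can't make any syscall the parent can't. But if you let it open a file and it can talk to the parent via shared memory or a pipe, you've got a weird workaround: a second process working the same shared descriptors, which is exactly what you need to race any check the app does for itself. And `clone3` passes its flags in a struct the filter can't read, so you can't allow it "just for threads."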

Blocking the clone family breaks that primitive. Your agent shouldn't be forking anyway.
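One way to check "block them as a set" mechanically is to look for families that are only partly blocked. This is a rough sketch: the `FAMILIES` grouping and the `half_blocked` helper are made up for illustration, and the groupings are a judgment call based on this thread, not an official taxonomy.

```python
# Illustrative groupings only; adjust to your own threat model.
FAMILIES = {
    "process creation": {"clone", "clone3", "fork", "vfork", "unshare"},
    "exec": {"execve", "execveat"},
    "cross-process memory": {"ptrace", "process_vm_readv", "process_vm_writev"},
}


def half_blocked(allowed, families=FAMILIES):
    """Return {family: allowed members} for families that are only partly blocked.

    A family that is fully allowed or fully blocked is a deliberate choice;
    a partly blocked one is usually an oversight.
    """
    gaps = {}
    for family, members in families.items():
        leaked = members & set(allowed)
        if leaked and leaked != members:
            gaps[family] = sorted(leaked)
    return gaps


# Example: clone3 slipped through while the rest of its family is blocked.
allowed = {"read", "write", "openat", "clone3"}
print(half_blocked(allowed))  # {'process creation': ['clone3']}
```

It won't tell you whether a family should be allowed at all, only that you've left one door open in a row of locked ones.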



   
(@supply_chain_auditor)
Active Member
Joined: 1 week ago
Posts: 13
 

That "weird ones" point is exactly why people cargo-cult seccomp profiles and get a false sense of security. You can't just block `socket` and call it a day.

Blocking `socket` prevents new *network* connections, but what about `connect` on an already-open socket? Or `sendmsg`/`recvmsg` on a UNIX domain socket inherited from a parent? An attacker with a foothold can pivot using existing descriptors if you haven't audited your FD table.
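Auditing the FD table can start as small as this. A minimal sketch, assuming a POSIX system and that it runs inside the sandboxed process before any untrusted code does; `open_sockets` and `max_fd` are names invented here, and scanning a fixed range is a simplification (on Linux you could list `/proc/self/fd` instead).

```python
import os
import stat


def open_sockets(max_fd=1024):
    """Return the descriptor numbers in [0, max_fd) that are open sockets."""
    found = []
    for fd in range(max_fd):
        try:
            mode = os.fstat(fd).st_mode
        except OSError:
            continue  # descriptor is not open
        if stat.S_ISSOCK(mode):
            found.append(fd)
    return found


if __name__ == "__main__":
    # Anything listed here stays usable for send/recv even with socket() blocked.
    print(open_sockets())
```

Run it at the point where the filter is installed. A non-empty list means the process already holds a channel that blocking `socket` does nothing about.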

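Even `openat` is tricky. You say "allow it for limited paths," but the filter never sees the path: seccomp can only compare raw argument values, so it gets a pointer and the flags, not the string. Any path check lives in the app (or in something like Landlock or a mount namespace), and an app-level check is open to a TOCTOU race: swap in a symlink between the check and the `openat`, and your allowed operation lands on a file you never meant to expose. The profile is only as strong as your path resolution logic, and that's usually in the app, not the filter.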
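On the app side, one common way to narrow the symlink race is to open relative to a directory descriptor and refuse symlinks in the same call, instead of checking the path first and opening it afterwards. A hedged sketch, assuming Linux; `open_in_dir` is a hypothetical helper, and `O_NOFOLLOW` only guards the final path component, which is why the helper accepts a single component.

```python
import os


def open_in_dir(dir_fd, name, flags=os.O_RDONLY):
    """Open `name` inside an already-open directory, refusing symlinks.

    The check and the open are one syscall, so there is no window in which
    a symlink can be swapped in between them.
    """
    if "/" in name or name in ("", ".", ".."):
        raise ValueError("expected a single path component")
    return os.open(name, flags | os.O_NOFOLLOW | os.O_CLOEXEC, dir_fd=dir_fd)
```

Usage would look like `dir_fd = os.open("/srv/agent-data", os.O_RDONLY | os.O_DIRECTORY)` followed by `open_in_dir(dir_fd, "input.txt")`, where the directory path is a placeholder. On Linux, a symlink in the final position makes the call fail with `ELOOP` rather than following it.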


mj


   