ELI5: what does each syscall restriction in a seccomp filter actually buy you?

Seccomp, AppArmor, and LSM Profiles

Last Post by Maya Johansson 1 week ago

Gabe N.

(@pentest_gabe)

Eminent Member

Joined: 1 week ago

Posts: 16

Topic starter

June 22, 2026 2:18 pm [#365]

Alright, let's cut through the usual "just copy this JSON from the docs" advice. You're dropping syscalls into a seccomp filter because someone told you to, but you don't know what you're actually preventing. That's a great way to either get popped or break your workload.

Think of seccomp as a syscall firewall. Each rule is saying "this process cannot make this specific request to the kernel." The real value isn't in blocking the obvious bad ones—it's in blocking the *weird* ones that become useful in a chain.

Here’s a breakdown of common restrictions and what they actually buy you:

* **`execve`, `execveat`**: Prevents spawning new binaries. The classic. But remember, a dedicated attacker might not need it if they can already manipulate your agent's logic.
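* **`open`, `openat`**: Stops new file access. Critical, but you'll probably need to allow it for limited paths, because blocking it outright breaks everything. Caveat: the filter itself can only match on the flags (e.g., read-only, or `O_PATH` for resolution), since seccomp sees the pointer to the path and not the string behind it. The path limits have to be enforced by the app, a mount namespace, or an LSM like AppArmor or Landlock.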
* **`ptrace` and related**: Stops debugging/injection into other processes. Also blocks process introspection tools. Good for hindering post-exploitation.
* **`socket`**: No new network connections. Huge win. Blocks callbacks, data exfiltration, or pulling down stage-two payloads.
* **`mount`, `pivot_root`, `chroot`**: Obvious container escape mitigations. If your workload doesn't need to manage filesystems, nuke these.
* **`keyctl`, `add_key`, `request_key`**: Kernel keyring access. Blocking these can hinder certain persistence mechanisms and credential theft.
* **`ioctl`**: A tricky one. It's a giant gate to device-specific operations. Often over-permitted. Restricting it to a known-allow list (e.g., for `stdin`/`stdout` file descriptors) closes a ton of weird hardware interaction avenues.
* **`personality`**: Prevents setting weird process execution domains (like disabling ASLR). Niche, but used in some exploit chains.
* **`clone`, `fork`, `vfork`**: Limits process creation. Can be useful to restrict, but many runtimes need `clone` for threading. Know your workload.
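To make the `ioctl` point concrete, here's a rough sketch in Python that emits Docker/OCI-style rules allowing `ioctl` only for specific request numbers. The helper name `ioctl_rules` and the two requests chosen are illustrative assumptions, not a recommendation for any particular workload. It matches on the request code (argument index 1) only, because the filter compares raw argument values and can't look behind pointers.

```python
import json
import termios

def ioctl_rules(requests):
    """Build one allow rule per ioctl request code (hypothetical helper)."""
    return [
        {
            "names": ["ioctl"],
            "action": "SCMP_ACT_ALLOW",
            # index 1 is the second syscall argument: the request code
            "args": [{"index": 1, "value": int(req), "op": "SCMP_CMP_EQ"}],
        }
        for req in requests
    ]

# Illustrative choice: window-size query and "bytes available to read" query.
rules = ioctl_rules([termios.TIOCGWINSZ, termios.FIONREAD])
print(json.dumps(rules, indent=2))
```

Separate rules for the same syscall act as alternatives, so any `ioctl` whose request code isn't listed falls through to the profile's default action.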

The real art is not just blocking the scary ones, but auditing the *allowed* list. Why does your agent need `process_vm_readv`? Or `memfd_create`? Here's a minimalist snippet for a sandboxed helper that only needs to compute, not touch the outside world:

```json
{
  "names": ["read", "write", "close", "fstat", "lseek", "mmap", "mprotect", "munmap", "brk", "rt_sigaction", "rt_sigprocmask", "exit_group"],
  "action": "SCMP_ACT_ALLOW",
  "args": []
}
```

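This is brutally restrictive, assuming the profile's `defaultAction` denies everything not listed (e.g., `SCMP_ACT_ERRNO`). No network, no files, no processes. It's a starting point. You add back what you *prove* you need, not what seems convenient. The first thing you'll likely add back is `rt_sigreturn`, since a process that ever runs a signal handler can't return from it without that call.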
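If you want to audit an allow list mechanically, something like the following sketch works. It wraps a list of names in a default-deny, Docker/OCI-style profile and flags allowed syscalls that deserve a second look. The `QUESTIONABLE` set and the function names are my own illustrative picks, not an authoritative list.

```python
# Syscalls worth questioning in any allow list (illustrative, not exhaustive).
QUESTIONABLE = {"process_vm_readv", "process_vm_writev", "memfd_create",
                "ptrace", "personality", "keyctl"}

def build_profile(allowed, default_action="SCMP_ACT_ERRNO"):
    """Wrap an allow list in a default-deny profile."""
    return {
        "defaultAction": default_action,
        "syscalls": [
            {"names": sorted(allowed), "action": "SCMP_ACT_ALLOW", "args": []}
        ],
    }

def allowed_syscalls(profile):
    """Collect every syscall name that some rule allows."""
    names = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            names.update(rule.get("names", []))
    return names

def audit(profile):
    """Return the allowed syscalls that appear in QUESTIONABLE."""
    return sorted(allowed_syscalls(profile) & QUESTIONABLE)

compute_only = ["read", "write", "close", "fstat", "lseek", "mmap", "mprotect",
                "munmap", "brk", "rt_sigaction", "rt_sigprocmask", "exit_group"]

profile = build_profile(compute_only)
print(audit(profile))                                         # []
print(audit(build_profile(compute_only + ["memfd_create"])))  # ['memfd_create']
```

A hit in the audit isn't automatically wrong. It's a prompt to write down why the workload needs that call.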

The "buy" is layered defense. A prompt injection that leads to RCE is worth a lot less if the agent can't open a socket to exfiltrate your `.env` file. A memory corruption bug is harder to exploit if you can't call `execve`. It's about raising the attacker's cost, not building an impenetrable wall.

- Gabe

Trust me, I'm a pentester.

Bob Hardcase

(@bob_hardcase)

Eminent Member

Joined: 1 week ago

Posts: 16

June 22, 2026 2:39 pm

Yeah, the 'weird ones' point is key. I was messing with a Python agent last week and almost missed `process_vm_readv`/`process_vm_writev`. They let a process read/write another process's memory directly if you have the PID and pass the kernel's ptrace access check (by default, roughly: same user, or `CAP_SYS_PTRACE`). If you've allowed `open` for logging but blocked `execve`, an attacker could still use that to pilfer data or inject code without spawning a shell.

But what about `clone3`? If you block `execve` but allow `clone3` and `open`, could an attacker still fork and use the child to do something weird with shared file descriptors? Seems like you need to consider syscalls in groups, not just individually.

Maria Kowalski

(@dev_sec_maria)

Active Member

Joined: 1 week ago

Posts: 14

June 22, 2026 3:08 pm

Exactly. That's why you block `clone`, `clone3`, `fork`, `vfork`, and `unshare` as a set. A forked child inherits the parent's seccomp filter, but if you let it open a file and have a way to communicate with the parent via shared memory or a pipe, you've got a weird workaround. The child can operate on those shared descriptors in ways the parent can't directly.

Blocking the clone family breaks that primitive. Your agent shouldn't be forking anyway.
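One way to keep yourself honest about "block as a set" is a small check like this sketch. Given a deny list, it reports families where you blocked some members but left others reachable. The `FAMILIES` grouping and the name `gaps` are my own illustrative assumptions, so adjust them to your threat model.

```python
# Syscalls that reach the same primitive (illustrative grouping).
FAMILIES = {
    "process creation": {"clone", "clone3", "fork", "vfork"},
    "exec": {"execve", "execveat"},
    "namespaces": {"unshare", "setns"},
    "cross-process memory": {"ptrace", "process_vm_readv", "process_vm_writev"},
}

def gaps(blocked):
    """For each family that is partly blocked, list the members left open."""
    blocked = set(blocked)
    report = {}
    for label, members in FAMILIES.items():
        if members & blocked:            # you tried to cut this primitive...
            missing = members - blocked
            if missing:                  # ...but some siblings are still open
                report[label] = sorted(missing)
    return report

print(gaps(["execve", "clone", "fork", "vfork"]))
# {'process creation': ['clone3'], 'exec': ['execveat']}
```

An empty result only means the families listed here are covered consistently, not that the profile is safe.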

Maya Johansson

(@supply_chain_auditor)

Active Member

Joined: 1 week ago

Posts: 13

June 22, 2026 7:00 pm

That "weird ones" point is exactly why people cargo-cult seccomp profiles and get a false sense of security. You can't just block `socket` and call it a day.

Blocking `socket` prevents new *network* connections, but what about `connect` on an already-open socket? Or `sendmsg`/`recvmsg` on a UNIX domain socket inherited from a parent? An attacker with a foothold can pivot using existing descriptors if you haven't audited your FD table.
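Auditing the FD table can be as simple as this sketch, run inside the sandboxed process before it starts handling untrusted input. It walks the low descriptor numbers and reports which ones are sockets. The function name and the `max_fd` cap of 1024 are assumptions for illustration, and a real audit would also look at pipes and device nodes.

```python
import os
import socket
import stat

def inherited_sockets(max_fd=1024):
    """Return open file descriptors in this process that refer to sockets."""
    found = []
    for fd in range(max_fd):
        try:
            mode = os.fstat(fd).st_mode
        except OSError:
            continue  # this descriptor number is not open
        if stat.S_ISSOCK(mode):
            found.append(fd)
    return found

# Demo: a socketpair stands in for descriptors inherited from a parent.
a, b = socket.socketpair()
print("socket fds:", inherited_sockets())
a.close()
b.close()
```

Anything this turns up is a channel the filter won't cut unless you also restrict `connect`, `sendmsg`, `recvmsg`, and friends, or close the descriptor before entering the sandbox.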

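Even `openat` is tricky. You say "allow it for limited paths," but the filter never sees the path string, only a pointer to it, so the path check has to live somewhere else. A TOCTOU race with a symlink swapped in between that check and the actual open can redirect your allowed `O_PATH` or read-only operation to a file you never meant to expose, or turn it into a write if the app later reopens it. The profile is only as strong as your path resolution logic, and that's usually in the app, not the filter.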
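On the app side, one common way to narrow that race is to resolve names relative to a directory descriptor and refuse symlinks at open time, so the check and the open are the same call. A minimal sketch, assuming Linux and a hypothetical helper called `open_beneath`. Note that `O_NOFOLLOW` only guards the final path component, which is why the helper rejects anything containing a slash.

```python
import os
import tempfile

def open_beneath(dir_fd, name):
    """Open `name` read-only inside the directory that `dir_fd` refers to,
    refusing symlinks and anything that is not a single path component."""
    if name in ("", ".", "..") or "/" in name:
        raise ValueError(f"not a plain file name: {name!r}")
    # O_NOFOLLOW makes the open fail if the final component is a symlink.
    return os.open(name, os.O_RDONLY | os.O_NOFOLLOW, dir_fd=dir_fd)

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "data.txt"), "w") as f:
        f.write("ok")
    os.symlink("/etc/passwd", os.path.join(root, "link"))

    base = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
    try:
        fd = open_beneath(base, "data.txt")
        print(os.read(fd, 16))
        os.close(fd)
        try:
            open_beneath(base, "link")
        except OSError as exc:
            print("refused symlink, errno", exc.errno)
    finally:
        os.close(base)
```

For nested paths, Linux 5.6+ has `openat2` with `RESOLVE_BENEATH` and `RESOLVE_NO_SYMLINKS`, which covers intermediate components too. As far as I know the Python standard library doesn't wrap it, so that would mean ctypes or a C helper.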
