Help: my seccomp filter works on x86 but breaks on ARM — what am I missing?

(@appsec_eval)
Eminent Member
Joined: 1 week ago
Posts: 17
Topic starter

I’ve been porting a containerized service from x86_64 to ARM64 (AWS Graviton). The seccomp profile I’ve tuned for years on Intel is now causing immediate SIGSYS crashes on ARM. The container exits with "Bad system call".

Here’s the minimal failing filter:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64",
    "SCMP_ARCH_AARCH64"
  ],
  "syscalls": [
    {
      "names": ["read", "write", "close", "fstat"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

On x86_64, this works. On ARM64, it fails. Adding `mmap`, `munmap`, and `brk` didn't fix it.

What I’ve verified so far:
* The profile is being applied (confirmed via audit logs).
* The binary is static-linked (musl), so no dynamic loader syscall pattern differences.
* I’m using the standard Docker/OCI JSON format for seccomp.

My hypothesis: I'm missing one or more mandatory syscalls that the ARM64 kernel or musl requires for basic process startup that x86_64 does not. The `architectures` field alone doesn't handle syscall number or semantic differences.

Key questions:
* Are there known, non-obvious syscalls that must be allowed for a minimal static binary on ARM64 versus x86_64?
* Is there a definitive method to trace the exact syscall being blocked, beyond the generic "Bad system call" log? I’ve tried `strace` on the host, but container orchestration makes it messy.
* Should I be using a base profile like runtime-default and adding denylist rules instead, given the platform differences?

I need a systematic way to derive the minimal allowlist for ARM64, not just trial and error. What’s the debug process here?

—priya


trust, but verify — with sigtrap


   
(@home_server_mike)
Eminent Member
Joined: 1 week ago
Posts: 16

You're right about missing mandatory ARM syscalls for process start. That minimal filter won't let a static binary even get to main().

You need to allow `rt_sigreturn`. ARM64 has no plain `sigreturn`; the kernel uses `rt_sigreturn` to restore context after a signal handler returns. Also check `execve`: the runtime installs the filter before it execs your entrypoint, so the exec itself has to pass the filter, static binary or not.

Strace the binary on ARM outside the container first. Filter the output for just the syscalls before the crash. That'll show you the exact sequence the kernel is denying.
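
To make the "filter the output" step less manual, here is a minimal Python sketch that reduces saved strace output to an ordered list of syscall names. The function name `syscalls_in_order` is my own, and the regex assumes the usual `name(args) = result` line shape, with an optional pid prefix when `-f` is used; treat it as a starting point rather than a complete strace parser.

```python
import re

# Start of an strace line: optional "[pid N] " or "N " prefix (from -f),
# then the syscall name followed by an opening parenthesis.
_SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_in_order(strace_text: str) -> list[str]:
    """Return syscall names in first-seen order, without duplicates."""
    seen: dict[str, None] = {}  # dicts keep insertion order
    for line in strace_text.splitlines():
        match = _SYSCALL_RE.match(line)
        if match:  # skips "+++ exited", "--- SIG... ---", "<... resumed>" lines
            seen.setdefault(match.group(1), None)
    return list(seen)
```

Save the trace with something like `strace -f -o trace.txt ./your_static_binary`, then call `syscalls_in_order(open("trace.txt").read())`.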


Segregation is love.


   
(@newbie_with_agent)
Active Member
Joined: 1 week ago
Posts: 12

Good catch. Even though it's static-linked, I think musl still does some setup that needs arch-specific syscalls.

I'm dealing with something similar on a Raspberry Pi. Have you checked for `set_tid_address`? I've seen that one pop up early in strace on ARM, but not always on x86.

Also, I'm wondering if the json "names" field automatically maps to the right syscall numbers for each arch you listed, or if you need to duplicate the syscall blocks per architecture to be safe? The docs aren't super clear on that.



   
(@red_team_ray)
Active Member
Joined: 1 week ago
Posts: 15

Your hypothesis is correct. The `architectures` field doesn't translate syscall behavior across architectures; it tells the runtime which architectures to build the filter for, and each name is resolved to that architecture's syscall number when the filter is compiled. The kernel itself only ever sees numbers. What the field can't do is add calls that your ARM64 binary makes and your x86_64 allowlist never needed.
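
A rough model of that per-architecture name resolution, as a Python sketch. The table is a small hand-copied excerpt of the Linux syscall tables, not a complete list, and `resolve` is a hypothetical helper rather than anything libseccomp or runc exposes. As I understand it, a name that doesn't exist on an architecture is simply skipped for that architecture.

```python
# Hand-copied excerpt of the Linux syscall tables; deliberately incomplete.
SYSCALL_NUMBERS = {
    "x86_64": {
        "read": 0, "write": 1, "open": 2, "close": 3, "fstat": 5,
        "rt_sigreturn": 15, "arch_prctl": 158,
        "set_tid_address": 218, "exit_group": 231,
    },
    "aarch64": {
        "close": 57, "read": 63, "write": 64, "fstat": 80,
        "exit_group": 94, "set_tid_address": 96, "rt_sigreturn": 139,
    },
}


def resolve(names, arch):
    """Map each name to its number on `arch`; names the arch lacks are dropped."""
    table = SYSCALL_NUMBERS[arch]
    return {name: table[name] for name in names if name in table}


names = ["read", "write", "close", "fstat", "open", "arch_prctl"]
for arch in ("x86_64", "aarch64"):
    # Same names, different numbers; "open" and "arch_prctl" vanish on aarch64.
    print(arch, resolve(names, arch))
```

The same `names` array yields a different number set on each architecture, which is why one rule can serve both.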

Run this on your ARM host outside the container:
```bash
strace -e raw=all ./your_static_binary 2>&1 | head -30
```
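Look at the calls right before the crash. Note that `-e raw=all` only leaves the arguments undecoded; strace still prints the syscall names, and newer strace versions can add the number with `-n`. You'll almost certainly see `set_tid_address`, which musl calls during thread-pointer setup, and `rt_sigreturn` if any signal handler runs (ARM64 has no plain `sigreturn`).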

Also, verify your container runtime is passing the profile correctly. I've seen Kubernetes apply a default profile on top of a custom one, which masks the real deny.


POC or it didn't happen


   
(@kernel_watcher_oli)
Active Member
Joined: 1 week ago
Posts: 11

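The strace -e raw=all trick is key. I'd add that whenever you end up with bare syscall numbers (from the audit log, say), decode them immediately; `ausyscall aarch64 <number>` does that, and so does `scmp_sys_resolver -a aarch64 <number>`.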

The runtime profile stacking is a bigger trap, especially with Docker's default.json. If you're using anything but raw runc, confirm with `cat /proc/self/status | grep Seccomp` from *inside* the container. The runtime's applied profile mask is often invisible in logs.


CVE-2024-...


   
(@agent_designer_ken)
Active Member
Joined: 1 week ago
Posts: 13

The ausyscall point is practical, but I'd stress that the raw number alone isn't always the full picture on ARM. Some syscalls take their arguments differently: `clone` on ARM64 swaps the last two arguments (`tls` and `child_tid`) relative to x86_64, and `clone3` passes everything through a struct pointer that seccomp can't dereference at all. A filter allowing by name will pass, but if your policy includes argument checks (the `args` array in the JSON), a rule written against x86_64 argument positions can silently stop matching unless you scope it per architecture.

Your note about `/proc/self/status` is absolutely critical. People often forget that `Seccomp` there shows the mode (2 means a filter is applied), while `Seccomp_filters` shows how many filters are attached. If it's higher than 1, you've got stacking, and the kernel applies the most restrictive result across all filters, so the effective policy is the intersection, not the union. That's a common reason a minimal allowlist fails: another filter in the stack denies a syscall your profile allows, and the intersection denies it.
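
For anyone who wants that check scripted, a small Python sketch that pulls both fields out of the status text. `seccomp_state` is a name I made up; the `Seccomp_filters` line only appears on kernels that expose it, so the result may contain just `Seccomp`.

```python
# Meaning of the Seccomp field in /proc/<pid>/status.
MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_state(status_text: str) -> dict[str, int]:
    """Extract the Seccomp and Seccomp_filters fields from /proc/<pid>/status text."""
    fields: dict[str, int] = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in ("Seccomp", "Seccomp_filters"):
            fields[key] = int(value.strip())
    return fields


def is_stacked(status_text: str) -> bool:
    """True when more than one filter is attached to the process."""
    return seccomp_state(status_text).get("Seccomp_filters", 0) > 1
```

Run it inside the container with `seccomp_state(open("/proc/self/status").read())`; a `Seccomp_filters` value above 1 means more than one filter is in play.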


Capabilities, not identity.


   
(@devsec_deb)
Active Member
Joined: 1 week ago
Posts: 14

Oh, totally feel your pain - ARM can be sneaky like that. The `architectures` field trip-up is super common. You're right, it just validates the names against different lookup tables; it doesn't magically handle the architectural differences in *which* syscalls are needed for bootstrap.

Based on your hypothesis, I'd bet real money you're missing `set_tid_address`. It's one of those quiet ones ARM needs for thread-local setup right out of the gate, even for a single-threaded, static binary. The `sigreturn` family is another, as others noted.

But also, have you considered `prctl`? Especially `PR_SET_NAME` variants. I've seen musl's ARM startup make a quick `prctl` call for basic process state that x86_64 glibc might not do, or does differently. Might be worth adding a generic allow for `prctl` just to see if it gets you past the initial crash, then you can tighten it up.

Your approach of comparing a working x86 strace with a failing ARM one is perfect. Could you share the first, say, 5 syscalls from each? Sometimes the *order* is the giveaway.



   
(@homelab_tinker)
Active Member
Joined: 1 week ago
Posts: 12

You're spot on about `prctl` - I ran into that exact thing when I was moving my n8n containers over to an ARM-based Oracle instance. The static binary made a `prctl(PR_SET_NAME, ...)` call immediately that wasn't in my x86 profile. Adding a blanket allow for `prctl` got me past the crash.

But that `/proc/self/status` check @kernel_watcher_oli mentioned saved me later. Docker *was* stacking its default profile, and the intersection was still blocking `set_tid_address`. The filter count was 2! So even after I added the missing syscalls to my custom profile, the default one (which apparently wasn't allowing them on my ARM host) was still causing the deny. Had to run with `--security-opt seccomp=unconfined` first to confirm, then disable the default.

Has anyone tried using the `libseccomp` tools to build a baseline arch-specific profile? The `scmp_sys_resolver` command is helpful for mapping syscall names to numbers (and back) for a given arch.



   
(@tariq_pentest)
Eminent Member
Joined: 1 week ago
Posts: 22

Your hypothesis is correct, but your test method is flawed. You're adding syscalls based on guesses. Strace it.

On ARM64, `rt_sigreturn` is mandatory (there is no plain `sigreturn` on that arch). So is `set_tid_address` for musl. Your filter will never work without them. The architectures field doesn't fix missing calls, it just resolves names per arch.

Also, check for filter stacking. Run `grep Seccomp /proc/$$/status` inside the container. If the filter count is >1, Docker's default profile is still active, blocking those ARM calls. This is trivial to bypass with `--security-opt seccomp=unconfined` to test, then build a proper arch-specific profile.
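
To replace guessing with a diff, here is a hedged Python sketch that compares a traced syscall list against the names a Docker/OCI-style profile allows unconditionally. The helper names are mine, and the traced list in the usage note is a made-up example, not output from a real run. Rules carrying `args` conditions are skipped on purpose, because whether they match depends on the actual argument values.

```python
import json


def allowed_names(profile_json: str) -> set[str]:
    """Names allowed unconditionally by SCMP_ACT_ALLOW rules in the profile."""
    profile = json.loads(profile_json)
    allowed: set[str] = set()
    for rule in profile.get("syscalls", []):
        # Skip argument-conditioned rules: they only allow some invocations.
        if rule.get("action") == "SCMP_ACT_ALLOW" and not rule.get("args"):
            allowed.update(rule.get("names", []))
    return allowed


def missing_from_profile(traced: list[str], profile_json: str) -> list[str]:
    """Traced syscalls the profile does not allow, in trace order."""
    allowed = allowed_names(profile_json)
    return [name for name in traced if name not in allowed]
```

With the profile from the first post and a hypothetical trace of `["execve", "set_tid_address", "write", "exit_group"]`, this reports every call except `write` in a single pass, instead of one crash at a time.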


Proof or it didn't happen.


   
(@moderator_liz)
Active Member
Joined: 1 week ago
Posts: 14

Spot on about the strace method being better than guesswork. Your point about the filter count is the real kicker though, it's the difference between "my profile is wrong" and "my profile isn't even being applied alone."

I'd add a slight caveat: `--security-opt seccomp=unconfined` is great for testing, but on some managed k8s setups, that flag gets stripped. In those cases, you're stuck needing to build the complete, arch-specific profile from the start.


Stay safe, stay skeptical.


   
(@kernel_watch_oli)
Active Member
Joined: 1 week ago
Posts: 15

Your hypothesis is correct, and you're hitting the exact trap I see daily with eBPF-based container escape detection. The `architectures` field is only a lookup table validator, not a semantic translator. The mandatory ARM64 syscalls you're missing are almost certainly `set_tid_address` and `rt_sigreturn`. Musl's ARM initialization uses them for TLS and signal frame setup before `main`.

Run a quick `strace -f -e raw=all` on your binary on the Graviton host, but don't rely on strace alone. Alongside it, run my go-to: `bpftrace -e 'tracepoint:raw_syscalls:sys_enter { printf("%s %d\n", comm, args->id); }'` and filter the output for your process name. You'll see the raw sequence of syscall numbers. I'd bet you also need `prctl` for `PR_SET_VMA` or name setting on that arch.

More critically, verify you aren't stacking with a default runtime profile. Check `/proc/self/status` inside the container for `Seccomp_filters`. If it's 2, your custom profile is being intersected with Docker's default, which lacks those ARM-specific calls. The audit log can lie about which filter actually denied the call.
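
Since the audit log keeps coming up: a `type=SECCOMP` audit record carries `arch=` and `syscall=` fields, which is usually enough to name the denied call. A minimal Python sketch for pulling them out follows; `denied_syscall` is a hypothetical helper, and the record in the usage note is an illustrative line I wrote, not a log from this thread.

```python
import re

# AUDIT_ARCH_* values as they appear (in hex) in the arch= field.
AUDIT_ARCH = {0xC000003E: "x86_64", 0xC00000B7: "aarch64"}


def denied_syscall(record: str):
    """Return (arch, syscall_number) from a type=SECCOMP audit record, else None."""
    if "type=SECCOMP" not in record:
        return None
    arch = re.search(r"\barch=([0-9a-fA-F]+)", record)
    number = re.search(r"\bsyscall=(\d+)", record)
    if not (arch and number):
        return None
    arch_value = int(arch.group(1), 16)
    # Unknown architectures fall back to their hex value.
    return AUDIT_ARCH.get(arch_value, hex(arch_value)), int(number.group(1))
```

For a record containing `sig=31 arch=c00000b7 syscall=96`, this gives `("aarch64", 96)`, and `scmp_sys_resolver -a aarch64 96` turns the number back into a name.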


bpf_trace_printk("Hello from kernel")


   
(@runtime_escape_enthusiast_ben)
Eminent Member
Joined: 1 week ago
Posts: 17

Your hypothesis is dead on. The `architectures` field is basically a name validator, not a magic porting layer. You're missing the ARM-specific startup calls.

While `mmap` and `brk` are for memory, the real culprits for that immediate SIGSYS are usually for thread and signal setup. For a static musl binary on ARM64, you can almost guarantee you're missing `set_tid_address` and `rt_sigreturn`. They're not optional during early init.

But here's the new bit everyone's dancing around: the *order* matters when you're debugging. The filter itself is stateless, but the process dies at the first call that isn't allowed, so a profile that permits those two can still fail if an earlier call in the startup sequence is missing. On ARM, I often see that be an early `mmap` with a very specific flag set. Your current allow list is missing that initial pattern. Run a quick `strace -f -e raw=all /bin/true 2>&1 | head -5` on the Graviton host to see the exact opening sequence; it's enlightening.


Escape artist, security consultant.


   
(@shell_watcher_ivy)
Eminent Member
Joined: 1 week ago
Posts: 20

The order point is a good catch. I ran strace like you said and the first call was indeed mmap with MAP_STACK. My x86 profile allowed mmap, but the ARM binary used that flag immediately and the seccomp rule didn't account for it. So I guess even if you add the missing syscalls later, the first one can still kill you.

How do you handle that in a profile, though? Do you just allow mmap completely, or is there a way to specify flags in the filter? I'm worried about being too permissive.



   
(@pentest_script_guy)
Active Member
Joined: 1 week ago
Posts: 10

Yeah, you can filter on arguments. The Docker/OCI seccomp JSON lets you specify `args` with `index`, `op`, and `value` for the syscall parameters. For `mmap`, `prot` is argument 2 and the flags are argument 3 (counting from 0).

But honestly? I usually just allow `mmap` broadly for this exact startup scenario. The risk from allowing `mmap` isn't the syscall itself, it's the combination of what you do with the memory afterward. If your filter already blocks `mprotect`, `execve`, and the key file/network syscalls, a rogue `mmap` isn't getting far.

If you really want to be strict, you can write a rule that allows `mmap` only when the `prot` argument is `PROT_READ|PROT_WRITE` and the flags include `MAP_ANONYMOUS|MAP_PRIVATE`. That's the common startup pattern for stack and heap. Here's a quick example for the JSON, where 3 is `PROT_READ|PROT_WRITE` and 34 is `MAP_PRIVATE|MAP_ANONYMOUS`, used as both the mask (`value`) and the expected result (`valueTwo`):

```json
{
  "names": ["mmap"],
  "action": "SCMP_ACT_ALLOW",
  "args": [
    {"index": 2, "op": "SCMP_CMP_EQ", "value": 3},
    {"index": 3, "op": "SCMP_CMP_MASKED_EQ", "value": 34, "valueTwo": 34}
  ]
}
```

But it's fiddly, and you have to get the exact flag values for ARM. Easier to trace the exact call once and build the rule from that.
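
If it helps to sanity-check a rule before deploying it, here is a Python sketch that evaluates `args` conditions against a traced call. It models only `SCMP_CMP_EQ` and `SCMP_CMP_MASKED_EQ`, assumes all conditions in one rule must hold together, and hard-codes the flag values from the generic Linux headers (the same on x86_64 and ARM64 as far as I know). The function names are mine, not part of any seccomp tooling.

```python
# Values from the generic Linux mman headers.
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4
MAP_SHARED, MAP_PRIVATE, MAP_ANONYMOUS, MAP_STACK = 0x01, 0x02, 0x20, 0x20000


def arg_matches(cond: dict, args: tuple) -> bool:
    """Evaluate one condition from a rule's "args" array."""
    actual = args[cond["index"]]
    if cond["op"] == "SCMP_CMP_EQ":
        return actual == cond["value"]
    if cond["op"] == "SCMP_CMP_MASKED_EQ":
        # "value" is the mask, "valueTwo" the expected result after masking.
        return (actual & cond["value"]) == cond["valueTwo"]
    raise ValueError(f"operator not modelled in this sketch: {cond['op']}")


def rule_allows(conds: list[dict], args: tuple) -> bool:
    """True when every condition matches the call's arguments."""
    return all(arg_matches(cond, args) for cond in conds)


# Allow mmap only for read/write, private, anonymous mappings.
MMAP_CONDS = [
    {"index": 2, "op": "SCMP_CMP_EQ", "value": PROT_READ | PROT_WRITE},
    {"index": 3, "op": "SCMP_CMP_MASKED_EQ",
     "value": MAP_PRIVATE | MAP_ANONYMOUS,
     "valueTwo": MAP_PRIVATE | MAP_ANONYMOUS},
]
```

Feed it the six raw `mmap` arguments from your trace. Because the flags check is masked, extra flags such as `MAP_STACK` still pass, while an executable or shared mapping does not.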



   
(@shell_watcher_ivy)
Eminent Member
Joined: 1 week ago
Posts: 20

Yeah, the json thing threw me too. I added `set_tid_address` to my list, but then my ARM test still crashed. Turns out I'd only added it to a rule that was scoped to `SCMP_ARCH_X86_64`.

I had to add the same names to a rule that applies to `SCMP_ARCH_AARCH64` as well, even though the syscall name is the same. Names are resolved separately for each arch, so a rule scoped to one arch never covers the other. As far as I can tell, a rule that isn't arch-scoped covers every arch in the top-level `architectures` list with a single `names` array, so the duplication is only needed for scoped rules, which is still annoying.



   
Page 1 / 2