
Beginner mistake: I gave my agent NET_ADMIN and now it's doing weird things

18 Posts
18 Users
0 Reactions
5 Views
(@selfhost_rogue)
Eminent Member
Joined: 1 week ago
Posts: 20
Topic starter
  [#455]

So I'm finally kicking the tires on NanoClaw, trying to move my little home cluster away from the usual bloated suspects. The whole 'container-per-task' model seemed sane, so I started porting over my basic network monitor – you know, the one that pings critical stuff and logs when my off-grid node drops off the mesh.

The docs, in their infinite wisdom, suggest you might need `CAP_NET_ADMIN` for anything that sniffs traffic or messes with routes. My agent needed to tweak some `iptables` rules for a custom probe. I thought, "Sure, what's the worst that could happen?" and slapped this in my agent spec:

```yaml
securityContext:
  capabilities:
    add:
      - NET_ADMIN
```

Famous last words. Now the agent container isn't just running my simple script. I'm seeing weird ARP broadcasts on the tailscale interface, my custom routing table for the mesh network got flushed, and the logs show the agent trying – and failing – to bring up a dummy network interface. It's like it's having a nervous system meltdown.

The isolation model breaks down the second you hand out a capability like that. The container isn't just an isolated process anymore; it's got a master key to the network stack. If you've got concurrent tasks that *also* need network tweaks, or you've mounted `/proc` or `/sys` without thinking, a misbehaving script can hose the whole host's networking. Not exactly the "secure by default" selling point.
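If you want to see what the container actually holds, the kernel reports it in `/proc/<pid>/status`. Here's a minimal sketch, assuming Linux with `/proc` mounted. The function names are mine, and `12` is the bit index of `CAP_NET_ADMIN` in the kernel's capability list:

```python
from pathlib import Path

CAP_NET_ADMIN = 12  # bit index of CAP_NET_ADMIN in linux/capability.h


def has_cap(cap_hex: str, bit: int) -> bool:
    """True if `bit` is set in a hex capability mask such as a CapEff value."""
    return bool((int(cap_hex, 16) >> bit) & 1)


def effective_caps(status_text: str) -> str:
    """Pull the CapEff hex string out of the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return line.split()[1]
    raise ValueError("no CapEff line found")


def current_process_has_net_admin() -> bool:
    """Check the calling process (Linux only; reads /proc/self/status)."""
    status = Path("/proc/self/status").read_text()
    return has_cap(effective_caps(status), CAP_NET_ADMIN)
```

Run `current_process_has_net_admin()` from inside the agent container. With the spec above I'd expect `True`, and `False` again once the capability is dropped.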

Lesson learned, I guess. The model works until you punch a hole in it yourself. Now I'm rewriting the probe to use a privileged helper container, locked down to a single netns, instead of giving the main agent the keys to the castle. Works for me, but it's extra yak shaving. Anyone else run into this with their own janky setups?



   
(@skeptic0x)
Eminent Member
Joined: 1 week ago
Posts: 17
 

Exactly. NET_ADMIN isn't a capability, it's a skeleton key. You didn't isolate a process, you gave it root's network playground.

Docs suggest it for sniffing because the docs are written by people who assume you'll also use seccomp and namespaces properly. You didn't, because NanoClaw's defaults are permissive. Classic.

Your agent isn't malfunctioning. It's doing exactly what a process with that power can do: everything. The weird ARP and route flushing is probably some other library or tool inside your container waking up and thinking it's supposed to configure the network.

Next time, if you absolutely need a specific iptables rule, use a dedicated, minimal image and a tight seccomp profile that blocks everything except the exact syscalls for that rule. Or better yet, don't run it in a container.
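Rough idea of what "tight" looks like: a deny-by-default profile generated from an explicit allowlist. This is a sketch in the Docker/OCI-style JSON layout; whether NanoClaw takes that format is an assumption on my part, and the allowlist below is a placeholder, not a working set for iptables. Also worth knowing: seccomp sees syscall numbers and scalar arguments, not the rule contents passed through a pointer, so it narrows what the process can call but can't pin it to one specific rule.

```python
import json
from typing import Iterable


def build_seccomp_profile(allowed: Iterable[str]) -> dict:
    """Build a deny-by-default profile in the Docker/OCI-style JSON layout."""
    return {
        # Any syscall not listed below fails with an error instead of running.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


# Placeholder allowlist (hypothetical): derive the real one from a trace of your tool.
EXAMPLE_ALLOWED = ["socket", "setsockopt", "getsockopt", "close", "exit_group"]

profile_json = json.dumps(build_seccomp_profile(EXAMPLE_ALLOWED), indent=2)
```

Generating the file from a list keeps the allowlist reviewable in one place, which matters more than the exact names in this example.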


Skepticism is a feature.


   
(@mac_mini_lab)
Eminent Member
Joined: 1 week ago
Posts: 16
 

Oof, that's rough. I've been bitten by the same assumption - thinking a capability is a specific tool when it's really handing over the whole workshop.

One thing that saved me was using a network namespace. You can keep NET_ADMIN but drop it into a dedicated netns first, so its meltdown stays in a sandbox. My basic structure looks like this in the container entrypoint:

```bash
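set -euo pipefail
ip netns add agent-ns             # creating a netns also needs CAP_SYS_ADMIN
ip link set eth0 netns agent-ns   # eth0 disappears from the current namespace
# then exec into the namespace with NET_ADMIN ("$@" is the agent command)
exec ip netns exec agent-ns "$@"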
```

The agent can mess with iptables and interfaces all it wants, but only inside that bubble. Stops the route flushing spillover.
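To double-check the agent really ended up in a different namespace, you can compare the `/proc/<pid>/ns/net` links of two processes. A small sketch, assuming Linux and that you're allowed to read the other process's `/proc` entry; the helper names are made up:

```python
import os
import re


def netns_id(link_target: str) -> int:
    """Extract the inode number from a link target like 'net:[4026531840]'."""
    match = re.fullmatch(r"net:\[(\d+)\]", link_target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {link_target!r}")
    return int(match.group(1))


def same_netns(pid_a: int, pid_b: int) -> bool:
    """True if both processes sit in the same network namespace (Linux only)."""
    a = os.readlink(f"/proc/{pid_a}/ns/net")
    b = os.readlink(f"/proc/{pid_b}/ns/net")
    return netns_id(a) == netns_id(b)
```

If `same_netns(entrypoint_pid, agent_pid)` comes back `True`, the agent never left the original namespace and the bubble isn't doing anything.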

Still, skeptic0x's right about seccomp being the real fix. The netns is just damage control.


~Fiona


   
(@shed_sysadmin)
Eminent Member
Joined: 1 week ago
Posts: 19
 

Yep. The default runtime profile is a warm blanket, not a cage.

Network namespace is a good step, but if you're already on NanoClaw, you should be using its isolation groups. Create a dedicated group for network-messy agents, set `netns: isolated` on the group spec. Does the same thing without the manual ip netns juggling in your entrypoint.

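Also, check your base image. Alpine with `iptables` installed? That package ships `iptables-save` and `iptables-restore`, and often a service script that tries to "restore" rules on startup. That's my first suspect for the flush, though keep in mind `iptables-restore` rewrites netfilter rules, not the routing table, so also look at whatever else that script runs. Strip the package, just copy the static binary you need.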


--Chris


   
(@log_searcher_nl)
Active Member
Joined: 1 week ago
Posts: 13
 

Network namespace isolation groups are the correct approach. But you're missing the audit trail requirement.

If you set `netns: isolated`, you also need to enable the network policy log hook. Otherwise you get containment but zero visibility into what the agent tried to do inside its bubble. The route flush still happens, you just don't see it in the default host logs.

Add this to your group spec:
```yaml
observability:
  hooks:
    - type: net_policy
      capture: denied,allowed
```

Now you'll get syscall-level traces for every iptables and route modification attempt. Lets you confirm it's the iptables-save script and not something else.



   
(@ci_pipeline_guru)
Active Member
Joined: 1 week ago
Posts: 15
 

You've perfectly described the core breakdown: granting a capability transforms the container's security boundary from a process jail into a privileged subset of the host. NET_ADMIN isn't just a key to the network stack, it's root-equivalent for that entire subsystem.

The subsequent posts on network namespaces are correct for runtime isolation, but they skip the prerequisite supply chain step. Your base image is critical here. You added a capability, but you also inherited a full package manager's network tooling. Even within an isolated netns, the `iptables` package you likely installed brings its own default behaviors and service hooks that will execute with those privileges. You need a reproducible build that strips out everything but the static binary for your specific `iptables` command.

Consider using a multi-stage Dockerfile where the final image contains only your agent script and a copied `iptables` binary from a builder stage. Then, even with NET_ADMIN, the attack surface and unintended behaviors are minimized. Without that, namespace isolation just contains the chaos, it doesn't prevent it from happening inside the bubble.
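One catch with copying a binary between stages: it only runs on its own if it doesn't need a dynamic loader and shared libraries from the builder image. A rough way to check is to look for a `PT_INTERP` program header in the ELF file. This is a sketch with a made-up function name, and it only answers "does this file ask for a loader", nothing about what the binary does:

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def is_dynamically_linked(elf: bytes) -> bool:
    """Return True if the ELF image has a PT_INTERP segment (needs a loader)."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = elf[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if elf[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big endian
    if is_64:
        (phoff,) = struct.unpack_from(endian + "Q", elf, 0x20)
        phentsize, phnum = struct.unpack_from(endian + "HH", elf, 0x36)
    else:
        (phoff,) = struct.unpack_from(endian + "I", elf, 0x1C)
        phentsize, phnum = struct.unpack_from(endian + "HH", elf, 0x2A)
    # Walk the program header table; p_type is the first field of each entry.
    for i in range(phnum):
        (p_type,) = struct.unpack_from(endian + "I", elf, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

Feed it the bytes of the copied binary, e.g. `is_dynamically_linked(open(path, "rb").read())`. If it returns `True`, the final image needs the matching libraries too, which widens what you have to audit.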


Signed from commit to container.


   
(@vendor_skeptic_omar)
Active Member
Joined: 1 week ago
Posts: 18
 

Finally someone who gets it. The binary copy in a multi-stage build is the only way to be sure, but you're still trusting the binary itself not to have any... ambitious defaults compiled in.

My caveat: even a static binary pulled from a known-good distro can still have surprises if it was built with certain config flags. The iptables binary often includes the 'restore' logic as a built-in mode, which can still try to source something from /etc if it finds it. Truly sealing this requires a source audit, not just a supply-chain fix.

So the flow is: multi-stage minimal binary *then* the isolated netns. Order matters. Otherwise you're just building a clean bomb before you contain the blast.


If you can't model it, you can't protect it.


   
(@sec_eng_build)
Active Member
Joined: 1 week ago
Posts: 13
 

You're right about the binary's own logic being a risk, but source auditing iptables is a rabbit hole. For most, the win is killing the package manager's auto-start cruft.

If you're paranoid, skip the binary copy and use the kernel's netfilter API directly with a tiny, auditable Rust/Go tool. I've done this for exactly this scenario - no iptables binary at all, just raw syscalls to insert the single rule you need. That's the real "clean bomb".

The order point is critical though. Build minimal, then apply runtime isolation. Never the reverse.



   
(@red_team_ray)
Active Member
Joined: 1 week ago
Posts: 15
 

That's the precise moment the boundary dissolves. You're not just giving your script a tool; you're handing a loaded API to every other process and library in the container. The weird ARP and routing flush is a textbook symptom of something else in the image waking up.

The post about the iptables package including a service script is likely correct. Check for `/etc/init.d/iptables` or a systemd unit. That script often runs `iptables-restore` on start, which flushes existing rules to load a saved state. With NET_ADMIN, it succeeds against the host's network stack if the container shares the host's network namespace, so isolate it first.
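If you'd rather script that check, here's a sketch that scans an unpacked image filesystem for those hooks. The candidate paths are my assumptions (distros differ) and `find_restore_hooks` is a made-up name:

```python
from pathlib import Path

# Candidate locations are assumptions; adjust for your base image.
INIT_SCRIPTS = ["etc/init.d/iptables", "etc/init.d/ip6tables"]
UNIT_DIRS = ["etc/systemd/system", "lib/systemd/system", "usr/lib/systemd/system"]


def find_restore_hooks(rootfs: Path) -> list[Path]:
    """List init scripts and systemd units under `rootfs` that look iptables-related."""
    hits = [rootfs / rel for rel in INIT_SCRIPTS if (rootfs / rel).is_file()]
    for unit_dir in UNIT_DIRS:
        directory = rootfs / unit_dir
        if directory.is_dir():
            # Matches iptables.service, ip6tables.service and similar unit names.
            hits.extend(sorted(directory.glob("ip*tables*.service")))
    return hits
```

Point it at wherever you extracted the image, e.g. `find_restore_hooks(Path("./rootfs"))`. An empty list only means these particular paths are clean, not that nothing else touches the network.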

The cleanest immediate fix is to run your agent in a network namespace *before* it executes your script and any init processes. A simple entrypoint wrapper:

```bash
# Create the namespace and move your primary interface in
ip netns add agentns
ip link set eth0 netns agentns
# Execute your main process inside the namespace
ip netns exec agentns /usr/local/bin/your_agent_script
```

This contains the blast radius immediately, even if you don't rebuild the image. Then rebuild without the package manager's network tooling as others have noted.


POC or it didn't happen


   
(@rustacean_secure_oli)
Eminent Member
Joined: 1 week ago
Posts: 19
 

>The weird ARP and routing flush is a textbook symptom of something else in the image waking up.

Exactly. You've diagnosed it yourself. The capability didn't break the model, it just exposed that your container is a crowded room, not a solitary cell. You put a loaded gun in there and are surprised someone else pulled the trigger.

Your own mesh routing table getting flushed is the giveaway. Your agent's script didn't do that. Something in the container's init system or a bundled network tool did, because it now has the power to and probably assumes it should.

Network namespaces or isolation groups are just locking that crowded room. You still have a gun in there. Strip the image first. A multi-stage build copying only your static binary is the bare minimum. Then, maybe, you can talk about containment.


Don't trust the borrow checker blindly.


   
(@oscp_student)
Eminent Member
Joined: 1 week ago
Posts: 17
 

Yeah, that skeleton key analogy hits hard. I was definitely thinking "give it this one tool" not "give it root's entire network workshop."

Your point about the docs assuming proper seccomp/namespaces is spot on. I followed a tutorial for an agent that needed to sniff, and it just said "add NET_ADMIN." No mention that it was handing over the keys to the kingdom. The defaults *are* way too permissive for that.

So, follow-up question: for someone just learning this (and trying to avoid a total rebuild), is there a quick way to *audit* what in the container is actually using NET_ADMIN? Like, could you run something before your agent starts to see what's trying to make network config calls?



   
(@compliance_raja)
Active Member
Joined: 1 week ago
Posts: 10
 

You've stumbled on the key realization: granting a capability replaces a security boundary with a skeleton key. The docs gave you a functional requirement, not a security one.

For your audit question, you can run a trace. In your container entrypoint, before your main script starts, run `strace -e trace=network -f -p 1` (you'll need `strace` in the image, and attaching to another process usually means granting `SYS_PTRACE` too). It'll show you every network-related syscall from PID 1 (usually your init) and any children it forks after you attach. You'll see the socket and netlink calls show up when the iptables-restore or ip route command runs.

But that's a diagnostic, not a fix. It just confirms which process in the crowded room grabbed the gun. The permanent fix is still stripping the image and using an isolation group. The trace just proves to you why you have to.
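Raw strace output gets long fast, so here's a sketch for boiling it down to "which PID made which calls". It assumes the usual `strace -f` text format, with either a `[pid N]` prefix or the bare `N` prefix you get with `-o file`; the function names are mine:

```python
import re
from collections import Counter

# Optional "[pid N] " or "N " prefix, then the syscall name and its opening parenthesis.
# Lines without a PID prefix are attributed to PID 0 here.
LINE = re.compile(r"^(?:\[pid\s+(\d+)\]\s+|(\d+)\s+)?(\w+)\(")


def summarize(trace: str) -> Counter:
    """Count (pid, syscall) pairs in strace text output."""
    counts: Counter = Counter()
    for line in trace.splitlines():
        m = LINE.match(line)
        if m:
            pid = int(m.group(1) or m.group(2) or 0)
            counts[(pid, m.group(3))] += 1
    return counts


def netlink_openers(trace: str) -> set[int]:
    """PIDs that opened an AF_NETLINK socket, the usual channel for route and link changes."""
    pids = set()
    for line in trace.splitlines():
        m = LINE.match(line)
        if m and m.group(3) == "socket" and "AF_NETLINK" in line:
            pids.add(int(m.group(1) or m.group(2) or 0))
    return pids
```

Resumed-call lines and exit markers don't match the pattern and are skipped, so the counts are a floor rather than an exact tally. The PIDs from `netlink_openers` are the ones to look up first.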


Audit or it didn't happen.


   
(@oss_evangelist)
Eminent Member
Joined: 1 week ago
Posts: 17
 

>a diagnostic, not a fix

Exactly. Tracing tells you *what* grabbed the gun, but the pathological assumption is that you can just take the gun back from that one process. You can't.

Everything inside that container is part of your trusted computing base the moment you add NET_ADMIN. That strace output is a list of your new threat actors. Some will be obvious, like init. Others will be a bundled helper or library opening a netlink socket you've never heard of.

So yeah, run the trace. Then stare at the output and ask yourself if you're really willing to audit and pin every single package version in that list, forever, to keep this setup. That's the real cost of the skeleton key.


open source, open scar


   
(@junior_harden_jay)
Active Member
Joined: 1 week ago
Posts: 11
 

Yeah, that's a rough truth. The trace output isn't a to-do list, it's a liability manifest.

>if you're really willing to audit and pin every single package version in that list, forever

This is the part that really got me thinking. It's not just pinning versions, it's about trust decaying over time. Even if I lock everything down today, next year there's a CVE in one of those pinned packages. Do I now have to rebuild and retrace the whole thing? That feels like a forever-project.

So the audit's value is it forces you to see the scale of the new trust boundary. It's not one binary with NET_ADMIN, it's the entire software bill of materials that now has that power.



   
(@policy_scanner_ivy)
Active Member
Joined: 1 week ago
Posts: 13
 

Oh wow, okay, so the isolation model just... vanishes? That's a scary thought. I'm just starting to look at policy files, and now I'm second-guessing every capability block I've written.

You saying "it's got a master key to the network stack" makes it really click for me. I always thought a capability was a specific permission, like a single tool. But from what you're describing, it sounds more like giving someone admin access to the entire network utility closet. Is that why all the other stuff in the image, like init scripts, suddenly start acting up? Because they *can*?

This makes me super nervous about my own little agent that just needs to bind to a low port.



   