Managing egress for a heterogeneous fleet of developer-configured agents presents a distinct challenge compared to a monolithic deployment. The core tension is between security policy enforcement and developer autonomy, particularly when each developer is experimenting with different tools, APIs, and external services. A flat deny-all-except approach stifles innovation, while a permissive policy introduces unacceptable risk from both accidental and adversarial prompt injection leading to unauthorized external calls.
The solution I've implemented for our internal Nano-Claw testbed involves a layered firewall strategy combined with a metadata tagging system. The principle is to categorize egress traffic not just by destination port and IP, but by the *intent of the agent* and the *trust level of the requested domain*. We achieve this through a combination of network-level rules and agent-side configuration that passes a contextual header.
First, we establish a baseline using `nftables` (the `iptables` logic is similar) with named sets for dynamic classification. The ruleset below defines sets for known-good and developer-sandbox destinations; known-bad destinations are handled by the proxy's deny list, described after the ruleset.
```nft
table inet filter {
    # Pre-approved API endpoints and ranges.
    set known-good-ipv4 {
        type ipv4_addr
        flags interval
        elements = { 192.0.2.0/24, 203.0.113.10, 198.51.100.5 }
    }
    # nftables matches addresses, not hostnames, so this set holds the
    # resolved IPs of approved FQDNs.
    set known-good-domains {
        type ipv4_addr
        elements = { 93.184.216.34 } # example.org
    }
    # Temporary grants; entries expire after 24h by default.
    set developer-sandbox {
        type ipv4_addr
        flags interval,timeout
        timeout 24h
    }
    chain agent-egress {
        type filter hook output priority filter; policy drop;
        ip daddr @known-good-ipv4 accept comment "Pre-approved APIs"
        ip daddr @known-good-domains accept comment "DNS-based allow for specific FQDNs"
        ip daddr @developer-sandbox accept comment "Time-bound developer sandbox IPs"
        # Log and drop all other egress
        log prefix "Agent Egress Blocked: " group 1
        drop
    }
}
```
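To sanity-check a change to these sets before loading it, the chain's first-match order can be mirrored offline. The following is a minimal Python sketch, not part of the firewall itself: `classify` is a hypothetical helper, and the sandbox entry is a made-up example, since that set is normally filled at runtime.

```python
import ipaddress

# Mirrors of the nftables sets above. The sandbox entry is invented for
# illustration; in the real ruleset that set starts empty.
KNOWN_GOOD_IPV4 = ["192.0.2.0/24", "203.0.113.10", "198.51.100.5"]
KNOWN_GOOD_DOMAINS = ["93.184.216.34"]  # address listed for example.org
DEVELOPER_SANDBOX = ["198.51.100.77"]   # hypothetical temporary grant

# Same order as the rules in the agent-egress chain: first match wins.
RULES = [
    ("known-good-ipv4", KNOWN_GOOD_IPV4),
    ("known-good-domains", KNOWN_GOOD_DOMAINS),
    ("developer-sandbox", DEVELOPER_SANDBOX),
]


def classify(dest: str) -> str:
    """Return the name of the set that would accept dest, or 'drop'."""
    addr = ipaddress.ip_address(dest)
    for name, entries in RULES:
        # A bare address parses as a /32 network, so one check covers
        # both single hosts and CIDR ranges.
        if any(addr in ipaddress.ip_network(entry) for entry in entries):
            return name
    return "drop"
```

For example, `classify("192.0.2.44")` falls in the first set, while an address in none of the sets reaches the final log-and-drop.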
The critical component is populating the `developer-sandbox` set. Each developer's agent configuration adds a mandatory header (e.g., `X-Agent-Scope: sandbox-`) to its HTTP/HTTPS requests. A transparent proxy (a sidecar or host-level Envoy, for example) inspects this header; for HTTPS, that means the proxy has to terminate TLS, because the header is otherwise encrypted in transit. If the header is present and valid, and the destination is not in a deny list, the proxy adds the destination IP to the `developer-sandbox` nftables set via a control plane API. This grants temporary, scoped access.
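The proxy-side decision can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual control plane: the function name `sandbox_grant`, the header value format (`sandbox-` followed by an identifier), and the deny list contents are all hypothetical. It only builds the `nft add element` command and leaves execution to the caller.

```python
import ipaddress
import re

TABLE = ("inet", "filter")        # family and table name from the ruleset
SANDBOX_SET = "developer-sandbox"
GRANT_TIMEOUT = "24h"             # same as the set's default timeout

# Assumption: a valid header value is "sandbox-" plus some identifier.
SCOPE_RE = re.compile(r"^sandbox-[A-Za-z0-9_.-]+$")

# Hypothetical deny list (link-local range, which includes common cloud
# metadata endpoints). The real list is not given in the post.
DENY_NETS = [ipaddress.ip_network("169.254.0.0/16")]


def sandbox_grant(headers: dict, dest_ip: str):
    """Return the nft argv that grants access, or None if refused."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    scope = lowered.get("x-agent-scope", "")
    if not SCOPE_RE.match(scope):
        return None
    try:
        addr = ipaddress.ip_address(dest_ip)
    except ValueError:
        return None
    # The set is typed ipv4_addr, so only IPv4 destinations can be added.
    if addr.version != 4 or any(addr in net for net in DENY_NETS):
        return None
    element = f"{{ {addr} timeout {GRANT_TIMEOUT} }}"
    return ["nft", "add", "element", *TABLE, SANDBOX_SET, element]
```

A caller with sufficient privileges (root or `CAP_NET_ADMIN`) could pass the returned list to `subprocess.run(cmd, check=True)`. Returning the argv rather than running it keeps the decision logic testable without touching the live ruleset.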
Key considerations and reasoning for this approach:
* **Principle of Least Privilege by Default:** The base chain drops all traffic. Access is explicitly granted.
* **Separation of Concerns:** Core, production-grade tool calls (e.g., internal ticketing, approved LLM APIs) reside in `known-good` sets. Experimental, personal, or third-party tool usage is forced through the sandbox path.
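* **Auditability:** Every sandbox grant passes through the proxy, which records which developer's agent requested which external resource and when, at the moment it adds the set entry. The timeout on the set entry does not log anything by itself; it bounds how long each recorded grant stays live.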
* **Containment of Prompt Injection:** If an agent is compromised via injection and attempts to call a non-allowlisted external service, the call is blocked at the network layer. The remaining breach surface is the pre-approved `known-good` domains (plus whatever sandbox grants are live at the time), which should themselves be treated as untrusted input channels, with their output sanitized before it reaches the agent.
* **Scalability:** Developers can request permanent additions to the `known-good` sets via a pull request to the firewall policy repo, following a security review. Temporary access is self-service via the proxy's metadata header mechanism.
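The auditability point depends on something writing a record for each grant. One way this could look is sketched below in Python, assuming the proxy emits one JSON line per grant; the function name `audit_record` and the field names are illustrative, not an established schema.

```python
import json
from datetime import datetime, timedelta, timezone


def audit_record(scope: str, dest_ip: str, ttl_hours: int = 24, now=None) -> str:
    """Build one JSON line describing a sandbox grant.

    ttl_hours defaults to the 24h timeout on the developer-sandbox set;
    now can be injected so the output is reproducible in tests.
    """
    granted = now or datetime.now(timezone.utc)
    expires = granted + timedelta(hours=ttl_hours)
    return json.dumps(
        {
            "event": "sandbox_grant",
            "scope": scope,        # value of the X-Agent-Scope header
            "dest_ip": dest_ip,    # address added to the nftables set
            "granted_at": granted.isoformat(),
            "expires_at": expires.isoformat(),
        },
        sort_keys=True,
    )
```

Storing the expiry alongside the grant time means the log can answer "what could this agent reach at time T" without querying the live set.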
The major operational cost is maintaining the proxy and its integration with the firewall's control plane. However, this model effectively balances the need for rigorous egress control with the dynamic, exploratory nature of a developer-centric agent environment. I'm interested in how others are solving the same problem—particularly if you've found ways to integrate this with Kubernetes NetworkPolicies or service mesh egress rules.
Your agent is only as safe as its last prompt.
Oh wow, a real `nftables` example! That's super helpful. I've only seen this talked about in theory.
Quick question on the tagging system: how do you actually pass the *intent of the agent* from the agent config to the firewall? Is it like a custom HTTP header the dev adds in their agent's config? I'm trying to picture how that links up with the sets in your nft rules.
We're just using a simple allowlist in Docker Compose, so this is way more granular. Kinda want to try it.