Hit a weird one testing our OpenClaw agent. We've got super strict egress rules on the runtime host—only outbound to a single allowed API endpoint. But the agent is still making calls to random IPs. Wireshark doesn't lie.
* Egress is locked down via `iptables` and a network policy (k8s).
* Agent config has the correct, single `base_url`.
* Yet, we're seeing TCP SYN packets to AWS IP ranges we've never heard of.
Suspected an SSRF-like scenario where the agent itself is being tricked into making the call. Did some basic fuzzing on the task input:
```json
{
  "task": "Fetch the data from https://internal.corp.local/admin",
  "context": "{{ some user input }}"
}
```
Auditors are gonna have a field day with this. "CC-6.1? More like CC-6.1-oh-no."
Anyone else scoping agents into their compliance frameworks and seeing this? What controls are you mapping for "agent decides to go rogue and phone home"? Is this just a "deny all, allow explicit" network control failure, or something new?
if it moves, fuzz it
You've almost certainly found a prompt injection leading to SSRF within the agent's reasoning loop. The network controls are working - they're preventing a direct call from the runtime. But the agent, when interpreting a maliciously crafted `task`, is likely instructing an allowed tool, like a `requests` call, to use a user-provided URL instead of the intended `base_url`.
This isn't a network control failure; it's an adversarial robustness failure in the agent's instruction parsing. The agent's "intent" is being hijacked. Your fuzzing approach is correct. I've seen similar cases where the agent concatenates user input into a curl command string that gets executed.
The compliance mapping is tricky. You'd need to show that the agent's decision process is part of your trusted computing base, which current frameworks aren't built for. It falls between "data input validation" and "process integrity." What does your agent's tool-calling validation look like? Are you constraining the `parameters` for the fetch tool to a strict allowlist, or is it passing a raw, interpreted string?
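To make "strict allowlist" concrete, here's a minimal sketch of the kind of check I mean. The host `api.example.com` and the function name `is_allowed_url` are placeholders I made up, not anything from OpenClaw; swap in whatever your real `base_url` host is.

```python
from urllib.parse import urlsplit

# Hypothetical policy: the single endpoint the agent is supposed to reach.
ALLOWED_SCHEMES = {"https"}
ALLOWED_HOSTS = {"api.example.com"}


def is_allowed_url(url: str) -> bool:
    """Return True only if the scheme and the exact hostname are allow-listed."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (e.g. broken IPv6 literals) are rejected outright.
        return False
    # Reject embedded credentials such as https://api.example.com@evil.test/
    if parts.username or parts.password:
        return False
    host = (parts.hostname or "").lower()
    # Exact match on the parsed host, so look-alike suffixes and prefixes fail.
    return parts.scheme in ALLOWED_SCHEMES and host in ALLOWED_HOSTS
```

The point is to compare the parsed hostname exactly rather than doing a substring or prefix match on the raw string, since `https://api.example.com.evil.test/` would pass a naive `startswith` check.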
theory meets practice
Yep, classic prompt injection leading to tool parameter substitution. Your network controls are fine; the agent's instruction parsing is the weak link. It's not "rogue," it's just doing exactly what a hijacked plan tells it to do.
> What controls are you mapping for "agent decides to go rogue"
You have to treat the agent's reasoning as part of your TCB. That means:
* Input sandboxing for tool parameters (allow-lists for URL schemes/hosts, not just base_url).
* Validation *after* the agent's planning step but *before* tool execution.
* Mapping to compliance controls? Look at the logic that validates tool calls. That's your new CC-6.1 boundary.
We built a pre-execution hook that regex-scrapes the tool call JSON for URLs and validates against a policy. It's ugly but it works.
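A minimal sketch of that kind of hook, for illustration only (this is not our production code). It assumes the tool call arrives as a JSON string; `pre_execution_hook`, `ToolCallBlocked`, and the `api.example.com` policy are all hypothetical names.

```python
import json
import re
from urllib.parse import urlsplit

# Hypothetical policy: only this host, only over HTTPS.
ALLOWED_HOSTS = {"api.example.com"}

# Matches anything shaped like scheme://..., so file:// and ftp:// are caught too.
URL_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.\-]*://[^\s\"'<>]+")


class ToolCallBlocked(Exception):
    """Raised when a planned tool call references a URL outside the policy."""


def iter_strings(value):
    """Yield every string (keys and values) nested inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield key
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def pre_execution_hook(tool_call_json: str) -> dict:
    """Validate a planned tool call; return the decoded call or raise."""
    call = json.loads(tool_call_json)
    for text in iter_strings(call):
        for url in URL_RE.findall(text):
            try:
                parts = urlsplit(url)
            except ValueError:
                raise ToolCallBlocked(f"Unparseable URL: {url}")
            if parts.scheme != "https" or parts.hostname not in ALLOWED_HOSTS:
                raise ToolCallBlocked(f"URL not permitted by policy: {url}")
    return call
```

Known gaps with regex scraping: it won't see bare hostnames or IPs without a scheme, or URLs that are encoded or assembled later inside the tool. Treat it as a tripwire on top of the network controls, not a replacement for them.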
break things, fix them
That pre-execution hook idea is really smart. It makes sense to catch the bad call *after* the agent thinks of it but *before* it runs.
> validation *after* the agent's planning step but *before* tool execution
For someone just starting with agent security, where's the best place to plug that in? Is it in the agent framework itself, or do you intercept at the tool calling layer? Trying to picture the plumbing.