Everyone's overcomplicating this. The core problem is trusting parsed data from tools you didn't write. You can't sanitize a PDF or a random JSON blob from a web API to a safe state. The attempt itself adds more attack surface.
Three camps:
1. **Data Sanitization**: Hopeless. You're now running a parser and a sanitizer on untrusted data. That's one more tool that can be exploited.
2. **Agent Instruction Hardening**: Vague prompts telling the agent "be careful" are noise. You need enforceable rules.
3. **Better Monitoring**: After-the-fact. Useful, but not a defense.
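The only viable architecture is to treat the agent's environment as hostile from the start. Run it under a strict, minimal SELinux or AppArmor policy that denies write and execute in most places, and filter syscalls with seccomp, since the MAC policy alone doesn't do that. Use cgroups to limit resources. Give the agent its own namespaces; a bare chroot is not a security boundary. If the parsed data exploits the parser or the agent, the damage is contained to what the policy allows. A kernel exploit is the exception, because everything here shares the host kernel, which is exactly why the syscall filter matters.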
Example AppArmor snippet for a tool-calling agent:
```
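#include <tunables/global>

profile claw-agent /usr/local/bin/agent {
  #include <abstractions/base>

  # A confined profile is default-deny. Explicit denies only override
  # allow rules and silence audit noise.
  deny /etc/passwd rwx,
  deny /dev/sd* rwx,

  # No blanket "deny /tmp/** w": deny beats allow, so it would also
  # block the scratch directory below.
  /usr/local/bin/agent mr,
  /usr/bin/tool ix,
  /tmp/scratch/ rw,
  /tmp/scratch/** rw,
}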
```
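The cgroup and namespace side can be sketched the same way. This is a minimal sketch, assuming a systemd host with cgroup v2: it only builds a `systemd-run` command line that puts the agent in a transient unit with resource caps and a read-only filesystem view. The helper name `build_sandbox_command`, the limit values, and the `--task summarize` arguments are made up for illustration; the agent path and scratch directory are the ones from the profile above.

```python
import shlex


def build_sandbox_command(agent_argv, memory_max="256M", cpu_quota="50%", tasks_max=64):
    """Return a systemd-run argv that launches agent_argv in a limited transient unit.

    Hypothetical helper: the default limits are placeholders, not recommendations.
    """
    properties = [
        f"MemoryMax={memory_max}",        # cgroup v2 memory ceiling
        f"CPUQuota={cpu_quota}",          # CPU time cap, as a share of one CPU
        f"TasksMax={tasks_max}",          # cap on processes and threads
        "NoNewPrivileges=yes",            # setuid binaries cannot raise privileges
        "ProtectSystem=strict",           # filesystem is read-only for the unit
        "ReadWritePaths=/tmp/scratch",    # the one writable path (must already exist)
    ]
    command = ["systemd-run", "--wait", "--pipe", "--collect"]
    for prop in properties:
        command += ["-p", prop]
    return command + ["--"] + list(agent_argv)


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    argv = build_sandbox_command(["/usr/local/bin/agent", "--task", "summarize"])
    print(shlex.join(argv))
```

Because the AppArmor profile is attached by path, the agent should pick it up automatically once the unit executes `/usr/local/bin/agent`; the two layers are independent, so a gap in one doesn't open the other.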
The retrieved data is just another file descriptor. Harden the box it runs in. Stop adding abstraction layers that hide the real attack vectors.
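To make the "just another file descriptor" point concrete, here is a rough sketch, assuming a POSIX host: the parent opens the untrusted file read-only and hands the tool only that descriptor on stdin, with rlimits applied in the child. `run_tool_on_untrusted` and the limit values are hypothetical, and rlimits are a weaker stand-in for the cgroup and MAC confinement described above, not a replacement for it.

```python
import os
import resource
import subprocess


def _cap(res, limit):
    # Lower the soft limit only, and never ask for more than the hard limit allows.
    soft, hard = resource.getrlimit(res)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)
    resource.setrlimit(res, (limit, hard))


def run_tool_on_untrusted(path, tool_argv, cpu_seconds=5, max_bytes=1 << 30):
    """Feed the file at `path` to a tool on stdin; the tool never receives the path."""

    def limit_resources():
        # Runs in the child between fork and exec.
        _cap(resource.RLIMIT_CPU, cpu_seconds)   # CPU seconds
        _cap(resource.RLIMIT_AS, max_bytes)      # address space in bytes

    fd = os.open(path, os.O_RDONLY)
    try:
        return subprocess.run(
            tool_argv,
            stdin=fd,                     # the untrusted data, as a descriptor
            stdout=subprocess.PIPE,
            preexec_fn=limit_resources,   # not thread-safe; fine for a sketch
            close_fds=True,               # no other parent descriptors leak through
            timeout=30,
            check=False,
        )
    finally:
        os.close(fd)
```

A call would look like `run_tool_on_untrusted("/tmp/scratch/input.pdf", ["/usr/bin/tool"])`, where the input filename is made up. The tool gets bytes on stdin and nothing else, so what it can reach is decided by the box, not by the data.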