OWASP just dropped v1.0.1 of their AI Security and Privacy Guide. The new "AI Agents and Assistants" section is the part worth your time. It finally starts to map the threat surface of these orchestration frameworks everyone's bolting into their pipelines.
The guide correctly identifies that the main risk isn't the LLM hallucinating—it's the *execution environment* you give it. Most teams are so focused on prompt injection they miss the systemic issues. The guide calls out:
* **Excessive permissions:** An agent whose service account can `exec` into pods (a Kubernetes RBAC binding on `pods/exec`) can do more damage in 10 seconds than any prompt hack.
* **Tool call sandboxing:** Spoiler: most "agent frameworks" have none. It's just `subprocess.run()` with your credentials.
* **Secret sprawl:** Passing API keys and database URLs through the chat context? It's `ENV` variables all over again, but now with a language model that might log them.
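To make the secret-sprawl point concrete, here's a minimal sketch of a redaction pass you could run over anything headed into the chat context or the logs. The patterns and the `redact` name are my own illustration, not something from the guide, and regexes like these will miss things.

```python
import re

# Hypothetical patterns; tune them to the credential formats you actually use.
SECRET_PATTERNS = [
    # URLs with embedded user:password, e.g. postgres://user:pass@host/db
    re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@]+@\S+"),
    # Tokens shaped like "sk-..." / "pk-..." API keys
    re.compile(r"\b(?:sk|pk)-[A-Za-z0-9]{16,}\b"),
    # Strings shaped like an AWS access key ID
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]


def redact(text: str) -> str:
    """Replace anything that looks like a credential with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


if __name__ == "__main__":
    print(redact("connect to postgres://app:hunter2@db.internal:5432/prod"))
```

Treat this as a backstop. The real fix is to never put the secret in the context at all: let the tool resolve credentials on its own side at call time and hand the model an opaque reference.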
Here's the blunt take: if you're evaluating LangChain, AutoGen, or the new crop of "AI DevOps" tools, you need a threat model that assumes the agent's instructions *will* be subverted. The OWASP guide gives you a checklist. Your job is to enforce it.
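What "assume the instructions will be subverted" can look like in code: a deny-by-default gate that sits between the model's output and anything that executes. This is a rough sketch under my own assumptions. `ToolCall`, `ALLOWED_TOOLS`, and `authorize` are hypothetical names, not part of LangChain, AutoGen, or the OWASP guide.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict


class PolicyViolation(Exception):
    """Raised when the agent requests something the policy does not allow."""


# Hypothetical policy: tool name -> argument names the agent may supply.
# Anything not listed here is denied.
ALLOWED_TOOLS = {
    "read_file": {"path"},
    "list_pods": {"namespace"},
}


def authorize(call: ToolCall) -> None:
    """Raise PolicyViolation unless the call matches the allowlist."""
    allowed_args = ALLOWED_TOOLS.get(call.name)
    if allowed_args is None:
        raise PolicyViolation(f"tool not allowlisted: {call.name}")
    unexpected = set(call.args) - allowed_args
    if unexpected:
        raise PolicyViolation(
            f"unexpected arguments for {call.name}: {sorted(unexpected)}"
        )


if __name__ == "__main__":
    authorize(ToolCall("read_file", {"path": "/etc/hosts"}))  # passes silently
    try:
        authorize(ToolCall("kubectl_exec", {"cmd": "sh"}))
    except PolicyViolation as err:
        print(f"blocked: {err}")
```

The point of the design is that the check lives outside the prompt. A subverted agent can ask for whatever it wants, but it only gets the tools the policy names. Validating argument *values* (paths, namespaces) is the obvious next step and isn't shown here.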
For example, the guide mentions sandboxing. Let's be concrete. If your framework just executes Python code, you need something like this at a minimum—and even this is fragile:
```python
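import docker

client = docker.from_env()

# user_code: the untrusted, agent-generated Python source (a str).
# This is a *starting point*, not a solution.
# Note: with detach=False (the default), run() blocks until the code exits
# and returns the container's output as bytes. There is no timeout here.
output = client.containers.run(
    'python:3.11-slim',
    command=['python', '-c', user_code],
    mem_limit='100m',
    cpu_period=100000,       # with cpu_quota below: at most 50% of one CPU
    cpu_quota=50000,
    pids_limit=64,           # cap the number of processes (fork bombs)
    network_disabled=True,
    read_only=True,          # root filesystem is read-only
    cap_drop=['ALL'],        # drop all Linux capabilities
    security_opt=['no-new-privileges'],
    volumes={'/tmp/agent': {'bind': '/tmp', 'mode': 'rw'}},  # only writable path
    remove=True,             # delete the container once it exits
)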
```
But who's actually doing this? Most PoCs I see run with the developer's own `kubectl` credentials and a prayer.
The other big point is supply chain. An agent that can `pip install` from the internet is a one-click backdoor. The guide pushes for policy-as-code and attestation. Good luck getting that past a "move fast" team.
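A tiny taste of what policy-as-code for installs might look like: before the agent gets to install anything, check that every requirement is allowlisted, pinned to an exact version, and carries a hash. This is my own sketch, not the guide's. `ALLOWED_PACKAGES` and `check_requirement` are made-up names, and the name matching is deliberately simplistic (no extras, no name normalization).

```python
import re

# Hypothetical allowlist of packages the agent may install.
ALLOWED_PACKAGES = {"requests", "pyyaml"}

# Matches "name==version" and nothing looser (no >=, no bare names).
PINNED = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([A-Za-z0-9.!+_-]+)$")


def check_requirement(line: str) -> list[str]:
    """Return the policy problems found in one requirements.txt-style line."""
    parts = line.split()
    if not parts:
        return ["empty requirement"]
    problems = []
    match = PINNED.match(parts[0])
    if match is None:
        problems.append("not pinned to an exact version with ==")
    elif match.group(1).lower() not in ALLOWED_PACKAGES:
        problems.append(f"package not allowlisted: {match.group(1)}")
    if not any(p.startswith("--hash=sha256:") for p in parts[1:]):
        problems.append("missing --hash=sha256: digest")
    return problems


if __name__ == "__main__":
    for req in ["requests==2.31.0 --hash=sha256:" + "0" * 64, "leftpad"]:
        print(req.split()[0], "->", check_requirement(req) or "ok")
```

This only checks the shape of the line; it doesn't verify the digest. For that, pip's own hash-checking mode (`pip install --require-hashes -r requirements.txt`) does the actual enforcement at install time, ideally against an internal mirror rather than the open internet.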
So, read the update. Then ask yourself: does your "AI-enabled" deployment tool pass the tests in section 4.2? Or are you just wiring a potential RCE machine into your production namespace because it's cool?
ship it or break it.
Exactly. Everyone's chasing the shiny agent runtime, but nobody's auditing the permission model. I've seen three projects this month where the "sandbox" was just a Docker container with `--privileged` and a root shell.
The guide's checklist is a start, but it's toothless without enforcement. Who's actually running those tests in CI? The frameworks sure aren't. It's all "community best practices" while their demo code runs `sudo rm -rf`.
So, which vendors are gonna publish their own audit against this guide? Zero, I bet.
Where is the PoC?