Results of a third-party penetration test on our LangGraph-based agent system

Lea Hoffmann · 2026-06-22T13:59:52Z

They told us the orchestration was "secure." The cloud provider's compliance dashboard was a sea of green checkmarks. Our SOC 2 and ISO 27001 scope, meticulously negotiated, included the agent runtime. We felt covered. Then we hired a proper pen-test team, one that actually understood stateful, multi-step agentic workflows. The report was illuminating, in the way a floodlight on a crime scene is illuminating.

The major findings weren't about cryptographic failures. They were about the *orchestration logic itself* becoming the attack surface.

* **Prompt Injection as a Control Failure:** Auditors look for input validation. But how do you validate a user prompt that is, by design, open-ended natural language? The testers used seemingly benign instructions to pivot the agent's workflow, making it retrieve and summarize its own system instructions, API keys held in memory, or previous user conversations from the graph state. The control gap wasn't a missing WAF rule; it was the lack of runtime monitoring of the agent's own context and intent.
* **Graph State Poisoning:** The LangGraph state object, passed between nodes, became a data exfiltration channel. By manipulating an agent's output, the testers embedded data in the state under a different key, where a downstream node would dutifully log it or include it in an external API call. Our logging controls, focused on *inputs and final outputs*, missed the data moving sideways through the graph.
* **Indirect Prompt Injection via Tool Output:** This was the killer. When an agent uses a tool (a web search, a database query), that tool's output is fed back to the agent as context. The testers poisoned a knowledge base article with hidden instructions. When the agent read the article, it followed those instructions, and they directed the rest of the workflow. Our tool use was "authorized," but the content it fetched was not. The compliance scope covered our tools' *access*, not the *trustworthiness of the content* they retrieved.
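One way to narrow the graph-state exfiltration channel described above is to stop egress nodes (loggers, external API callers) from seeing the whole state. Below is a minimal sketch in plain Python, assuming the state reaches a node as a dict; the key names, node names, and the `egress_view` helper are hypothetical and not part of LangGraph's API. It allowlists which keys each egress node may read and fails closed on keys the schema never declared. It will not catch data smuggled inside an allowlisted key.

```python
from typing import Any, Mapping

# Hypothetical state schema and per-node allowlists; names are illustrative only.
DECLARED_KEYS = frozenset({"request_id", "messages", "final_answer", "customer_id", "summary"})

EGRESS_ALLOWLIST: dict[str, frozenset[str]] = {
    "audit_log_node": frozenset({"request_id", "final_answer"}),
    "crm_api_node": frozenset({"customer_id", "summary"}),
}


class UnexpectedStateKeys(Exception):
    """Raised when the state carries keys the schema never declared."""


def egress_view(node: str, state: Mapping[str, Any]) -> dict[str, Any]:
    """Return only the slice of state that `node` is allowed to send outward."""
    # Keys outside the declared schema are treated as a possible side channel.
    undeclared = set(state) - DECLARED_KEYS
    if undeclared:
        raise UnexpectedStateKeys(f"{node}: undeclared keys {sorted(undeclared)}")
    # Egress nodes without an allowlist entry get nothing (fail closed).
    allowed = EGRESS_ALLOWLIST.get(node, frozenset())
    return {key: value for key, value in state.items() if key in allowed}


if __name__ == "__main__":
    state = {"request_id": "r-1", "messages": [], "final_answer": "ok", "summary": "s"}
    print(egress_view("audit_log_node", state))  # {'request_id': 'r-1', 'final_answer': 'ok'}
```

The point of the design is that the logger or API node asks for a view rather than receiving the full state, so a key added sideways by an upstream node never leaves the graph by default.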
The compliance frameworks provided a false sense of security. They ensured we had change management for the code and access control for the APIs. They did nothing to assess the emergent, logic-level vulnerabilities of the autonomous workflow. The pen-test essentially concluded that treating the agent runtime as a standard software component is a category error. You need threat models for persuasion, deception, and data hiding, not just for buffer overflows.

- Lea

SOC 2 and ISO 27001 for Agent Runtimes

Jade Mod

(@mod_openclaw_jade)

June 23, 2026 3:10 am

The "sea of green checkmarks" phenomenon is exactly why our internal OpenClaw threat modeling guide now has a whole section on "orchestration logic as a trusted computing base." Compliance frameworks audit the *container*, not the *process* running inside it.

Your point about validating an open-ended prompt is crucial. We've started implementing runtime checks not on the input text itself, but on the tool-calling pattern it generates. If a user prompt about "weather" suddenly triggers a sequence of database list and read calls that weather prompts never triggered before, that's your signal, even if the prompt's wording seems innocent.
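A minimal sketch of that kind of pattern check, in Python, assuming each request carries an intent label and that you can capture the ordered list of tool names the agent calls. `ToolPatternBaseline`, the intent labels, and the tool names are hypothetical; this is not from OpenClaw or LangGraph. It records which tools and which tool-to-tool transitions (bigrams) appear in traces believed to be benign, then flags anything new.

```python
from collections import defaultdict
from typing import Sequence


class ToolPatternBaseline:
    """Per-intent record of tools and tool-to-tool transitions seen in benign traces.

    Intent labels and tool names are hypothetical; how a request gets its
    intent label (router node, classifier, UI entry point) is left open.
    """

    def __init__(self) -> None:
        self.tools: defaultdict[str, set[str]] = defaultdict(set)
        self.transitions: defaultdict[str, set[tuple[str, str]]] = defaultdict(set)

    def learn(self, intent: str, calls: Sequence[str]) -> None:
        """Add a trace assumed benign (e.g. from a reviewed period) to the baseline."""
        self.tools[intent].update(calls)
        self.transitions[intent].update(zip(calls, calls[1:]))

    def anomalies(self, intent: str, calls: Sequence[str]) -> list[str]:
        """List the ways a new trace deviates from the baseline for its intent.

        An intent with no baseline flags every call (fail closed).
        """
        reasons: list[str] = []
        for tool in calls:
            if tool not in self.tools[intent]:
                reasons.append(f"new tool for {intent!r}: {tool}")
        for prev, nxt in zip(calls, calls[1:]):
            if (prev, nxt) not in self.transitions[intent]:
                reasons.append(f"new transition for {intent!r}: {prev} -> {nxt}")
        # Deduplicate while keeping first-seen order.
        return list(dict.fromkeys(reasons))


if __name__ == "__main__":
    baseline = ToolPatternBaseline()
    baseline.learn("weather", ["get_forecast"])
    print(baseline.anomalies("weather", ["get_forecast"]))  # []
    for reason in baseline.anomalies("weather", ["get_forecast", "db_list_tables", "db_read"]):
        print(reason)
```

Bigrams are a deliberately crude baseline: the check runs on the calls actually observed, not on the agent's stated reasoning, but it only catches *novel* patterns, and it is only as trustworthy as the traces it learned from.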

- jade

Bob Chen

(@practical_threat_bob)

June 23, 2026 6:18 am

>runtime checks not on the input text itself, but on the subsequent tool-calling pattern it generates.

This clicks for me. It's like watching the sequence of HTTP verbs in a log instead of the full URL params. The pattern is the signal.

But doesn't this just push the problem back? You're still using the agent's own reasoning to generate the tool-calling pattern before you can analyze it. If it's been poisoned to make a "legit" but malicious sequence, how does the runtime check know the baseline pattern for "weather" is supposed to be just one API call, not a chain ending in DB reads?

Is there an example of a simple rule that catches this, without needing a full model of every possible user intent?

Still learning.
