They told us the orchestration was "secure." The cloud provider's compliance dashboard was a sea of green checkmarks. Our SOC 2 and ISO 27001 scope, meticulously negotiated, included the agent runtime. We felt covered.
Then we hired a proper pen-test team that actually understood stateful, multi-step agentic workflows. The report was illuminating, in the way a floodlight on a crime scene is illuminating.
The major findings weren't about cryptographic failures. They were about the *orchestration logic itself* becoming the attack surface.
* **Prompt Injection as a Control Failure:** Auditors look for input validation. How do you validate a user prompt that is, by design, meant to be open-ended natural language? The testers used seemingly benign instructions to pivot the agent's workflow, making it retrieve and summarize its own system instructions, API keys from memory, or previous user conversations from the graph state. The control gap wasn't a missing WAF rule; it was a lack of runtime context and intent monitoring for the agent itself.
* **Graph State Poisoning:** The LangGraph state object, passed between nodes, became a data exfiltration channel. By manipulating an agent's output, testers embedded data into the state under a different key, where a downstream node would dutifully log it or include it in an external API call. Our logging controls, focused on the *inputs and final outputs*, missed the data moving sideways through the graph.
* **Indirect Prompt Injection via Tool Output:** This was the killer. When an agent uses a tool (a web search, a database query), that tool's output is fed back as context. Testers poisoned a knowledge base article with hidden instructions. When the agent read it, those instructions executed, directing the rest of the workflow. Our tool use was "authorized," but the content it fetched was not. The compliance scope covered our tool's *access*, but not the *trustworthiness of the content* it retrieved.
The compliance frameworks provided a false sense of security. They ensured we had change management for the code, and access control for the APIs. They did nothing to assess the emergent, logic-level vulnerabilities of the autonomous workflow.
The pen-test essentially concluded that treating the agent runtime as a standard software component is a category error. You need threat models for persuasion, deception, and data hiding, not just for buffer overflows.
- Lea
Local or it's not yours.
That "sea of green checkmarks" is so familiar. The compliance scope covered the *runtime*, but the testers went after the *workflow*. That's the crucial difference.
Your point about graph state poisoning has me thinking. A while back, I was simulating an attack where the agent's state was supposed to hold a simple user preference, like a language code. I injected a payload that looked like a normal value but was actually a serialized instruction. Because the next node in the graph trusted the state object implicitly, it executed the instruction. The exfiltration path was just the agent's normal "save to audit log" function.
The control failure was the lack of a schema validation step between *every* node transition. Not just on the initial user input.
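For anyone who wants to picture that, here is a minimal sketch of per-transition validation in plain Python. It assumes the state is a flat dict; `STATE_SCHEMA`, `validate_state`, and `guarded` are hypothetical names made up for illustration, not part of LangGraph or any other framework.

```python
import re
from typing import Any, Callable, Dict

State = Dict[str, Any]

# Hypothetical schema: every key a node may read or write, with a validator.
STATE_SCHEMA: Dict[str, Callable[[Any], bool]] = {
    # A language code such as "en" or "en-US", and nothing else.
    "language": lambda v: isinstance(v, str)
    and re.fullmatch(r"[a-z]{2}(-[A-Z]{2})?", v) is not None,
    # A small non-negative counter (bool is excluded because it subclasses int).
    "turn_count": lambda v: isinstance(v, int)
    and not isinstance(v, bool)
    and 0 <= v < 1000,
}


class StateValidationError(ValueError):
    """Raised when the state contains an unknown key or an invalid value."""


def validate_state(state: State) -> State:
    """Reject unknown keys and values that fail their validator."""
    for key, value in state.items():
        validator = STATE_SCHEMA.get(key)
        if validator is None:
            raise StateValidationError(f"unexpected state key: {key!r}")
        if not validator(value):
            raise StateValidationError(f"invalid value for key: {key!r}")
    return state


def guarded(node: Callable[[State], State]) -> Callable[[State], State]:
    """Wrap a node so the state is validated on the way in and on the way out."""
    def wrapper(state: State) -> State:
        validate_state(state)
        return validate_state(node(state))
    return wrapper
```

Wrapping every node with `guarded` means a "language code" that is really a serialized instruction fails at the first transition instead of reaching a node that trusts it. Unknown keys are rejected as well, which closes the "embed it under a different key" route.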
Give me admin or give me a shell.
Exactly. The "control failure" framing is critical. It shifts the problem from a simple input sanitization bug to a system design flaw. Your testers identifying prompt injection as a failure of the *orchestrator's control plane* is the key insight most teams miss.
Their workflow pivot attack is essentially a privilege escalation within the agent's own instruction set. I've documented a similar case where the injection didn't just exfiltrate data, but modified the graph's conditional routing logic at runtime. The agent was tricked into adding a new, malicious edge to its own state graph, causing it to loop into a data extraction node it was never supposed to reach. The orchestrator saw it as valid state progression.
This implies the mitigations need to be architectural: a separate, hardened control process that monitors the agent's *decisions* against a permissible graph path, not just its outputs. Treating the agent as an untrusted sub-process with severely limited system interaction.
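A rough sketch of what the decision monitor could look like, assuming the permissible graph can be written down as a static edge allowlist held outside the agent's own state. The node names, `ALLOWED_EDGES`, and `PathMonitor` are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical allowlist, defined at deploy time and never derived from agent state.
ALLOWED_EDGES: Dict[str, Iterable[str]] = {
    "start": {"classify"},
    "classify": {"answer", "user_confirmation"},
    "user_confirmation": {"answer"},
    "answer": {"end"},
}


class PathViolation(RuntimeError):
    """Raised when the agent requests a transition outside the allowlist."""


class PathMonitor:
    """Tracks the current node and authorizes each requested transition."""

    def __init__(self, allowed: Dict[str, Iterable[str]], entry: str = "start") -> None:
        # Copy into frozensets so later mutation of the caller's dict has no effect.
        self._allowed: Dict[str, FrozenSet[str]] = {
            src: frozenset(dsts) for src, dsts in allowed.items()
        }
        self._current = entry
        self.trace: List[str] = [entry]

    def authorize(self, next_node: str) -> None:
        """Advance to next_node if the edge is permitted, otherwise raise."""
        if next_node not in self._allowed.get(self._current, frozenset()):
            raise PathViolation(
                f"{self._current} -> {next_node} is not a permitted transition"
            )
        self._current = next_node
        self.trace.append(next_node)
```

The monitor copies the allowlist at construction, so an agent that manages to "add an edge" to its own graph still gets stopped at `authorize`. In a real deployment this would presumably run in a separate process from the agent; the sketch only shows the check itself.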
Exploit or GTFO.
Your finding on runtime context and intent monitoring is the crux of it. Green checkmarks validate static controls, not dynamic process integrity.
A logging failure amplifies this: if the agent's "reasoning" or the state delta between nodes isn't captured in an immutable audit trail, you can't reconstruct the pivot. The state poisoning you described isn't just an exfiltration channel, it's a corruption of the chain of custody. Your forensic log might show the poisoned state was passed, but without the preceding intent log, you can't prove how it got there.
Compliance scopes often mandate log generation, but they rarely specify the granularity needed to audit a stateful workflow. You need to log the decision, not just the data.
That "chain of custody" point is exactly right. It's like having security camera footage of a package being handed off, but no recording of the conversation where the instructions were given.
We ran into this with audit logs that captured the state object after a node execution, but not the LLM's reasoning trace that *led* to the state mutation. An attacker could inject a payload that looked like a legitimate calculation in the final state. Without the reasoning log, you couldn't see the malicious instruction that generated it.
This makes me think the immutable audit trail needs two synchronized streams: the data state and the reasoning intent. If they diverge, that's your anomaly detection trigger right there.
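Here is one way the two streams could be tied together, as a sketch only. It assumes each node declares which state keys it intends to change (`declared_keys`, a hypothetical field) alongside its free-text reasoning, and it uses a SHA-256 hash chain, which makes the log tamper-evident rather than truly immutable. `AuditTrail` is an invented name.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List


def _digest(body: Dict[str, Any]) -> str:
    """Stable SHA-256 over a JSON-serializable record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditTrail:
    """Append-only log pairing the intent stream with the observed state delta."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def append(
        self,
        node: str,
        reasoning: str,
        declared_keys: Iterable[str],
        before: Dict[str, Any],
        after: Dict[str, Any],
    ) -> Dict[str, Any]:
        declared = sorted(set(declared_keys))
        # Keys whose value was added, removed, or changed by this node.
        changed = sorted(
            k for k in set(before) | set(after) if before.get(k) != after.get(k)
        )
        record: Dict[str, Any] = {
            "node": node,
            "reasoning": reasoning,
            "declared_keys": declared,
            "changed_keys": changed,
            # Divergence: the state moved in a way the intent stream did not announce.
            "divergent": sorted(set(changed) - set(declared)),
            "prev_hash": self.records[-1]["hash"] if self.records else "",
        }
        record["hash"] = _digest(record)
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; False if any record was altered or reordered."""
        prev = ""
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != rec["hash"]:
                return False
            prev = rec["hash"]
        return True
```

A non-empty `divergent` list is the anomaly trigger. Note the limit: this catches state changes the node did not declare, but it cannot catch a model that declares a malicious change with a plausible-sounding reason.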
Token rotation is love
This idea of logging both the data state and the reasoning intent just clicked for me. It's like, you wouldn't just log the fact a file was downloaded, you'd log the command line that triggered it. But with an agent, the "command line" is the thought process.
So, if I'm understanding right, the anomaly trigger would be when the state changes in a way that the reasoning stream doesn't justify. Like the state shows a simple language preference update, but the reasoning log shows the model debating whether to "also embed the encryption key in that field." That mismatch would be the alarm bell.
But here's my newbie question - how do you even capture that reasoning intent reliably? Is it just pulling the LLM's chain of thought? What if the model itself is compromised or hallucinates a benign reason for a malicious action? Would that still look like a divergence, or would it be a perfectly aligned, malicious log?
Learning every day.
That final point about state poisoning being an exfiltration channel is understated. It's worse than just data leaving.
The pen testers we used demonstrated that a poisoned state payload can persist across user sessions if your graph's global memory isn't scoped and wiped correctly. One user's maliciously crafted "language preference" could sit dormant, then be picked up and executed by a completely different user's agent run later that day.
The gap is treating the state as a trusted data bus. It needs the same validation and lifecycle controls as any other internal API.
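A minimal sketch of the scoping and wiping part, assuming every run carries a session identifier. `SessionScopedStore` is a hypothetical in-memory stand-in for whatever memory backend the graph actually uses.

```python
from typing import Any, Dict, Optional


class SessionScopedStore:
    """Key-value memory partitioned by session, with an explicit wipe."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}

    def put(self, session_id: str, key: str, value: Any) -> None:
        """Write a value into this session's partition only."""
        self._data.setdefault(session_id, {})[key] = value

    def get(self, session_id: str, key: str, default: Optional[Any] = None) -> Any:
        """Read from this session's partition; other sessions are invisible."""
        return self._data.get(session_id, {}).get(key, default)

    def end_session(self, session_id: str) -> None:
        """Drop everything the session wrote, so nothing sits dormant."""
        self._data.pop(session_id, None)
```

There is deliberately no API that reads across sessions, so one user's crafted "language preference" never appears in another user's run. Calling `end_session` when a run finishes handles the dormant-payload case.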
--lo
Yeah, that "privilege escalation within the instruction set" makes a lot of sense to me. It's like the attacker isn't breaking in from the outside, they're just asking the agent to use the access it already has, but for the wrong thing.
So when you say "a separate, hardened control process that monitors the agent's *decisions* against a permissible graph path," that sounds right, but also really hard? Like, you'd need to define that permissible path for every possible workflow up front. What if the agent needs to make a legit unexpected decision based on new info? How do you tell that apart from a pivot?
The "hard" part is the point. If you can't define a permissible path, you can't have a control. It's that simple.
Your agent making a "legit unexpected decision" is the whole problem. You've built a system where you don't know what correct behavior looks like, so you can't detect deviation. That's not security, it's faith.
Define the finite states and transitions, or accept it's an open research problem, not a production system.
The compliance checkbox mismatch you're describing is classic, but focusing on auditors missing it is a bit of a red herring.
Their frameworks can't model emergent behavior in a state graph. The failure was internal. Your team saw the orchestrator as a trusted bus for validated logic, when it's actually a high-privilege, unmonitored execution environment. The pen testers just proved your own internal threat model was naive.
Green checkmarks didn't lie. They validated the static controls you asked for. You just asked for the wrong things.
Did you validate the redirect?
Yep. It's the "I need it to be creative, but only the good kind of creative" paradox.
Your "legit unexpected decision" is just another name for an unanticipated transition. If you allow those, you've got a non-deterministic state machine. Which means you can't model it. Which means you can't secure it.
The faith isn't in the model, it's in the developer's belief that "unexpected" won't be malicious. Good luck with that.
Green checkmarks for static controls while the graph's runtime logic is wide open... classic. But the real kicker is that a "proper pen-test team" that "understood stateful workflows" is a rare find.
Most red teams still treat the agent as a fancy chatbot. They'll fuzz the API gateway and call it a day. They miss that the attack surface isn't the endpoint, it's the *permissions you've already given the agent* between nodes. You hired the right team. Most won't.
The state poisoning finding is the obvious one, though. What about poisoning the *conditional edges* that decide the graph's path? A little nudge in the reasoning to take the "admin review" branch instead of the "user confirmation" branch bypasses everything. The audit logs would just show the graph followed its designed logic.
-- sim
Conditional edge poisoning is the natural endpoint of this. You've built a machine that makes decisions on untrusted input, then you're surprised when the decisions are malicious. The audit logs showing "designed logic" is what makes it so perfect.
It reminds me of old SQL injection where the query structure itself looked fine to naive logs, just the *data* within the clauses was poisoned. Here, the attacker poisons the clause selector. The control flow is the injection.
So the real question isn't how to log it, but how to authorize it. Every conditional branch should require a capability check that the *current reasoning context* is allowed to request that path. Good luck building that policy without a formal model of "benign" reasoning. You're back to square one.
-- Dave
Your SQL injection analogy is apt, but the mitigation is where it diverges. In SQL, you have a formal language; you can parse and parameterize it. With an agent's "reasoning context," you have no grammar to enforce.
The capability check you mention can't be based on the reasoning's *content*, which is unstructured and hallucination-prone. It has to be based on the *provenance* of the data influencing the branch. Did this conditional use data that originated from an external user prompt? From a tool's output? From a trusted system prompt? That lineage is your capability token.
You still need a kernel-level sandbox to contain the inevitable misrouting, but tracking data flow through the graph gives you a deterministic hook for authorization, not a philosophical debate about benign intent.
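To make the lineage idea concrete, a small sketch assuming every value in the graph carries the set of sources it was derived from. `Source`, `Tainted`, `derive`, `BRANCH_POLICY`, and `may_take_branch` are hypothetical names, and the branch names are borrowed from the example earlier in the thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet


class Source(Enum):
    SYSTEM = "system"
    USER_PROMPT = "user_prompt"
    TOOL_OUTPUT = "tool_output"


@dataclass(frozen=True)
class Tainted:
    """A value plus the set of sources that influenced it."""
    value: object
    sources: FrozenSet[Source]


def derive(value: object, *inputs: Tainted) -> Tainted:
    """A derived value inherits the union of its inputs' sources."""
    merged: FrozenSet[Source] = frozenset().union(*(i.sources for i in inputs))
    return Tainted(value, merged)


# Hypothetical policy: which sources are allowed to influence each branch.
BRANCH_POLICY: Dict[str, FrozenSet[Source]] = {
    "user_confirmation": frozenset(Source),
    "admin_review": frozenset({Source.SYSTEM}),
}


def may_take_branch(branch: str, decision: Tainted) -> bool:
    """Allow a branch only if every source behind the decision is permitted."""
    allowed = BRANCH_POLICY.get(branch)
    if allowed is None or not decision.sources:
        return False  # unknown branch or missing lineage: deny by default
    return decision.sources <= allowed
```

The check never reads the reasoning text. A routing decision that touched a fetched knowledge base article carries `TOOL_OUTPUT` in its lineage and is refused the `admin_review` branch, however benign the stated reason. The hard part, which the sketch skips, is propagating sources through the LLM call itself; the conservative assumption is that the output inherits every source present in the context window.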
Least privilege, always.
The green checkmarks gave us the same false comfort. It's like locking the front door while the back window is just a drawing on the wall.
Our audit passed because we showed them a list of sanitized inputs and encrypted data at rest. They never asked how the brain of the system made decisions.
How do you even scope the agent runtime for ISO 27001? You can list the physical servers, but not the reasoning paths. That feels like the core oversight.