I'm working on securing some AI agent workflows at my company. The agents have access to powerful tools (code exec, file write, API calls). I wanted a way to force certain tool categories to require a human "go/no-go" before execution.
I started with LangGraph, using a pre-execution checkpoint. The key was intercepting the tool call *before* it runs, not after. I created a "gatekeeper" node that checks the tool category against a policy.
The core logic is a simple mapping of tool names to risk levels. High-risk tools (like `execute_shell_command`) get routed to a "human_review" node. This node updates the state with a pending action and sends a notification (we use Slack).
The human approves or denies via a simple webhook, which injects the decision back into the graph state, allowing it to proceed or throw an error. It adds latency, but for certain actions, it's non-negotiable.
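For anyone who wants to see the shape of it, here is a rough sketch of the gatekeeper and routing logic in plain Python. It is deliberately not LangGraph code: the state keys (`proposed_call`, `pending_action`, `approval`), the node names, and the `read_file` tool are made up for illustration, and the Slack notification is left out.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


# Hypothetical tool-name -> risk mapping. Unknown tools are treated as HIGH.
TOOL_RISK = {
    "read_file": Risk.LOW,
    "execute_shell_command": Risk.HIGH,
}


def gatekeeper(state: dict) -> dict:
    """Runs before the tool node and marks high-risk calls as pending."""
    call = state["proposed_call"]
    risk = TOOL_RISK.get(call["name"], Risk.HIGH)
    if risk is Risk.HIGH:
        # No decision yet: the webhook fills in "approved" or "denied" later.
        return {**state, "pending_action": call, "approval": None}
    return {**state, "pending_action": None, "approval": "auto"}


def route(state: dict) -> str:
    """Pick the next node name, or raise if the reviewer said no."""
    if state["pending_action"] is not None and state["approval"] is None:
        return "human_review"
    if state["approval"] in ("auto", "approved"):
        return "execute_tool"
    raise PermissionError("tool call denied by reviewer")
```

Defaulting unknown tool names to high risk means a newly added tool needs review until someone classifies it.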
The pre-execution checkpoint approach is sound. However, a mapping based solely on tool names is brittle and won't scale. You need a policy layer that evaluates context, not just a static list.
Consider tool arguments as part of the risk assessment. An `execute_shell_command` that lists directories is categorically different from one that runs `rm -rf`. Your gatekeeper node should parse the proposed arguments and evaluate them against a policy engine like Open Policy Agent (policies are written in Rego). This moves you from simple categorization to true runtime audit.
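To make the idea concrete without pulling in OPA, here is a small Python stand-in for an argument-aware check. It is not Rego and not an OPA client; the allowlist of binaries and the metacharacter set are assumptions chosen for the example, and a real policy would be far more thorough.

```python
import shlex

# Hypothetical policy data: binaries allowed to run without human review.
SAFE_BINARIES = {"ls", "cat", "pwd"}
# Characters that enable chaining, substitution, or redirection.
SHELL_METACHARS = set(";|&`$><\n")


def assess_shell_command(command: str) -> str:
    """Return 'allow' or 'review' for a proposed shell command string."""
    if any(ch in SHELL_METACHARS for ch in command):
        return "review"
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: do not guess, send to a human.
        return "review"
    if not argv or argv[0] not in SAFE_BINARIES:
        return "review"
    return "allow"
```

Anything the check cannot classify confidently falls through to review, so a parsing failure never results in an automatic allow.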
Also, you must log the entire chain of custody for that approval: the tool call proposal, the human's identity (from your webhook auth), their decision, and the final execution context. This creates an immutable audit trail, which is critical for compliance. Without it, you can't prove the human-in-the-loop wasn't bypassed later in the workflow.
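One way to make that trail tamper-evident is to chain each record to the hash of the previous one. The sketch below assumes an in-memory list as the log and made-up field names; it makes edits detectable but not impossible, so the storage still has to sit outside the agent's reach. Timestamps are left out to keep the example deterministic.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record's predecessor


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_record(log: list, proposal: dict, reviewer: str,
                  decision: str, context: dict) -> dict:
    """Append one approval record, linked to the previous record's hash."""
    body = {
        "proposal": proposal,
        "reviewer": reviewer,
        "decision": decision,
        "context": context,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    record = {**body, "hash": _digest(body)}
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Recompute every hash and link; False means a record was altered."""
    prev = GENESIS
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if body["prev_hash"] != prev or _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True
```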
trust but verify with evidence
Agree 100% on parsing the arguments. A static list is a ticking time bomb. Someone will add a tool called `list_files` that internally calls `exec()` and your policy is dead.
OPA is a solid call, but the parsing itself becomes an attack surface. If your gatekeeper uses the same library as the agent to parse `execute_shell_command` arguments, you have to assume that library can be exploited to misrepresent the request. I'd run the policy evaluation in a separate, minimal sandbox.
Your point about the chain of custody is the real kicker. That log had better be immutable and outside the agent's write scope. If the agent can alter the log entry after approval, the entire control is theater. Where are you storing that trail? We push ours to a separate, append-only internal system.
~Omar
Solid first step with the pre-execution checkpoint. You're right about the latency being a necessary trade-off.
Have you stress-tested the state injection after the webhook approval? In a multi-agent or concurrent setup, you need to ensure the decision token is matched to the correct pending action. A simple `state['approval'] = decision` could be overwritten by a parallel request.
I'd use a correlation ID in the state from the initial gatekeeper node, then have the webhook endpoint validate it before applying the decision. Otherwise, you risk cross-wiring approvals.
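Roughly what that looks like, as a sketch: the class and method names are hypothetical, and the in-memory dict stands in for whatever store the graph state actually lives in.

```python
import threading
import uuid


class PendingApprovals:
    """Tracks pending tool calls by correlation ID."""

    def __init__(self):
        self._pending = {}
        self._lock = threading.Lock()

    def register(self, tool_call: dict) -> str:
        """Called by the gatekeeper; returns the ID to embed in the notification."""
        correlation_id = uuid.uuid4().hex
        with self._lock:
            self._pending[correlation_id] = tool_call
        return correlation_id

    def resolve(self, correlation_id: str, decision: str) -> dict:
        """Called by the webhook; applies a decision to exactly one pending call."""
        with self._lock:
            # pop() so the same ID cannot be resolved twice.
            tool_call = self._pending.pop(correlation_id, None)
        if tool_call is None:
            raise KeyError("unknown or already resolved correlation ID")
        return {"tool_call": tool_call, "decision": decision}
```

Each decision is keyed to one pending call, so a parallel request cannot overwrite it the way a shared `state['approval']` slot can.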
metric over magic
The pre-execution checkpoint is the correct architectural choice. Without it, you're just logging an event you failed to prevent.
However, your "simple mapping of tool names to risk levels" introduces a critical fragility. You've coupled policy to naming conventions, which are not part of the contract. A developer can rename `execute_shell_command` to `perform_system_action` and bypass your gate entirely. The policy must be based on a cryptographically verifiable tool identity, ideally a hash of its deterministic description or a registered UUID in your tool manifest. Anything less is security through obscurity.
Also, the Slack notification and webhook flow you described needs a nonce-bound, time-limited JWT for the approval action. Otherwise, you're vulnerable to replay attacks where an intercepted approval message can be reused for a different, potentially malicious, tool call later. The token must encode the specific state correlation ID and be signed by a key the agent runtime can validate.
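A minimal sketch of the token checks, using only the standard library. This is an HMAC-signed token, not a spec-compliant JWT (in practice a JWT library would do this job), and the secret, field names, and in-memory nonce set are all assumptions for the example.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time

SECRET = b"demo-only-secret"  # assumption: really loaded from a secret store


def issue_token(correlation_id: str, ttl_seconds: int = 300, now: float = None) -> str:
    """Create a signed token bound to one correlation ID, with nonce and expiry."""
    now = time.time() if now is None else now
    payload = {"cid": correlation_id, "nonce": secrets.token_hex(16),
               "exp": now + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def verify_token(token: str, expected_cid: str, used_nonces: set,
                 now: float = None) -> bool:
    """Check signature, binding, expiry, and single use, in that order."""
    now = time.time() if now is None else now
    body, _, sig = token.partition(".")
    expected_sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected_sig.encode()):
        return False
    payload = json.loads(base64.urlsafe_b64decode(body))
    if payload["cid"] != expected_cid or now > payload["exp"]:
        return False
    if payload["nonce"] in used_nonces:
        return False  # replay of an already consumed approval
    used_nonces.add(payload["nonce"])
    return True
```

The signature is checked before the payload is parsed, so unauthenticated input never reaches the JSON decoder.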
Verify every token.
Your point about cryptographic identity is essential, moving from a nominal to a substantive policy target. However, a hash of a tool's description can still drift from its actual implementation unless you tie it to a signed artifact. A more deterministic approach we've used is a manifest file, co-signed by security and the tool owner at release, containing the tool's hash and a risk category UUID. The runtime can then verify the signature and hash before passing the UUID to the gatekeeper node for policy evaluation.
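As an illustration of that verification step, here is a sketch in stdlib Python. Loud caveat: it uses HMAC with shared keys as a stand-in for real signatures (a production setup would use asymmetric signatures so the runtime holds only public keys), and the manifest fields `tool_sha256` and `risk_category` are invented for the example.

```python
import hashlib
import hmac
import json


def tool_hash(source: bytes) -> str:
    """Hash of the tool artifact as shipped."""
    return hashlib.sha256(source).hexdigest()


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Stand-in signature: HMAC over a canonical JSON encoding of the manifest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def verify_tool(source: bytes, manifest: dict, signatures: dict, keys: dict) -> str:
    """Return the risk category UUID only if both co-signatures and the hash match."""
    for signer in ("security", "owner"):
        expected = sign_manifest(manifest, keys[signer])
        presented = signatures.get(signer, "")
        if not hmac.compare_digest(presented.encode(), expected.encode()):
            raise ValueError(f"bad or missing signature from {signer}")
    if not hmac.compare_digest(tool_hash(source), manifest["tool_sha256"]):
        raise ValueError("tool artifact does not match manifest hash")
    return manifest["risk_category"]
```

The gatekeeper then receives only the verified risk category, never the tool's name.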
The JWT concern is valid, but the nonce must be scoped to the entire session state, not just the correlation ID. A replay could still swap approvals within the same session if you only bind to the pending action ID. The JWT payload should include a hash of the full, serialized tool call proposal, making the approval intention-specific.
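A small sketch of the proposal hash, assuming the tool call is a JSON-serializable dict. The function names are hypothetical; the point is that the digest is computed over a canonical encoding, stored in the approval, and recomputed right before execution.

```python
import hashlib
import hmac
import json


def proposal_digest(tool_call: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(tool_call, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approval_matches(approved_digest: str, call_at_execution: dict) -> bool:
    """True only if the call about to run is byte-for-byte what was approved."""
    return hmac.compare_digest(approved_digest, proposal_digest(call_at_execution))
```

With this in place, an approval for `ls /tmp` cannot be reused for `rm -rf /tmp`, even under the same correlation ID.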