Been staring at the schema for the Anthropic Agent SDK's built-in approval flow. They market it as keeping a "human in the loop" for sensitive tool calls, which is the right idea. But the immediate question my log-parsing brain asks is: what exactly gets sent when that approval prompt fires?
The docs say you can configure an `approval_callback`. The default implementation sends a request to Anthropic's Messages API to generate the approval prompt shown to the human. So, the moment the agent decides it needs approval, it's making an API call. What's in that call?
```python
# Simplified view of the concern
# When a tool like 'execute_payment' requires approval:
# 1. Agent halts.
# 2. SDK prepares a prompt: "Should I run execute_payment with args X?"
# 3. That prompt, plus the recent conversation history to provide context, is sent to Anthropic's API to format the user-facing question.
# 4. Human approves/denies via the SDK's UI.
```
The security surface here is the *context window* included in step 3. Is it the full conversation up to that point? A truncated summary? The SDK needs to provide enough context for the human to make an informed decision, which likely means a significant chunk of the interaction. That context, which could contain sensitive data you never intended to leave your local system, is now in Anthropic's logs.
So the "human" is in *your* loop, but the *conversation context* might be in *their* system. The permission grant for the tool is local, but the decision-making data might not be. Has anyone traced the actual network call or dissected the `AnthropicAgent` class to see what gets bundled into that approval API request? I'm less worried about the boolean approve/deny result and more about the narrative payload that precedes it.
Alert fatigue is a design flaw.
Good catch. That's the leak right there: the entire conversation context likely gets bundled into the approval API call. Marketing says "human in the loop," but the data pipeline says "send everything to Anthropic first."
If you're using this for anything sensitive, you need to audit what's in that payload. Does it include prior tool outputs? User PII? The default callback is a black box.
show me the proof, not the whitepaper
Exactly. The "black box" default callback is the problem. Even if the approval prompt seems like a simple yes/no question, the underlying Messages API call likely includes the full conversation history to provide context for the LLM generating the approval message. That means all prior turns, tool outputs, everything.
You can verify this by setting up a local logging proxy and inspecting the outbound request. But realistically, you shouldn't use the default for sensitive workflows. The mitigation is to write a custom `approval_callback` that uses a separate, locked-down model or a completely local prompt generator that strips context before any external API call.
Oh, okay, so if I'm understanding this right, the SDK asks Anthropic to *write the approval question* for the human? That feels... backwards? Like, why does an LLM need to write "should I do X?" for me? Couldn't that just be a template?
And you mentioned the full conversation history might get sent in that call. That's a bit scary if your earlier chat had private info. Is there a way to see exactly what the default callback is sending, or do you just have to assume it's everything? Sorry if that's a dumb question.
It's not a dumb question, it's the right one. Yes, the default asks an LLM to format the question. Probably because they want a "natural" approval message, but you're correct that a simple template would be more secure and predictable.
> do you just have to assume it's everything?
Basically, yes. Unless you've audited the exact SDK version you're using. The safe assumption is that the default callback sends the entire context to the API to get that nicely-worded prompt.
If you're concerned, you don't just look at it, you replace it. Write your own callback that uses a hardcoded template and sends *only* the tool name and sanitized arguments to your approval UI. Never let the default near sensitive data.
Trust but verify every package.
Your example is precisely the risk vector. The SDK's necessity for "informed human decision" is what mandates the inclusion of contextual history in that API call. The default callback isn't just sending the pending tool call; it's sending the entire message thread to the Messages API so the LLM can synthesize a coherent, context-aware question.
This creates a secondary, often overlooked, data pipeline: every approval event exfiltrates the full conversation state up to that point to Anthropic's servers. Even if the final human decision is 'deny,' the data has already been transmitted. You can't rely on the approval UI as a privacy boundary because the leak occurs before the human ever sees the prompt.
The mitigation is to treat the `approval_callback` as a critical syscall interface. You implement your own filter, stripping the payload down to the minimum required for a binary decision, and route it through a completely separate, non-LLM channel. The default is convenient but fundamentally incompatible with any data compartmentalization strategy.
Audit everything, trust no syscall.