Switching to the Agent SDK isn't just a developer convenience. It fundamentally changes your security perimeter and your compliance obligations. The raw API was a simple call-and-response. The SDK introduces a runtime with persistent state, tool execution, and a new data flow model. If you haven't updated your threat model, you have a gap.
Here's my internal checklist, derived from a recent SOX and GDPR readiness review for a client using the SDK. Focus is on what *changed* from the raw API.
**Data Flow & Residency**
* Tool outputs are sent back to Anthropic's systems by default for reasoning. This is a major shift.
* Confirm whether tool outputs contain regulated data (PHI, PII, financials).
* You are now sharing data from your internal systems (database query results, internal API payloads) with a third-party LLM provider. Document this data sharing in your DPIA/processor agreements.
* Evaluate how the `stream` parameter and the split between `System` and `User` conversation roles affect your data leakage surface.
* The SDK's hosted components (e.g., the tool use planning logic) see your tool *signatures* (names, descriptions, argument schemas). Ensure these are non-sensitive.
* Local execution covers only the tool call itself. The call to your internal tool runs locally, but the instruction to call it and the result both transit Anthropic.
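One way to act on the tool signature point is a pre-deployment lint over tool names, descriptions, and argument names. The following is a minimal sketch, not part of the SDK: the `SENSITIVE_TERMS` deny-list, the dict layout for a tool, and `find_sensitive_signatures` are all hypothetical and would need to match your own data classification policy.

```python
import re

# Hypothetical deny-list; tune it to your own data classification policy.
SENSITIVE_TERMS = ("ssn", "passport", "salary", "diagnosis", "iban")


def _words(text: str) -> set:
    """Split snake_case, kebab-case, and prose into lowercase words."""
    return set(re.split(r"[^a-z0-9]+", text.lower())) - {""}


def find_sensitive_signatures(tools: list) -> list:
    """Return (tool_name, term) pairs for each deny-listed term that appears
    in a tool's name, description, or argument names."""
    findings = []
    for tool in tools:
        words = _words(tool["name"]) | _words(tool.get("description", ""))
        for arg in tool.get("args", []):
            words |= _words(arg)
        for term in SENSITIVE_TERMS:
            if term in words:
                findings.append((tool["name"], term))
    return findings


# Assumed tool metadata layout, for illustration only.
tools = [
    {"name": "get_user_by_ssn", "description": "Look up a user.", "args": ["ssn"]},
    {"name": "lookup_user", "description": "Look up a user by opaque ID.", "args": ["user_ref"]},
]

for name, term in find_sensitive_signatures(tools):
    print(f"{name}: signature mentions '{term}'")
```

A check like this only catches obvious wording. It does not replace a manual review of what each description reveals about your internal systems.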
**Authentication & Tool Permissions**
* The SDK does not handle tool authentication. That responsibility is delegated entirely to you.
* If your tools require API keys, those keys are now embedded in the SDK's runtime environment. Review key storage and lifecycle management.
* The SDK's permission model is binary: a tool is either available or not. Implement your own scoping (e.g., role-based) at the tool implementation layer. Audit trails must log *who* invoked the agent session that led to the tool call.
* Tool argument validation is based on the Pydantic schema you provide. Insufficient validation becomes a direct attack vector against your backend services.
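The scoping and validation points above can be sketched at the tool implementation layer roughly as follows. This is an illustration under assumptions, not SDK code: `ROLE_GRANTS`, `SessionContext`, `ToolDenied`, and the `issue_refund` tool are hypothetical, and hand-written checks stand in for a Pydantic model so the example has no dependencies.

```python
from dataclasses import dataclass

# Hypothetical role-to-tool grants; in practice these would come from your IAM system.
ROLE_GRANTS = {
    "support": {"lookup_order"},
    "finance": {"lookup_order", "issue_refund"},
}


@dataclass(frozen=True)
class SessionContext:
    """Identity of the human whose agent session triggered the tool call."""
    session_id: str
    user_id: str
    role: str


class ToolDenied(Exception):
    """Raised when the session's role is not granted the requested tool."""


def validate_refund_args(args: dict) -> dict:
    """Go beyond type checks: reject unknown keys and bound every value."""
    unknown = set(args) - {"order_id", "amount_cents"}
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")
    order_id = args.get("order_id")
    amount = args.get("amount_cents")
    if not (isinstance(order_id, str) and order_id.isalnum() and len(order_id) <= 32):
        raise ValueError("order_id must be a short alphanumeric string")
    if isinstance(amount, bool) or not isinstance(amount, int) or not 0 < amount <= 50_000:
        raise ValueError("amount_cents must be an integer between 1 and 50000")
    return {"order_id": order_id, "amount_cents": amount}


def issue_refund(ctx: SessionContext, args: dict) -> dict:
    """Tool implementation: authorize the caller, validate, then act."""
    if "issue_refund" not in ROLE_GRANTS.get(ctx.role, set()):
        raise ToolDenied(f"role {ctx.role!r} may not call issue_refund")
    clean = validate_refund_args(args)
    # The real backend call would go here; this sketch only echoes the decision.
    return {"status": "accepted", "invoked_by": ctx.user_id, **clean}
```

Passing the session context into every tool is what lets the audit trail record who was behind a call, even though the agent is the immediate caller.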
**Audit Logging Requirements**
Your logs must now capture a causal chain, which is more complex than an API call log.
* Session ID linking: Associate the initial user prompt, the agent's planned tool calls, the actual local tool execution, and the final response.
* Tool input/output logging: You must decide what to redact from logs at this stage, balancing debugging needs against data retention policies.
* Prove tool use was *appropriate*: Log the user instruction that necessitated the tool call for forensic reconstruction.
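The causal chain described above could be recorded as a sequence of events sharing one session ID, each pointing back at the event that caused it. This is a minimal sketch using only the standard library. The `AuditTrail` class, the event kinds, and the `caused_by` field are hypothetical names, and hashing the tool output is just one possible redaction choice.

```python
import hashlib
import json
import time
import uuid


class AuditTrail:
    """Collects JSON-serializable events that share one session ID."""

    def __init__(self, user_id: str):
        self.session_id = str(uuid.uuid4())
        self.user_id = user_id
        self.events = []

    def record(self, kind: str, **fields) -> dict:
        """Append one event; 'seq' gives later events something to point back to."""
        event = {
            "session_id": self.session_id,
            "user_id": self.user_id,
            "seq": len(self.events),
            "ts": time.time(),
            "kind": kind,
            **fields,
        }
        self.events.append(event)
        return event

    def to_jsonl(self) -> str:
        """Serialize as one JSON object per line for a log shipper."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.events)


def digest(payload) -> str:
    """Log a hash instead of the raw payload when it may hold regulated data."""
    raw = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(raw).hexdigest()


# Example chain: prompt -> planned call -> local execution -> final response.
trail = AuditTrail(user_id="u-123")
prompt = trail.record("user_prompt", text="Where is order A1?")
planned = trail.record("tool_planned", tool="lookup_order", caused_by=prompt["seq"])
result = {"order_id": "A1", "status": "shipped"}
executed = trail.record("tool_executed", tool="lookup_order",
                        caused_by=planned["seq"], output_sha256=digest(result))
trail.record("final_response", caused_by=executed["seq"])
```

Storing a digest keeps the log useful for proving which output was returned without retaining the output itself. Whether that satisfies your retention policy is a decision for your own review.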
**Supply Chain**
* You've added a new dependency. Track the Agent SDK in your SBOM.
* Monitor for updates: Security patches may affect tool execution flow or data handling.
The bottom line: Using the Agent SDK makes Anthropic a data processor for a wider range of your data. Your tool implementations become a new trust boundary. Your audit trails need more context. Address this before an auditor asks.
Audit or it didn't happen.
Spot on about the tool signatures. I hadn't considered them as an info leak until I saw it in practice. Even a tool named `get_user_by_ssn` is a disclosure.
Your point on the `stream` parameter is crucial. We learned the hard way that streaming `tool_use` blocks can include partial internal data in the chunked responses before you even have a chance to intercept. You need to handle sanitization in the stream, not just the final output.
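For in-stream sanitization, one approach is to hold text back until a line is complete, so a pattern split across two chunks is still seen whole before anything is released. The sketch below works on plain text chunks and makes no assumption about the SDK's actual streaming event types. `sanitize_stream` and the single `SSN` pattern are hypothetical, and the approach assumes sensitive values never span a newline.

```python
import re
from typing import Iterable, Iterator

# Illustrative pattern only; real deployments need a broader detector.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def sanitize_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Buffer incoming chunks and release only complete, redacted lines."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        while "\n" in pending:
            line, pending = pending.split("\n", 1)
            yield SSN.sub("[SSN]", line) + "\n"
    if pending:
        # Flush whatever remains once the stream ends.
        yield SSN.sub("[SSN]", pending)


# The SSN is split across the first two chunks but is still redacted.
incoming = ["id 123-4", "5-6789 ok\nnext", " line"]
print("".join(sanitize_stream(incoming)))
```

The trade-off is latency: nothing is emitted until a newline or the end of the stream arrives.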
This is exactly why I started wrapping every tool call in a sanitizer function that strips PII before the result gets sent back for reasoning. Adds latency, but it's non-negotiable for our use case.
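A wrapper of that kind might look roughly like the following. It is a sketch under assumptions rather than anyone's production code: the `sanitized` decorator, `scrub`, and `get_customer` are hypothetical names, and two regexes are nowhere near complete PII detection.

```python
import functools
import re

# Illustrative patterns only; real PII detection needs far more than two regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scrub(value):
    """Recursively redact PII in strings, dicts, lists, and tuples."""
    if isinstance(value, str):
        for label, pattern in PII_PATTERNS.items():
            value = pattern.sub(f"[{label}]", value)
        return value
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [scrub(item) for item in value]
    return value


def sanitized(tool_fn):
    """Decorator: scrub a tool's return value before it leaves the process."""
    @functools.wraps(tool_fn)
    def wrapper(*args, **kwargs):
        return scrub(tool_fn(*args, **kwargs))
    return wrapper


@sanitized
def get_customer(customer_id: str) -> dict:
    # Stand-in for a real database lookup.
    return {"id": customer_id, "email": "ada@example.com", "note": "SSN 123-45-6789 on file"}


print(get_customer("c-42"))
```

Applying the scrubber as a decorator makes it harder to forget on a new tool, and a test can assert that every registered tool is wrapped.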
That data residency point is the big one that gets overlooked in homelab setups, too. I run my agent on an old NUC in a DMZ, but the moment a tool fetches something from my NAS or Home Assistant, it's heading to their cloud for reasoning.
My workaround was creating "dumb" pass-through tools that only return yes/no or error codes, keeping the sensitive data local. Adds complexity, but keeps my network diagram cleaner. It also forced me to segment my internal services better.
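A pass-through tool in that style could be as small as the sketch below. The `_LOCAL_READINGS` dict stands in for whatever local source the tool queries, and `battery_above` and the error codes are hypothetical names for illustration.

```python
# Hypothetical local data source; stands in for a NAS, Home Assistant, etc.
_LOCAL_READINGS = {"battery_percent": 82}


def battery_above(threshold_percent: int) -> dict:
    """Answer a yes/no question locally and return only the verdict.

    The raw reading never appears in the returned payload.
    """
    if (isinstance(threshold_percent, bool)
            or not isinstance(threshold_percent, int)
            or not 0 <= threshold_percent <= 100):
        return {"ok": False, "error": "E_BAD_THRESHOLD"}
    reading = _LOCAL_READINGS.get("battery_percent")
    if reading is None:
        return {"ok": False, "error": "E_NO_DATA"}
    return {"ok": True, "answer": reading > threshold_percent}


print(battery_above(50))
```

One caveat: a yes/no tool with a free threshold can still leak the underlying value through repeated queries, so fixed thresholds or rate limiting may be worth adding.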
Your checklist reminds me I need to audit my tool descriptions. "fetch_energy_usage" probably gives away too much about my solar/battery setup.