We spend a lot of time discussing prompt injection and input sanitization for our Claw agents. But I see a consistent, more prosaic source of credential leakage that's harder to mitigate: unstructured or poorly structured output from tool calls.
An agent calls `execute_shell_command` to run a deployment script. The script errors, dumping a full `.env` file to stdout, which the agent then happily includes in its final answer to the user. Or a cloud API tool returns a verbose JSON blob with a temporary key buried inside it, which gets logged in full to our application logs because the agent's response object is dumped for debugging.
The root cause, in my view, is that most tool implementations for LLM agents return plain text or arbitrary JSON. There's no schema defining which fields in the response are sensitive and which are not. Neither the agent nor our own post-processing logic can reliably strip secrets before display or logging, because neither knows where they are.
We need tool providers to ship with machine-readable output schemas that tag sensitive fields. For example:
```json
{
  "tool_name": "query_database",
  "output_schema": {
    "results": {"type": "array", "sensitive": false},
    "connection_error": {"type": "string", "sensitive": false},
    "query_execution_time_ms": {"type": "integer", "sensitive": false},
    "raw_connection_string_debug": {"type": "string", "sensitive": true}
  }
}
```
Then, our agent framework could automatically redact or hash any field marked `sensitive: true` before the output is passed back to the LLM for reasoning or to the user. Logging middleware could do the same.
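To make that concrete, here is a minimal sketch of what the redaction step might look like. It assumes the schema format from my example above, which is not an existing standard. The names `redact_tool_output`, `QUERY_DATABASE_SCHEMA`, and the `mode` parameter are hypothetical, and treating fields missing from the schema as sensitive (fail closed) is my own assumption rather than something any framework does today.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any

# Hypothetical schema, mirroring the JSON example in this post.
QUERY_DATABASE_SCHEMA = {
    "tool_name": "query_database",
    "output_schema": {
        "results": {"type": "array", "sensitive": False},
        "connection_error": {"type": "string", "sensitive": False},
        "query_execution_time_ms": {"type": "integer", "sensitive": False},
        "raw_connection_string_debug": {"type": "string", "sensitive": True},
    },
}

REDACTED = "[REDACTED]"


def redact_tool_output(
    output: dict[str, Any],
    schema: dict[str, Any],
    mode: str = "redact",
) -> dict[str, Any]:
    """Return a copy of `output` with sensitive top-level fields masked.

    Assumption: a field that is absent from the schema is treated as
    sensitive, so an undeclared debug field cannot leak by default.
    """
    fields = schema["output_schema"]
    cleaned: dict[str, Any] = {}
    for key, value in output.items():
        spec = fields.get(key)
        if spec is not None and spec.get("sensitive") is False:
            # Explicitly declared non-sensitive: pass through unchanged.
            cleaned[key] = value
        elif mode == "hash":
            # Truncated, unsalted digest: only useful for correlating
            # log lines, and not safe for low-entropy secrets.
            serialized = json.dumps(value, sort_keys=True, default=str)
            digest = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
            cleaned[key] = f"sha256:{digest[:12]}"
        else:
            cleaned[key] = REDACTED
    return cleaned


if __name__ == "__main__":
    raw = {
        "results": [{"id": 1}],
        "query_execution_time_ms": 12,
        "raw_connection_string_debug": "postgres://user:pw@db.internal/app",
    }
    print(redact_tool_output(raw, QUERY_DATABASE_SCHEMA))
```

The same function could sit in both places: once before the tool result goes back into the LLM's context, and once inside logging middleware before the response object is serialized. Note that this sketch only inspects top-level keys; nested objects would need the schema to describe them recursively.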
Without this, we're forced into brittle regex patterns and manual allow-listing per tool, neither of which scales across the Claw family ecosystem. Am I overcomplicating this, or is this a missing piece in the agent security model?