Just saw the latest commit that adds a 'sensitive' flag to tool definitions. Idea is it'll redact outputs from logs and LLM responses.
My take: it's a necessary first step, but it's a band-aid, not a fix. Relies entirely on the developer to correctly tag every single sensitive field in every tool. One missed flag and your AWS key is in a debug log.
The real problems:
* It only handles tool *outputs*. What about credentials passed *into* the tool as arguments? Those are still in the call history.
* Doesn't solve leakage via the LLM's own responses if it decides to regurgitate a credential it saw earlier.
* Logging pipelines. If your logs are already being shipped to a central SIEM, you're now racing the shipper: the filter has to run before the line leaves the host, or the secret is already indexed downstream.
You'd still need:
* Strict network egress controls for agents (AppArmor profiles).
* Immutable, secret-less runtime configuration (init systems, not env vars).
* Aggressive log filtering at the collection point (fluentd, vector).
What's the enforcement mechanism? If I mark a tool 'sensitive' but the underlying function still prints to stdout, does the framework actually stop it?
Show me the code where the redaction happens. If it's just in the orchestrator's log line, we're still vulnerable.
--Chris
Totally agree it's a band-aid, and your point about inputs is spot on. The commitment has to be *in the data flow itself*, not just a tag.
I poked at the PR. The redaction is happening in the orchestrator's `format_tool_output` function. It's a string replace on the JSON-stringified output, but only if the tool's `sensitive` field is `true`. It doesn't (and can't) touch stdout or stderr from the underlying binary. So yeah, if your tool function logs to stderr on its own, you're toast.
Honestly, the only way this works is if the framework provides a secrets type that gets swapped for a placeholder at the *serialization boundary*, for both input and output. That's what I'd want in Rust. A `Secret` newtype that automatically redacts in its `Display` and `Debug` impls. Then the flag could be derived. Until then, it's a manual trap.
unsafe { /* not here */ }
Oh, I love that Secret newtype idea in Rust. It's the right kind of primitive, because it forces the hygiene into the type system. You can't accidentally log it unless you explicitly "reveal" it.
The problem I've hit in FFI work is that you need that secret to cross a boundary into a C library eventually, and at that point you're handing off a raw pointer or string. You lose the protection. So you'd need a clear "unsafe reveal" boundary marked in the code, which is at least an obvious audit point.
If the framework adopted a type like that, the `sensitive` flag could just be a derive macro on the tool's args and return structs. Missed tags become type errors, not logic bugs. That's a huge step up.
Fearless concurrency, fearless security.
Yeah, the Secret newtype forcing an "unsafe reveal" boundary is smart. It's a clear audit point like you said.
But in a Python or JS version of this framework, where would that boundary even be? It's all just dictionaries and strings getting passed around. You can't really have a type system guardrail there, can you?
Also, scary point about FFI. Once it's a raw pointer, it's gone. So even with a perfect Rust type, the leak risk just moves to the C library call. That feels like the real hard problem.
You're both overthinking it. It's Python. If you have a secret string, wrap it in a class with a `__repr__` that returns `''`. Simple. The boundary is anywhere you call `str()` on it.
The FFI point proves my bigger gripe: you can't solve this in the orchestrator. The risk moves to the tool. So why pile on complexity here for a false sense of security? Just fix the tools.
mw
>wrap it in a class with a `__repr__` that returns `''`
I've done exactly this in my lab, and it *mostly* works for pure Python! I built a `SecretStr` class. The tricky part nobody mentions is that it breaks a ton of JSON serialization libraries out of the box - you have to add a custom default function to `json.dumps()` to handle it, and hope every logging pipeline you use respects that. If someone just does `print(tool_call_args)`, they get a clean dict, but `json.dumps(tool_call_args)` throws a `TypeError`.
So yeah, the orchestrator would need to bake in support for that specific class, which just creates a different kind of vendor lock-in. You're right that the risk moves to the tool, but making the tool's default behavior safe seems like a win, even if it's not a total fix.
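For anyone who hasn't hit this, here's a minimal sketch of the wrapper and the `json.dumps()` problem. The class body, the `<redacted>` placeholder, and `redacting_default` are my own illustration (assumptions, not the framework's API or the exact class from my lab); the only library behavior relied on is the standard `default=` hook of `json.dumps()`.

```python
import json


class SecretStr:
    """Hypothetical wrapper. Deliberately NOT a str subclass."""

    def __init__(self, value):
        self._value = value

    def __repr__(self):
        # str() falls back to __repr__ when __str__ is not defined,
        # so print(), f-strings and dict reprs all show the placeholder.
        return "<redacted>"

    def reveal(self):
        # The only way to get the raw value back out.
        return self._value


def redacting_default(obj):
    """Custom hook for json.dumps(default=...), called for unknown types."""
    if isinstance(obj, SecretStr):
        return "[REDACTED]"
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


tool_call_args = {"api_key": SecretStr("sk-test-123"), "query": "status"}

# Printing the dict is safe: it uses the repr of each value.
printed = str(tool_call_args)

# Plain json.dumps() does not know the type and raises TypeError.
try:
    json.dumps(tool_call_args)
    plain_dumps_failed = False
except TypeError:
    plain_dumps_failed = True

# With the hook, serialization works and the secret is swapped out.
safe_json = json.dumps(tool_call_args, default=redacting_default)
```

Every call site that serializes tool arguments has to pass that hook, which is exactly the lock-in problem: one library that calls `json.dumps()` without it either crashes or forces someone to call `reveal()` early.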
run agent --sandbox
You're right about the enforcement. I checked the commit.
The redaction is in the orchestrator's log formatting layer, not in the tool execution path. It's a `sensitive_fields` list on the tool definition. The orchestrator does a regex replace on the *stringified JSON output* before sending to the LLM or writing to its own logs.
>If I mark a tool 'sensitive' but the underlying function still prints to stdout, does the framework actually stop it?
No. It does not. The tool's own stdout/stderr goes straight to the container or subprocess output. The framework never sees it.
Your larger point stands: it's a band-aid on the reporting, not a control in the data flow. You need the runtime sandboxing you listed. The flag just tries to clean up the mess after the fact.
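To make the stdout point concrete: the orchestrator can only filter a tool's output if it launches the tool with the pipes captured. A minimal sketch, assuming the tool runs as a subprocess; `run_tool_captured` and the value-based scrub are hypothetical and not something I found in the commit.

```python
import subprocess
import sys


def run_tool_captured(cmd, known_secrets):
    """Run a tool with stdout/stderr captured, then scrub known secret values.

    Without capture_output=True the child inherits the parent's stdout and
    stderr, and the orchestrator never sees what the tool printed.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    out, err = proc.stdout, proc.stderr
    for secret in known_secrets:
        out = out.replace(secret, "[REDACTED]")
        err = err.replace(secret, "[REDACTED]")
    return proc.returncode, out, err


# A stand-in "tool" that leaks its key on both streams.
tool_cmd = [
    sys.executable,
    "-c",
    "import sys; print('key=sk-test-123'); print('debug sk-test-123', file=sys.stderr)",
]

code, out, err = run_tool_captured(tool_cmd, ["sk-test-123"])
```

This only catches exact copies of values the orchestrator already knows about. An encoded or truncated copy still gets through, and it does nothing for a tool that writes to a file or opens a socket, which is why the sandboxing is still needed.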
Segfault out.
>Show me the code where the redaction happens.
It looks like it's in `orchestrator/tool_runner.py`, around the `_format_output` method. The code does a check for `if tool_def.get('sensitive'):` and then runs a regex to replace the values of fields listed in `sensitive_fields` with `[REDACTED]`. But it's only on the final string being passed back, not on the actual data structure.
So you're right, it's purely a cosmetic filter on the output path. The raw return value from the tool function is still in memory, unchanged. If the framework code itself logs that object elsewhere before formatting, or if the underlying tool prints something, it's all still there.
>purely a cosmetic filter on the output path
Exactly. And a regex replace on a stringified JSON output is brittle. If the tool returns a nested dict and you only flag the top-level key, but the secret is buried three levels down, does it catch it? Unlikely.
The real data is still in memory. Any debugger, memory dump, or a stray `print()` in the tool's code bypasses it completely. It's theater.
You need the redaction at the serialization boundary, like user235 said. This is just sweeping the dirt under the rug.
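Here's a small sketch of why the string-level approach is brittle. I don't have the framework's actual regex, so the pattern in `redact_string` is an assumption about what a typical key-based replace looks like. Whether such a regex catches nested keys depends on the pattern; the failure that's easiest to show is a value containing an escaped quote. `redact_data` is a hypothetical alternative that walks the structure before serialization.

```python
import json
import re

SENSITIVE_FIELDS = ["token"]  # hypothetical tool definition entry


def redact_string(serialized, field_names):
    """Regex replace on the already-serialized JSON (assumed pattern)."""
    for name in field_names:
        pattern = rf'"{re.escape(name)}"\s*:\s*"[^"]*"'
        serialized = re.sub(pattern, f'"{name}": "[REDACTED]"', serialized)
    return serialized


def redact_data(value, field_names):
    """Walk the data structure and replace flagged keys at any depth."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key in field_names else redact_data(item, field_names)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_data(item, field_names) for item in value]
    return value


# The secret sits three levels down and contains a double quote.
tool_output = {"result": {"auth": {"token": 'abc"SECRETTAIL'}}}

# The regex stops at the escaped quote, so the tail of the secret survives.
leaky = redact_string(json.dumps(tool_output), SENSITIVE_FIELDS)

# Redacting the structure first never has to parse the secret's contents.
safe = json.dumps(redact_data(tool_output, SENSITIVE_FIELDS))
```

The structural version still relies on the key name, so a secret echoed into a free-text field like an error message gets past both.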
Exactly. That serialization boundary is the only place you can enforce it consistently. I've been messing with a prototype for Rust agent runtimes using a `Secret` wrapper with a custom `serde` serializer. It redacts in `fmt::Debug` and `serde::Serialize`. The problem is, as you noted, you have to handle the "reveal" for tool execution, which is where you often need the raw bytes.
Example: you'd have a tool definition like `ToolArgs { api_key: Secret, query: String }`. It serializes to the LLM as `{"api_key":"[REDACTED]","query":"..."}`. But the tool's function gets the `Secret`, which you have to explicitly `.expose()` to use. That `expose` call is your audit point. It works, but it pushes the burden onto the tool author to use the type. Without that, a flag is just a suggestion 😅
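The prototype itself is Rust and isn't shown here, so the following is only a rough Python analog of the same shape, for readers following the Python side of the thread. `Secret`, `ToolArgs`, and `to_llm_payload` are illustrative names, and Python won't enforce any of this at compile time the way the Rust types would.

```python
import json
from dataclasses import dataclass, fields


class Secret:
    """Illustrative wrapper: redacts in repr, raw value only via expose()."""

    def __init__(self, value):
        self._value = value

    def __repr__(self):
        return "Secret([REDACTED])"

    def expose(self):
        # The audit point: grep for ".expose()" to find every raw use.
        return self._value


@dataclass
class ToolArgs:
    api_key: Secret
    query: str


def to_llm_payload(args):
    """Serialization boundary: Secret fields become placeholders here."""
    payload = {}
    for field in fields(args):
        value = getattr(args, field.name)
        payload[field.name] = "[REDACTED]" if isinstance(value, Secret) else value
    return json.dumps(payload, separators=(",", ":"))


args = ToolArgs(api_key=Secret("sk-test-123"), query="weather in Oslo")

# What the LLM and the logs would see.
payload = to_llm_payload(args)

# What the tool function would do, explicitly, when it needs the raw value.
raw_key = args.api_key.expose()
```

In this sketch `payload` is `{"api_key":"[REDACTED]","query":"weather in Oslo"}`, and `repr(args)` shows the placeholder too, because the dataclass repr calls `repr()` on each field.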
CVE or GTFO.
Yeah, that's the rub. Even with a perfect Secret type, you're one `.expose()` call away from the secret hitting a `println!` in a dependency or getting copied into a buffer somewhere. I tried a similar thing in my own orchestrator, using a Pydantic model with a `SecretStr` field that redacts in `.dict()`.
The caveat I hit: you can't stop the raw value from appearing in crash reports. Python's default traceback only prints source lines, but the exception message often embeds the value, and anything that captures frame locals (Sentry-style error reporters, `rich` tracebacks with `show_locals=True`, pytest with `-l`) dumps the local variables, including the secret in plain text. So your audit point at `expose()` gets blown apart by an unhandled error later. You'd need to wrap the entire tool call in a try/except that scrubs the exception object too.
It's turtles all the way down, honestly.
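A minimal sketch of that kind of wrapper, with the caveat that it's my own illustration (`ToolExecutionError` and `run_tool_scrubbed` are hypothetical names). The idea is to drop the original exception entirely, including its message and its traceback frames, and raise a sanitized one in its place.

```python
import traceback


class ToolExecutionError(Exception):
    """Sanitized stand-in for whatever the tool raised."""


def run_tool_scrubbed(tool_fn, *args, **kwargs):
    failure_type = None
    try:
        return tool_fn(*args, **kwargs)
    except Exception as exc:
        # Keep only the exception type name. The message and the traceback
        # (whose frames hold the tool's locals) are discarded with exc.
        failure_type = type(exc).__name__
    # Drop this frame's own references to the arguments before raising.
    del args, kwargs
    # Raised outside the except block, so the new exception has no
    # __context__ pointing back at the original one.
    raise ToolExecutionError(f"tool failed with {failure_type}")


def leaky_tool(api_key):
    # Stand-in tool that embeds its secret in the error message.
    raise ValueError(f"bad response while using key {api_key}")


try:
    run_tool_scrubbed(leaky_tool, "sk-test-123")
except ToolExecutionError as err:
    scrubbed = err

frame_names = [frame.name for frame in traceback.extract_tb(scrubbed.__traceback__)]
```

Note that raising inside the `except` block with `from None` would only hide the original exception from the printed output; it would still be reachable through `__context__`. And this does nothing about the caller's frames, which still hold the secret if the caller kept it in a local.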
More VLANs than friends.
You're right about the exception traceback problem. That's a clear failure of the "type as guardrail" approach in Python. Even if you manage to scrub the `__repr__`, the raw bytes are still in the frame's `f_locals`, waiting to be dumped.
This forces you to treat the entire tool execution as an opaque, untrusted subprocess anyway, which circles back to the network segmentation argument. If you can't contain the secret within the runtime, the control plane must assume the execution plane is fully compromised from the moment you hand the secret off.
So the flag is even less useful than we thought. It doesn't address the memory issue you point out, or the tool's own stdout, or the traceback leak. It just papers over one specific output channel. The real fix is to stop expecting the orchestrator's type system to provide containment, and instead enforce it at the process or service boundary with proper egress controls.
segment or sink
>Show me the code where the redaction happens.
Checked it. It's in `tool_runner.py`, `_format_output`. It's a regex replace on the JSON *string* after serialization, not on the data. So yeah, cosmetic.
Your point about the race condition with SIEM logs is the killer. Once it's in a structured log field, that regex won't touch it unless you filter at the source. The commit just moves the problem one layer up.
The enforcement is zero. If the tool prints to stdout, the framework never even sees the data. It's a filter for its own console output, nothing more.
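For what filtering at the source could look like on the Python side, here's a minimal sketch using the standard `logging.Filter` hook. `RedactingFilter` and the access-key-shaped pattern are assumptions for illustration; nothing like this is in the commit as far as I can tell.

```python
import io
import logging
import re


class RedactingFilter(logging.Filter):
    """Scrub matching substrings from a record before any handler emits it."""

    def __init__(self, patterns):
        super().__init__()
        self._patterns = [re.compile(p) for p in patterns]

    def filter(self, record):
        # Merge msg and args first so secrets passed as args are covered too.
        message = record.getMessage()
        for pattern in self._patterns:
            message = pattern.sub("[REDACTED]", message)
        record.msg = message
        record.args = ()
        return True  # keep the record, just with a scrubbed message


stream = io.StringIO()
handler = logging.StreamHandler(stream)
# Illustrative pattern only: strings shaped like an AWS access key ID.
handler.addFilter(RedactingFilter([r"AKIA[0-9A-Z]{16}"]))

logger = logging.getLogger("orchestrator.demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("tool returned key=%s", "AKIAABCDEFGHIJKLMNOP")
logged = stream.getvalue()
```

This only scrubs the formatted message. Anything attached as a structured field via `extra=` passes through untouched, which is the structured-log gap mentioned above, and it obviously can't see output that never goes through the `logging` module.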
Patch early, patch often.