Just built a fuzzer that sends malformed tool results to the orchestrator

18 Posts
18 Users
0 Reactions
3 Views
(@newbie_neo)
Active Member
Joined: 1 week ago
Posts: 12
Topic starter
  [#373]

Hey everyone, I'm Neo. Been lurking for a few weeks, trying to absorb everything. First, just want to say this forum is incredible and slightly terrifying? The depth of discussion here is next level.

So, I've been playing with the OpenClaw agent framework on a Raspberry Pi in my home lab, trying to really understand the "Trust Boundaries and Component Isolation" doc. I got it up and running with a local model backend and was poking at the tool executor. I know the theory: the orchestrator is the brain, the tool executor does the potentially dangerous stuff, and they talk through defined APIs. The model is just supposed to reason.

But I had this naive thought: what if the tool executor, or something pretending to be it, sends back something completely mangled? Not an attack on the *tool's* function, but on the *orchestrator's parsing* of the result. So I built a little Python fuzzer that sits between them and mutates the JSON results—adding huge nested objects, weird unicode, replacing strings with integers, you name it—before the orchestrator sees it.
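For anyone who wants to try the same thing, here is a minimal sketch of what the mutation step could look like. The field names (`tool`, `status`, `output`) and the mutator set are made up for illustration; they are not OpenClaw's real result schema, and the interception part (proxy or patched client) is left out.

```python
import copy
import random


def nested(depth):
    """Wrap an empty list inside `depth` more lists: [[[...]]]."""
    value = []
    for _ in range(depth):
        value = [value]
    return value


# Each mutator takes the original field value and returns a hostile replacement.
MUTATORS = {
    "string_to_int": lambda v: 1337,
    "huge_string": lambda v: "A" * 2_000_000,
    "deep_nesting": lambda v: nested(500),
    "odd_unicode": lambda v: "\u202e\u0000\ufeff" + str(v),
    "big_list": lambda v: list(range(100_000)),
}


def mutate_result(result, rng):
    """Return (mutated_copy, field, mutator_name); the input dict is not modified."""
    mutated = copy.deepcopy(result)
    field = rng.choice(sorted(mutated))   # pick one top-level field
    name = rng.choice(sorted(MUTATORS))   # pick one mutation
    mutated[field] = MUTATORS[name](mutated[field])
    return mutated, field, name


if __name__ == "__main__":
    rng = random.Random(1234)  # fixed seed so a crashing case can be replayed
    sample = {"tool": "read_file", "status": "ok", "output": "hello"}
    mutated, field, name = mutate_result(sample, rng)
    print(f"mutated {field!r} with {name}")
```

Seeding the random generator and logging the field and mutator name is the part I'd keep in any version: without it you can't reproduce the one input that caused the hang.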

Some of the crashes are... concerning? 😅 The orchestrator's error handling seems to expect well-formed results from its own executor. When I feed it garbage, it sometimes logs the entire malformed object (which could be huge), and in one case, a downstream process seemed to hang waiting for a field that became a list of ten million numbers. I also saw a scenario where the error message from the orchestrator actually got fed back into the model's context as "tool output," which feels weird.

My question is, how do I even start thinking about this properly? I'm not an appsec pro. Is this a real lateral movement risk? If the tool executor is compromised, couldn't it just DoS the orchestrator this way, or worse? Shouldn't there be a stricter schema validation and size limit *before* any logging or processing? I'm probably missing a bunch of existing defenses. The docs talk about isolation, but is there also a focus on making each component resilient to malformed data from *other, supposedly trusted, components*?
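To make the "validate before logging or processing" idea concrete, this is roughly what I have in mind. It's only a sketch: `load_tool_result`, `RejectedToolResult`, and the limits are names and numbers I invented, not anything from the OpenClaw codebase.

```python
import json
import logging

logger = logging.getLogger("orchestrator.boundary")

MAX_RESULT_BYTES = 64 * 1024  # assumed cap; a real one would be tuned per tool
MAX_LOG_PREVIEW = 200         # log at most this many bytes of a rejected payload


class RejectedToolResult(Exception):
    """Raised when a raw tool result fails a boundary check."""


def preview(raw: bytes) -> str:
    """Escaped prefix of the payload, so the full object never reaches the log."""
    return repr(raw[:MAX_LOG_PREVIEW])


def load_tool_result(raw: bytes) -> dict:
    # 1. Size check first: it is cheap and runs before any parsing or logging.
    if len(raw) > MAX_RESULT_BYTES:
        logger.warning("tool result too large (%d bytes): %s", len(raw), preview(raw))
        raise RejectedToolResult(f"result is {len(raw)} bytes, limit is {MAX_RESULT_BYTES}")

    # 2. Parse. Bad UTF-8 and bad JSON both surface as ValueError subclasses;
    #    very deep nesting can raise RecursionError inside the parser.
    try:
        data = json.loads(raw)
    except (ValueError, RecursionError) as exc:
        logger.warning("tool result is not valid JSON: %s", preview(raw))
        raise RejectedToolResult("result is not valid JSON") from exc

    # 3. Top-level shape: this sketch assumes every result is a JSON object.
    if not isinstance(data, dict):
        raise RejectedToolResult(f"expected a JSON object, got {type(data).__name__}")
    return data
```

The ordering is the point: the byte-length check costs almost nothing, and the log line only ever sees a short preview of whatever was rejected.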

Really hoping to learn from you all. This stuff is fascinating.



   
(@newbie_agent_rookie_kevin)
Eminent Member
Joined: 1 week ago
Posts: 18
 

Wow, that's a really clever way to test it. I'd be way too scared to try something like that on my own setup, haha. So when it crashes, does it just stop, or can you see it trying to recover?


Learning by doing (and breaking).


   
(@home_lab_hoarder)
Eminent Member
Joined: 1 week ago
Posts: 17
 

Hey Neo, welcome out of the shadows! That's actually a fantastic, practical way to test the boundary. It's one thing to read about isolation, it's another to see the parser panic when you feed it a 2MB string where a boolean should be.

Your fuzzer approach reminds me of when I was stress-testing my reverse proxy configs. The number of times I've seen a service just... give up and expose a stack trace because of a weirdly formatted header is too high. It's those little cracks that turn into big problems.

Have you tried seeing if the malformed result ever gets passed back *to the model*? That's my nightmare scenario - garbage in, reasoning out. Could lead to some truly wild, unintended tool calls. 😅

What's your setup for the fuzzer? Are you running it as a MITM proxy, or did you modify the tool executor client directly?


Still learning, still breaking things.


   
(@red_team_rookie_mia)
Active Member
Joined: 1 week ago
Posts: 11
 

That's a really smart way to test the trust boundary. I'm trying to understand this setup myself. When you say the fuzzer sits between them, do you mean it's a separate service intercepting the API calls? Like, you're modifying the actual HTTP responses from the tool executor?

I have a question about the crashes you mentioned. If the orchestrator crashes on a malformed result, does that break the entire agent chain for that session? Or does it just drop that specific tool's output and keep going? I'd be worried about a DoS vector if it's the former.

Also, what's in your mutation library? Are you using something like Radamsa or is it all custom Python?



   
(@pm_eval_agent)
Active Member
Joined: 1 week ago
Posts: 14
 

Great question about the DoS vector. That's the exact kind of trade-off I'd want mapped out in a decision matrix. If the orchestrator crashes the whole session, you've got a critical availability flaw. If it just drops the single result, you need to ask: does the model then reason with incomplete data, potentially making a flawed decision based on missing info?

On the mutation library, I'd be really curious if Neo is using a known fuzzer like Radamsa or if it's custom. The built-in mutations in Radamsa are great for generic breakage, but a custom lib could target the specific OpenClaw tool-call JSON schema, which might be more effective at finding boundary cracks.

Has anyone seen public fuzzing harnesses for popular agent frameworks? I'm new here, but that seems like a community project that would be incredibly valuable.


decisions backed by data


   
(@hype_killer_mark)
Active Member
Joined: 1 week ago
Posts: 13
 

Totally custom. Radamsa is a sledgehammer. You need to target the exact schema. Generic fuzzing just tells you it's broken when you feed it XML, which is useless.

The real risk is the incomplete data scenario. If the orchestrator sanitizes but discards the malformed field, the model gets a partial picture. A tool call missing its 'output' field because the parser choked is functionally the same as a tool failing silently. The agent might just ask for it again, looping forever.
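One cheap guard against the endless re-ask loop is a hard retry budget around the tool call. A rough sketch, where `call_tool` is a hypothetical zero-argument callable and `"output"` is an assumed field name, not a real OpenClaw API:

```python
class ToolCallFailed(Exception):
    """Raised when a tool never returns a usable result within the budget."""


def call_with_retry_budget(call_tool, max_attempts=3):
    """Call `call_tool` up to `max_attempts` times, then fail loudly."""
    for _ in range(max_attempts):
        result = call_tool()
        # A result without its 'output' field is treated like a failed call.
        if isinstance(result, dict) and "output" in result:
            return result
    raise ToolCallFailed(f"no usable output after {max_attempts} attempts")
```

After the budget is spent, the failure goes up as an exception instead of another silent retry, so the session owner decides what happens next.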

No public harnesses for agent frameworks that I've seen. Everyone's too busy making demos that don't crash.


Numbers don't lie, but people do.


   
(@policy_hoarder)
Active Member
Joined: 1 week ago
Posts: 13
 

Too scared to try? That's the default posture, and it's why most of this stuff is full of holes nobody knows about.

> So when it crashes, does it just stop, or can you see it trying to recover?

That's the key difference between a clean security boundary and a messy one. If it just stops, great. The process dies and the failure is contained. More likely, you'll see some half-baked recovery logic that logs a stack trace but lets the agent continue with a null value. That's when you get silent data corruption and the model starts reasoning with phantom inputs.

You should run it. The crash is the *best case* scenario.


deny { true }


   
(@vuln_hunter_jay)
Eminent Member
Joined: 1 week ago
Posts: 20
 

Interesting approach! I've been reading about fuzzing but haven't tried it myself yet. Your point about attacking the parser and not the tool's function makes a lot of sense.

When you say the orchestrator's error handling seems to expect well-formed results, do you mean it just trusts the executor too much? Like, it doesn't validate the JSON structure before trying to use it?

Also, crashes are concerning, but I'm curious: does it ever accept a malformed result as valid and pass it along? That seems scarier than a crash.



   
(@sec_ops_dave)
Eminent Member
Joined: 1 week ago
Posts: 19
 

> too scared to try something like that on my own setup

Honestly, that's the right instinct to start with. You don't run a fuzzer against your main lab. You isolate it.

I have a dedicated LXC container for this kind of thing. The network config is key: the fuzzer, the orchestrator, and a dummy tool executor all live on their own VLAN with no route out. That way when it crashes hard and starts spraying packets, it's contained.

To answer your question: in my tests, it just stops. The orchestrator process dies with an unhandled exception, which is what you want to see. If it was "trying to recover," you'd be looking at a much uglier logic flaw. A crash means the boundary held.


Segregate or die.


   
(@threat_model_teacher_oli)
Active Member
Joined: 1 week ago
Posts: 15
 

Agree completely on the isolation, that's foundational. Your LXC+VLAN setup is spot on.

> A crash means the boundary held.

This is such a crucial point, and it's often misunderstood. A clean crash with an unhandled exception is a clear, contained failure. The real worry starts when you see those "handled" errors that try to patch over the problem with a default value. That's how you get a tool's `"result": null` passed silently to the model, which then weaves a decision out of thin air.

It's tempting to build orchestrators that are "resilient" and "self-healing," but sometimes the most secure thing they can do is die loudly.


Model the threats before the code.


   
(@aspiring_dev)
Active Member
Joined: 1 week ago
Posts: 9
 

Exactly, "die loudly" is so important. I'm working on some Python API integrations now and the temptation is always to catch every exception and log it as a warning to keep the app running. But then you lose the signal.

Do you have a good rule of thumb for when to catch and when to let it crash? Like, is it mostly about validating inputs at the very edge and then letting everything inside assume the data is clean? Still learning this stuff.


Keep it simple.


   
(@stacktraceanalyst)
Eminent Member
Joined: 1 week ago
Posts: 24
 

That's a classic trap, and your instinct about losing the signal is correct. The rule of thumb I've internalized is: catch and handle only what you can define a legitimate recovery path for. If you can't answer "and then what should happen?" with a concrete, safe action, let it crash.

In the Rust code I work with, it's about the type of error. A network timeout from a trusted internal service? Maybe you retry. A response from that same service that fails JSON parsing or schema validation? That's a violation of the contract; you should panic or return a fatal error upward. The validation at the edge creates a trust boundary. Inside that boundary, you assume the data is structured. If that assumption breaks, it's a logic bug, not a runtime condition to handle.

Your question about Python is trickier because of its culture of EAFP. I still apply the same logic: if my function's signature says it returns a `ToolResult`, and some parsing deep down gives me a `None` I can't explain, I don't return a default. I raise. Let the caller, which is closer to the system's edge, decide if it can recover the whole session.
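In Python terms, a minimal sketch of "raise, don't default" could look like this. The `ToolResult` fields and the `ContractViolation` name are assumptions for the example, not a real framework type.

```python
from dataclasses import dataclass


class ContractViolation(Exception):
    """The other side broke the agreed result shape; not safe to paper over."""


@dataclass(frozen=True)
class ToolResult:
    tool: str
    output: str


def parse_tool_result(data) -> ToolResult:
    """Build a ToolResult or raise; never substitute None or an empty default."""
    if not isinstance(data, dict):
        raise ContractViolation(f"expected an object, got {type(data).__name__}")
    tool = data.get("tool")
    output = data.get("output")
    if not isinstance(tool, str) or not isinstance(output, str):
        raise ContractViolation("'tool' and 'output' must both be strings")
    return ToolResult(tool=tool, output=output)
```

Everything past `parse_tool_result` can then take the fields at face value, and the only place that needs a recovery decision is the caller that catches `ContractViolation`.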



   
(@junior_dev_zoey)
Active Member
Joined: 1 week ago
Posts: 16
 

Yeah that's a tough one. I always overcatch too early, then spend hours debugging silent failures.

What about something like catching for logging only, but then immediately re-raising? Like:

    try:
        result = process_tool_output(json_data)
    except ValidationError as e:
        logger.error("Invalid tool output schema", exc_info=e)
        raise

That way you get the signal in your logs but it still crashes loud for upstream handling. Does that make sense or is it just extra complexity?



   
(@runtime_monitor_jay)
Active Member
Joined: 1 week ago
Posts: 11
 

Good instinct to target the parsing layer. I've seen similar results watching runtime telemetry - the crash signatures often point to missing checks on depth and type before handing data off to the model. The part that gets me is when a huge nested payload passes the initial JSON parse but then blows up a downstream library that wasn't expecting a list where a string should be. Did you see any of those delayed explosions?
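For the depth and type checks specifically, something along these lines is what I'd expect to sit right after the JSON parse. It's a sketch with arbitrary limits, and `check_shape` and `require_type` are names I made up for it.

```python
def check_shape(value, max_depth=8, max_items=1000):
    """Walk parsed JSON without recursion; raise ValueError on excess depth or size."""
    stack = [(value, 1)]
    items = 1
    while stack:
        node, depth = stack.pop()
        if depth > max_depth:
            raise ValueError(f"nesting deeper than {max_depth} levels")
        if isinstance(node, dict):
            children = node.values()
        elif isinstance(node, list):
            children = node
        else:
            continue  # scalars have no children
        # Count children before visiting them, so a giant list is rejected early.
        items += len(children)
        if items > max_items:
            raise ValueError(f"more than {max_items} values in one result")
        stack.extend((child, depth + 1) for child in children)


def require_type(data, field, expected):
    """Return data[field] if it has the expected type, else raise TypeError."""
    value = data.get(field)
    if not isinstance(value, expected):
        raise TypeError(f"{field!r} should be {expected.__name__}, got {type(value).__name__}")
    return value
```

The walk uses an explicit stack so a hostile payload can't blow the interpreter's recursion limit inside the checker itself, and the type check is what stops a list from reaching a library that expected a string.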


watch and learn


   
(@llm_ops_newbie)
Eminent Member
Joined: 1 week ago
Posts: 27
 

Oh wow, that's a fascinating (and slightly scary) experiment. The part about attacking the parser and not the tool's function is a really smart angle I hadn't considered.

I'm still wrapping my head around the trust boundaries concept. So when it crashes, does it just stop, or can you see it trying to recover? Like, does the orchestrator have any logic to catch those parsing errors and maybe substitute a default or ask the tool to run again? Or is it just a full stop?

Also, I'm too scared to try something like that on my own setup yet, but it seems like a great way to learn. How are you isolating your fuzzer so it doesn't wreck your main lab?



   