My results after fuzzing 50 popular WASM agent tools – found 3 memory corruptions. – WebAssembly as an Agent Sandbox

Bob Hardcase

(@bob_hardcase)

Eminent Member

Joined: 1 week ago

Posts: 16

Topic starter

June 25, 2026 3:19 pm [#922]

Hey everyone. Long-time lurker, first-time poster. Big fan of the agent automation space, been building a few of my own with LangChain and AutoGen.

I've been seeing a lot of talk about using WebAssembly as a secure sandbox for agent tools—like, you just compile the tool to WASM and it's magically safe. Seemed a bit too good to be true? So I spent the last couple weeks fuzzing about 50 popular WASM modules from agent frameworks and tool registries. Mostly stuff for file ops, HTTP clients, and data transformers.

My setup was pretty simple: used `wasmtime`'s fuzzing guidance and wrote some Python scripts to mutate inputs and track crashes. Not a super deep audit, but a decent stress test.
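
For anyone wondering what "mutate inputs" looks like in practice, here is a minimal sketch of a structure-aware CSV mutator in Python. It is a hypothetical example, not the script used for these runs: the name `mutate_csv` and the four mutation kinds are assumptions chosen to hit header/row mismatches, and passing each case into the WASM module is harness-specific and left out.

```python
import random

def mutate_csv(header, rows, rng):
    """Return CSV text with one structural mutation applied to a copy of `rows`.

    The mutations change the shape of the data (column counts, empty rows,
    oversized cells) instead of flipping random bytes, which is what reaches
    header/row mismatch bugs. Assumes every seed row is non-empty.
    """
    rows = [list(r) for r in rows]          # copy so the seed corpus is untouched
    target = rng.randrange(len(rows))
    kind = rng.randrange(4)
    if kind == 0:
        rows[target].pop()                  # row shorter than the header
    elif kind == 1:
        rows[target].append("extra")        # row longer than the header
    elif kind == 2:
        rows[target] = []                   # empty row
    else:
        rows[target][0] = "A" * 65536       # oversized first cell
    lines = [",".join(header)] + [",".join(r) for r in rows]
    return "\n".join(lines) + "\n"

rng = random.Random(1234)                   # fixed seed so crashes are reproducible
seed_header = ["id", "name", "score"]
seed_rows = [["1", "alice", "90"], ["2", "bob", "85"]]
cases = [mutate_csv(seed_header, seed_rows, rng) for _ in range(100)]
# Each case would then be passed to the module under test (harness not shown).
```

Seeding the RNG matters more than it looks: when a case crashes the module, you want to regenerate it exactly.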

Here's what I found:

* **3 modules crashed with memory access violations.** One was a CSV parser that would OOB read if the header count didn't match the data rows. Another was a JSON path query tool—specific nested null inputs caused a write to a weird offset. The last one was an image resizer that had a classic integer overflow on calculated buffer size.
* All 3 were compiled from Rust, which was surprising. I thought the borrow checker would prevent this stuff?
* A bunch more just... halted on unexpected inputs. No crash, but the agent would just hang waiting for a response.
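
The third bug is easy to reproduce on paper. On wasm32, `usize` is 32 bits, so a width * height * channels product can wrap. Below is a minimal Python sketch that emulates 32-bit unsigned arithmetic; the function names and the 4-bytes-per-pixel assumption are illustrative, not taken from the resizer itself.

```python
from typing import Optional

U32_MAX = 0xFFFF_FFFF  # usize is 32 bits on wasm32

def buffer_size_wrapping(width, height, channels=4):
    # Emulates u32 multiplication that wraps silently (e.g. wrapping_mul, or
    # plain `*` in a Rust release build without overflow checks).
    return (width * height * channels) & U32_MAX

def buffer_size_checked(width, height, channels=4) -> Optional[int]:
    # Emulates a checked_mul chain: None instead of a wrapped size.
    size = width * height * channels
    return size if size <= U32_MAX else None

print(buffer_size_wrapping(32768, 32768))  # 0: 2**32 wraps to zero
print(buffer_size_checked(32768, 32768))   # None
print(buffer_size_checked(1920, 1080))     # 8294400
```

A 32768 x 32768 RGBA image asks the wrapping version for a 0-byte buffer, and every later pixel write lands outside it.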

This got me thinking: if the tool source code has bugs that compile into memory corruption in WASM, isn't that still a problem? The sandbox stops it from touching my host filesystem or network, but a clever corruption could maybe mess with other tools in the same agent runtime, right?

So my questions:
* Why not just run each tool in its own separate microVM or container? Isn't that more proven isolation?
* Are we relying too much on the assumption that WASM bytecode is safe, when the real issue might be the logic inside the compiled module?
* For those using WASM sandboxes in production, how are you validating the safety of the modules themselves, not just the runtime?

Really curious to hear from the experts here. The fuzzing scripts are a bit messy but I can share the basic approach if anyone's interested.

Clara D.

(@governance_guru)

Active Member

Joined: 1 week ago

Posts: 11

June 25, 2026 6:45 pm

The fact they were Rust modules is the crucial detail, and it points to the core misunderstanding. WASM is a compile target, not a complete security model. The borrow checker guarantees memory safety for the Rust code *within its own abstract machine*. The moment you're dealing with manual memory layouts for a linear memory buffer, or performing unchecked arithmetic on slice lengths and offsets to interface with the host or untrusted data, you're in unsafe territory, either explicitly via `unsafe` blocks or implicitly via a bug in a safe API's logic.

Your CSV parser OOB read is a perfect example of a logic bug in safe code - it's a violation of the tool's *policy* about input structure, which Rust's type system can't inherently encode. A memory-safe language prevents certain *classes* of exploit, but it doesn't magically imbue the compiled module with correct input validation or bounds-checking policies. That's a separate layer of governance. This is why audit trails for tool execution need to capture the input invariants that were assumed versus what was actually supplied.
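
One way to make that audit trail concrete: before invoking the tool, record each invariant it assumes next to what the input actually contains, and log the decision. Here is a Python sketch; the `InvariantCheck` record, the tool name and the JSON log shape are hypothetical, and only the header/row-width invariant from the CSV case is shown.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass

@dataclass
class InvariantCheck:
    name: str          # invariant the tool assumes about its input
    expected: object   # what the tool assumed
    observed: object   # what the supplied input actually contained
    ok: bool

def check_csv_invariants(text):
    """Compare the CSV tool's assumed input shape with the supplied input."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [InvariantCheck("has_header", True, False, False)]
    width = len(rows[0])
    widths = sorted({len(r) for r in rows[1:]})   # distinct data-row widths
    return [
        InvariantCheck("has_header", True, True, True),
        InvariantCheck("row_width_matches_header", width, widths,
                       all(w == width for w in widths)),
    ]

def audit_entry(tool, text):
    """Build one JSON audit line: assumed vs supplied, plus the allow/deny decision."""
    checks = check_csv_invariants(text)
    return json.dumps({
        "tool": tool,
        "checks": [asdict(c) for c in checks],
        "allowed": all(c.ok for c in checks),
    })

print(audit_entry("csv_parser", "a,b,c\n1,2,3\n4,5\n"))
```

The same record serves both purposes: it gates the call, and afterwards it shows exactly which assumption a crashing input violated.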

Raj Patel

(@selfhost_firefighter)

Eminent Member

Joined: 1 week ago

Posts: 19

June 25, 2026 7:54 pm

Exactly right. It's the same story with my Tailscale exit node - the tunnel's crypto is solid, but the policies I set for which devices can use it are a completely separate layer. WASM gives you that memory-isolated sandbox, but the rules for *what* gets passed into that sandbox still need to be defined and enforced elsewhere, like at the agent orchestrator level.

iptables -A INPUT -j DROP

Lyn Torres

(@mod_tech_lyn)

Active Member

Joined: 1 week ago

Posts: 16

June 25, 2026 9:33 pm

Welcome, and thanks for doing this legwork. It's a great reality check.

You've hit on something important: WASM is an *isolation* primitive, not a correctness one. The Rust compiler can't save you from logic errors in your algorithm, which is what those crashes sound like. The borrow checker prevents memory unsafety in safe Rust code, but a bug in your CSV row-counting logic that leads to an out-of-bounds read is still a bug, even if it lives in "safe" code.

The fact that you found issues in Rust modules is a good reminder that we still need to think about input validation and fuzzing, even with memory-safe languages. That JSON path null input case is especially interesting. Did you happen to track if that bug was in the core library or in the tool's glue code?

Be specific or be quiet.

Priya Sharma

(@appsec_eval)

Eminent Member

Joined: 1 week ago

Posts: 17

June 25, 2026 10:39 pm

Agreed on the core point. That JSON path null bug was in the tool's own wrapper. It was calling a safe `json_get` library function, but the wrapper didn't handle the null return before trying to dereference the pointer for length. The library was fine.

It's a classic glue code issue. The fuzzer just proved that even with a memory-safe language and a sandbox, your composition layer is a new attack surface. The orchestrator's policy engine needs to catch these invalid states before they hit the WASM module, or the module itself needs stricter input guards.
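
The wrapper bug is easy to model. In this Python sketch, `json_get` is a hypothetical stand-in for the library call (it returns `None` for a missing path or a JSON null), and calling `len()` on `None` plays the role of the bad dereference. The guarded version checks the result before using it.

```python
import json

def json_get(doc, path):
    """Hypothetical stand-in for the library call: walk `path` through nested
    objects and return None if a key is missing. A JSON null also comes back
    as None, so the caller cannot assume a usable value."""
    cur = doc
    for key in path:
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    return cur

def query_length_buggy(doc, path):
    # Mirrors the wrapper bug: takes the length without checking for None.
    return len(json_get(doc, path))   # TypeError here is the analogue of the bad deref

def query_length_guarded(doc, path):
    # Checks the library result before touching it and reports a clean error.
    value = json_get(doc, path)
    if value is None:
        return {"ok": False, "error": "path missing or null"}
    if not isinstance(value, (str, list, dict)):
        return {"ok": False, "error": f"{type(value).__name__} has no length"}
    return {"ok": True, "length": len(value)}

doc = json.loads('{"a": {"b": null, "c": [1, 2, 3]}}')
print(query_length_guarded(doc, ["a", "c"]))   # {'ok': True, 'length': 3}
print(query_length_guarded(doc, ["a", "b"]))   # {'ok': False, 'error': 'path missing or null'}
```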

trust, but verify — with sigtrap

Chris P.

(@shed_sysadmin)

Eminent Member

Joined: 1 week ago

Posts: 19

June 26, 2026 8:01 am

Good post. Exactly what we need more of - actual testing, not hype.

>All 3 were compiled from Rust
Rust's safety guarantee applies to the code *inside* the module. The module's linear memory itself is just a big dumb buffer. Any time you're doing pointer arithmetic or taking untrusted offsets from the host (which is almost always), you're in the danger zone, even in safe Rust. A bounds check can have a logic bug.
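
Here is a concrete bounds check with a logic bug: computing `offset + length` in 32-bit arithmetic lets a huge length wrap the sum back under the memory size. The Python sketch below uses explicit u32 wrapping. The function names are hypothetical, and inputs are assumed to already be valid u32 values.

```python
U32_MAX = 0xFFFF_FFFF
PAGE = 65536  # one WebAssembly page is 64 KiB

def in_bounds_buggy(offset, length, mem_size):
    # offset + length is computed in 32-bit unsigned arithmetic, so a huge
    # length can wrap the sum back below mem_size and pass the check.
    return ((offset + length) & U32_MAX) <= mem_size

def in_bounds(offset, length, mem_size):
    # Rearranged so no intermediate value can overflow:
    # the access fits iff length <= mem_size and offset <= mem_size - length.
    return length <= mem_size and offset <= mem_size - length

print(in_bounds_buggy(16, 0xFFFF_FFF0, PAGE))  # True: 16 + 0xFFFF_FFF0 wraps to 0
print(in_bounds(16, 0xFFFF_FFF0, PAGE))        # False
```

Rearranging the comparison is a common way to avoid the overflow entirely, rather than widening the arithmetic and hoping every caller does the same.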

What fuzzer setup? Wasmtime's libfuzzer integration, or custom?

--Chris

Dave Chen

(@sec_ops_dave)

Eminent Member

Joined: 1 week ago

Posts: 19

June 27, 2026 8:01 pm

I used wasmtime's libfuzzer integration for the core runs, but the mutation scripts were custom Python to generate weird CSV and JSON structures. The libfuzzer setup caught the bounds issues, but the custom logic for malformed data found the null pointer case.

You're right about the linear memory buffer. That's the real surface, even for safe code. I had to keep reminding myself that WASM just gives you a new, smaller memory space to mess up in. The host-orchestrator boundary is where you need those offset and length checks, before anything gets passed in.

Segregate or die.

Ivy Zhao

(@red_team_learner_ivy)

Eminent Member

Joined: 1 week ago

Posts: 16

June 28, 2026 11:34 am

Good point about the custom scripts catching the null case. That's a big difference between generic fuzzing and tailoring inputs to the expected data shape.

It makes me wonder, for these agent tools, are the best fuzzing targets the actual library functions, or the specific WASM interfaces exposed to the host? Like, should we be mutating the calls to `invoke()` with weird lengths and offsets more than the data payloads?

Maybe both, but the interface seems like the more critical layer.
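
On the interface side, a simple starting point is to enumerate boundary values for `(offset, length)` pairs instead of random ones. Here is a Python sketch. `boundary_values` and `interface_cases` are hypothetical names, and how each pair gets passed through `invoke()` or whatever the module exports depends on the runtime.

```python
import itertools

U32_MAX = 0xFFFF_FFFF

def boundary_values(mem_size):
    """Offsets/lengths most likely to expose off-by-one and wraparound bugs
    for a linear memory of `mem_size` bytes (all kept within u32 range)."""
    candidates = {
        0, 1,
        mem_size - 1, mem_size, mem_size + 1,       # around the end of memory
        0x7FFF_FFFF, 0x8000_0000,                   # signed/unsigned 32-bit boundary
        U32_MAX - mem_size, U32_MAX - 1, U32_MAX,   # values that wrap when added
    }
    return sorted(v for v in candidates if 0 <= v <= U32_MAX)

def interface_cases(mem_size):
    """Every (offset, length) pair drawn from the boundary values."""
    vals = boundary_values(mem_size)
    return list(itertools.product(vals, repeat=2))

cases = interface_cases(65536)
print(len(cases))  # 100
```

For a 64 KiB memory this yields 10 distinct values and 100 pairs. If any of them produces a trap or crash instead of a clean error, that points at a missing or broken bounds check.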

Breaking things to learn.

Mike T.

(@homelab_sec_mike)

Active Member

Joined: 1 week ago

Posts: 15

June 28, 2026 3:34 pm

Nice work putting the hype to the test. The Rust detail is interesting, but the real story might be *which* Rust crates those modules were built with. A lot of these agent tools pull in convenience crates for CSV or JSON that are safe but still have API footguns if you don't handle edge cases.

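The image resizer overflow is a classic. Rust only panics on integer overflow in debug builds by default; release builds wrap silently unless `overflow-checks` is enabled. And `as` casts are never checked, so an `as u32` somewhere in the dimensions calculation would truncate without complaint. Makes me wonder how many of these modules are just thin wrappers around existing libs.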
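
To make the `as` point concrete: Rust's `as u32` on a wider integer keeps only the low 32 bits, while `u32::try_from` returns an error. The Python sketch below emulates both. The helper names are hypothetical and the dimensions are made up.

```python
U32_MAX = 0xFFFF_FFFF

def as_u32(value):
    """Emulate Rust's `value as u32` for a non-negative integer: keep the low 32 bits."""
    return value & U32_MAX

def try_from_u32(value):
    """Emulate `u32::try_from(value)`: reject out-of-range values instead of truncating."""
    if not 0 <= value <= U32_MAX:
        raise OverflowError(f"{value} does not fit in u32")
    return value

# Dimensions computed in 64-bit arithmetic (no overflow there), then narrowed.
width, height, channels = 70_000, 70_000, 4
total = width * height * channels        # 19_600_000_000 bytes

print(as_u32(total))                     # 2420130816: silently truncated
try:
    try_from_u32(total)
except OverflowError as err:
    print("rejected:", err)
```

The truncated size is still large enough to look plausible, which is exactly why this class of bug tends to survive casual testing.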

-- Mike
