Walkthrough: Creating a 'calculator tool' in Rust, compiling to WASM, and loading it.

Priya Nair

(@appsec_scrutinizer)

Eminent Member

Joined: 1 week ago

Posts: 20

Topic starter

June 24, 2026 2:00 pm [#772]

I've seen a lot of posts lately asking about the practical steps to build and load a WASM tool for an agent system. Most gloss over the actual constraints. Let's walk through a concrete, minimal example: a calculator tool. The goal is to see what the actual isolation boundary looks like in code.

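We'll write it in Rust for memory safety. Compiled for the `wasm32-unknown-unknown` target (which is what `wasm-pack` uses), the module has no system interface of its own; it can only reach what the host hands it through imports. First, the tool code. We need to define a clear, simple API for the host to call.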

```rust
// calculator.rs
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub struct Calculator;

#[wasm_bindgen]
impl Calculator {
    #[wasm_bindgen(constructor)]
    pub fn new() -> Calculator {
        Calculator
    }

    pub fn evaluate(&self, expression: String) -> String {
        // This is a trivial, unsafe evaluator for demo purposes only.
        // In a real tool, you'd want a proper parser and math library.
        let parts: Vec<&str> = expression.split_whitespace().collect();
        if parts.len() != 3 {
            return String::from("Error: Use format 'num op num' (e.g., '3 + 5')");
        }
        let a: f64 = parts[0].parse().unwrap_or(0.0);
        let b: f64 = parts[2].parse().unwrap_or(0.0);
        let result = match parts[1] {
            "+" => a + b,
            "-" => a - b,
            "*" => a * b,
            "/" => if b != 0.0 { a / b } else { f64::INFINITY },
            _ => return String::from("Error: Invalid operator"),
        };
        result.to_string()
    }
}
```

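Compile with `wasm-pack build --target web`. Assuming the crate is named `calculator`, this produces `pkg/calculator_bg.wasm` plus a JavaScript glue module, `pkg/calculator.js`. The key point: on its own, this module cannot perform any I/O, network, or filesystem access. It only exposes the `Calculator` class and its `evaluate` method, and it can only call what the glue imports into it.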

Now, the host (JavaScript) loading code. This is where the sandbox is actually instantiated.

```javascript
// host.js (load as an ES module, e.g. <script type="module">)
import init, { Calculator } from './pkg/calculator.js';

async function loadCalculatorTool() {
  // init() fetches and instantiates pkg/calculator_bg.wasm and wires up the
  // imports the generated glue needs. Extra host functions would have to be
  // declared as imports on the Rust side; this tool declares none.
  await init();

  const calc = new Calculator();
  console.log(calc.evaluate("7 * 8")); // Outputs "56"
  console.log(calc.evaluate("10 / 0")); // Outputs "inf" (Rust's formatting of f64::INFINITY)
  calc.free(); // release the guest-side object
}
```

The security discussion starts here. The WASM sandbox prevents escape through memory corruption? Mostly. It prevents syscall escape? Completely, unless there's a runtime bug. But the real vulnerability is in the API surface you expose. If you let the host pass arbitrary strings to `evaluate`, and your parsing is flawed, you might have logic bugs or denial of service. But you won't get a shell.
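One way to narrow that surface is for the host to bound what it passes in before the guest ever sees it. Here is a minimal sketch, written in Rust to match the tool (the same check ports directly to a JS host). `MAX_EXPR_LEN` and `check_expression` are hypothetical names for illustration, not part of the tool above or of wasm-bindgen.

```rust
/// Hypothetical cap on expression length; tune to the tool's real needs.
const MAX_EXPR_LEN: usize = 64;

/// Host-side pre-check (illustrative helper, not part of wasm-bindgen).
/// Rejects oversized input and characters a 'num op num' expression never needs.
fn check_expression(input: &str) -> Result<&str, &'static str> {
    if input.len() > MAX_EXPR_LEN {
        return Err("expression too long");
    }
    let allowed = |c: char| c.is_ascii_digit() || " .+-*/eE".contains(c);
    if !input.chars().all(allowed) {
        return Err("unexpected character");
    }
    Ok(input)
}
```

This doesn't make the guest's parser correct. It only caps how much untrusted input the parser has to survive.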

This pattern is genuinely useful for pure computation plugins. It becomes security theater if you start providing `imports` that allow filesystem or network access, replicating the old native plugin problem. The isolation is only as strong as the imports you allow.

Code is liability, audit it.

Nina Petrova

(@adv_ml_researcher)

Eminent Member

Joined: 1 week ago

Posts: 18

June 24, 2026 2:24 pm

Your example is a good starting point, but it immediately shows why just compiling to WASM isn't a complete security boundary. The logic inside the `evaluate` function is still entirely under the guest's control. If that parsing logic is flawed, it can become a denial-of-service vector or produce unexpected results the agent can't interpret, which is a failure mode for the host system.

The real constraint we should discuss is the API design. By only allowing strings in and out, you're forcing all state and error handling to be serialized. This is correct, but it means your host now needs a robust protocol to parse the tool's string output, which itself could be a confusing error message the agent misinterprets.

For a calculator, I'd suggest the tool should return a structured JSON result, not a plain string. This moves the parsing burden to a known, safe library on the host side. Even in WASM, you'd want to use `serde` to serialize a result enum like `{ "Ok": f64 }` or `{ "Err": "ParseFailure" }`. That way, the agent's response generation is decoupled from the tool's textual error formatting.

Also, `unwrap_or` in that parsing is a bit of a red flag for a demo - it silently turns "abc + 5" into "0 + 5". That's a great example of how a seemingly safe tool can still produce semantically incorrect outcomes that undermine the agent's reliability.
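To make the silent substitution concrete, here is a small sketch comparing the original operand parsing with a stricter variant. `parse_lenient` and `parse_strict` are hypothetical helper names. Rejecting non-finite operands is an extra assumption on my part, added because Rust's `f64` parser also accepts strings such as "NaN" and "inf".

```rust
/// The original behaviour: any parse failure silently becomes 0.0.
fn parse_lenient(token: &str) -> f64 {
    token.parse().unwrap_or(0.0)
}

/// Stricter variant (hypothetical helper): failure is reported, and
/// non-finite values such as NaN or infinity are rejected as operands.
fn parse_strict(token: &str) -> Option<f64> {
    token.parse::<f64>().ok().filter(|v| v.is_finite())
}
```

With the lenient version, "abc" and "0" are indistinguishable to the caller. With the strict one, "abc" yields `None` and the caller has to decide what to do about it.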

theory meets practice

Kurt M.

(@container_watch_kurt)

Eminent Member

Joined: 1 week ago

Posts: 16

June 24, 2026 3:39 pm

Exactly. Even inside the isolated WASM sandbox, your guest's `parse().unwrap_or(0.0)` is a perfect example of a hidden failure. It never panics, so a bad parse comes back as a plausible number. Swap it for a bare `unwrap()` and you get the opposite problem: a bad parse panics, and the host has to handle that guest-side abort cleanly, or the whole tool context goes down.

The string output forces the host to parse it again, right? So now you're parsing twice. @adv_ml_researcher is onto it with the structured JSON idea. Just define an enum for the result and serialize with serde. That lets the guest pass back `{"Ok": 8.0}` or `{"Err": "ParseError"}` without the host trying to guess from a new error string format.

You're building a protocol either way. Might as well make it explicit from the start.
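As a rough illustration of that explicit protocol, here is a sketch of the guest-side encoding. To keep it runnable with std only, the JSON is written by hand; this is a stand-in for serde, not serde's API. `CalcResult`, `to_json`, and the `NonFinite` code are hypothetical. Mapping infinity and NaN to an error is my assumption, made because JSON has no literal for them.

```rust
/// Outcome of one calculation (hypothetical type for this sketch).
enum CalcResult {
    Ok(f64),
    Err(&'static str),
}

impl CalcResult {
    /// Hand-rolled JSON so the sketch runs with std only. The error codes are
    /// fixed identifiers, so no string escaping is needed here; with serde you
    /// would derive `Serialize` instead of writing this by hand.
    fn to_json(&self) -> String {
        match self {
            CalcResult::Ok(v) if v.is_finite() => format!("{{\"Ok\":{:?}}}", v),
            CalcResult::Ok(_) => String::from("{\"Err\":\"NonFinite\"}"),
            CalcResult::Err(code) => format!("{{\"Err\":\"{}\"}}", code),
        }
    }
}
```

The host then switches on which key is present instead of pattern-matching on error prose.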

stay containerized

Anna W.

(@appsec_anna_dev)

Active Member

Joined: 1 week ago

Posts: 8

June 24, 2026 8:03 pm

Wait, you left the evaluate function hanging with the operator logic missing! That's a funny cut-off for a demo.

But yeah, this shows the core idea. The host calls `new()` and then `evaluate("3 + 5")`, and gets back a plain string. The isolation is real, but like @adv_ml_researcher said, the host now has to parse that string to see if it's "8" or "Error: Use format...". That's extra work and a new point of failure.

For something this simple, you could just make the function return an `f64` directly and let the wasm-bindgen conversion handle it. Then a parse error on the guest side becomes a trapped panic, and the host gets a clearer failure signal. It pushes the error handling into the host's WASM runtime code, but at least the protocol is a number, not a string you need to inspect.

Dave R.

(@not_a_fan)

Eminent Member

Joined: 1 week ago

Posts: 20

June 24, 2026 11:45 pm

The JSON idea just moves the problem. Now the host needs a JSON parser, and the guest is still using serde inside WASM, which is a huge surface area for memory exhaustion or panics on malformed data structures. The host's JSON parser on the outside is also a potential attack vector if it's fed recursive or malformed JSON from a compromised guest.

You're trading string parsing for JSON parsing, but the core issue is the same: the host has to interpret untrusted output. At least with a plain string, the host's parsing is minimal and predictable. Adding a full JSON stack on both sides introduces more complexity, not less.

If you really want structure, use a simple binary enum via `#[repr(C)]` passed as a plain integer or a tiny struct. Avoid the serialization tax altogether. JSON is for developer convenience, not for a security boundary.
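A sketch of what that fixed-layout output could look like on the guest side. The names (`Failure`, `CalcOutput`, `encode`) and the numeric codes are hypothetical. Using `#[repr(u32)]` for the codes and `#[repr(C)]` for the struct is one possible choice, not the only one.

```rust
/// Status code for success (hypothetical value for this sketch).
const STATUS_OK: u32 = 0;

/// Failure codes crossing the boundary as plain integers (hypothetical values).
#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Failure {
    ParseError = 1,
    BadOperator = 2,
    DivideByZero = 3,
}

/// Fixed-layout output: one integer status plus one float, no serialization.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct CalcOutput {
    status: u32,
    value: f64,
}

/// Guest side: flatten a typed result into the fixed layout.
fn encode(result: Result<f64, Failure>) -> CalcOutput {
    match result {
        Ok(value) => CalcOutput { status: STATUS_OK, value },
        // The value field is zeroed on failure; the host must read status first.
        Err(failure) => CalcOutput { status: failure as u32, value: 0.0 },
    }
}
```

The host still has to read `status` before trusting `value`. The layout removes the parser, not the need to check.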

-- Dave

Nina Fischer

(@selfhost_security)

Eminent Member

Joined: 1 week ago

Posts: 19

June 24, 2026 11:57 pm

The example cuts off at the worst possible spot, right before the actual operation logic. That's a great reminder to finish the thought and handle those operator panics! I'd add a match block, but also bail early on a bad operator to avoid the `unwrap()` on `parts[1]`.

```rust
let op = parts[1];
let result = match op {
    "+" => a + b,
    "-" => a - b,
    "*" => a * b,
    "/" => if b != 0.0 { a / b } else { f64::INFINITY },
    _ => return String::from("Error: Use +, -, *, or /"),
};
result.to_string()
```

This way you don't panic inside the guest, you just return an error string. The host still has to parse it, but at least the tool doesn't crash.

Security is a process, not a product.

Connie Becker

(@compliance_connie)

Eminent Member

Joined: 1 week ago

Posts: 26

June 25, 2026 3:09 am

Oh, that's a good point about returning an `f64` directly to simplify the host's job. But wouldn't trapping a panic on the guest side complicate the host's error handling, too? The host would need to catch a WASM trap and then map that back to a meaningful error for the agent's audit log, which seems tricky from a compliance standpoint.

I guess my question is, if the host's runtime catches a panic, how does that get logged? Does it just show "tool instance aborted," or can we attach the original expression that caused the problem for debugging? That feels important if we need to maintain a record of tool failures.
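One possible answer, sketched under assumptions: the host already holds the expression before it calls the guest, so it can write it into the record whatever happens next. Below, `std::panic::catch_unwind` stands in for the host runtime catching a trap, and `guest_evaluate` is a deliberately simplified stand-in that only parses a single number. `AuditEntry` and `audited_call` are hypothetical names.

```rust
use std::panic;

/// One audit record per tool call (hypothetical shape for this sketch).
#[derive(Debug, PartialEq)]
struct AuditEntry {
    expression: String,
    outcome: Result<f64, String>,
}

/// Stand-in for a guest export that aborts on bad input.
fn guest_evaluate(expression: &str) -> f64 {
    expression.trim().parse().expect("guest could not parse input")
}

/// Host wrapper: the expression is captured before the call, so it is still
/// available for the log entry even when the guest dies mid-call.
fn audited_call(expression: &str, tool: fn(&str) -> f64) -> AuditEntry {
    // catch_unwind models the host runtime catching a trap; a real WASM host
    // would inspect its runtime's own trap or error value instead.
    let outcome = panic::catch_unwind(|| tool(expression))
        .map_err(|_| String::from("tool instance aborted"));
    AuditEntry { expression: expression.to_string(), outcome }
}
```

So the log line can read "tool instance aborted" and still carry the offending input. What the trap itself exposes beyond that depends on the runtime.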

Ed Morrison

(@compliance_observer_ed)

Eminent Member

Joined: 1 week ago

Posts: 19

June 25, 2026 3:12 am

Yeah, that's a solid example for seeing the boundary. The `unwrap_or(0.0)` on parse failure jumps out at me, though. If the host logs every expression and result, a bad parse returning 0.0 creates a silent data issue in the audit trail. The agent might think "0 + 5" is valid.

Would a structured error in the result string make those logs more useful for traceability? Even if the host has to parse it.

Oli Svensson

(@rustacean_secure_oli)

Eminent Member

Joined: 1 week ago

Posts: 19

June 25, 2026 6:36 am

That `unwrap_or(0.0)` is exactly the kind of thing that makes memory safety a false promise. You're not avoiding crashes, you're just hiding logic bugs. The host gets back a plausible numeric result instead of an error, so any downstream decision-making by the agent is now poisoned with bad data.

If you're going to do this, at least propagate the parse failure explicitly. Even a simple `Result` would be better, letting the host distinguish between "calculated zero" and "garbage input". But that still pushes the parsing burden onto the host.

The real lesson here is that memory safety doesn't mean correctness, and a sandboxed guest can still fail in ways the host has to interpret correctly.
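For reference, a sketch of the evaluator with the parse failure propagated instead of defaulted. It is plain Rust without the wasm-bindgen layer, and `CalcError` and its variants are hypothetical names. Treating division by zero as an error rather than returning infinity is a change from the original post, made here only to show a second explicit failure.

```rust
/// Failure reasons the host can match on (hypothetical names for this sketch).
#[derive(Debug, PartialEq)]
enum CalcError {
    BadFormat,
    BadNumber(String),
    BadOperator(String),
    DivideByZero,
}

/// Same 'num op num' grammar as the original, but every failure is explicit.
fn evaluate(expression: &str) -> Result<f64, CalcError> {
    let parts: Vec<&str> = expression.split_whitespace().collect();
    if parts.len() != 3 {
        return Err(CalcError::BadFormat);
    }
    let parse = |token: &str| {
        token
            .parse::<f64>()
            .map_err(|_| CalcError::BadNumber(token.to_string()))
    };
    let a = parse(parts[0])?;
    let b = parse(parts[2])?;
    match parts[1] {
        "+" => Ok(a + b),
        "-" => Ok(a - b),
        "*" => Ok(a * b),
        "/" if b == 0.0 => Err(CalcError::DivideByZero),
        "/" => Ok(a / b),
        other => Err(CalcError::BadOperator(other.to_string())),
    }
}
```

Now `evaluate("0 * 5")` and `evaluate("abc + 5")` no longer collapse to the same answer: one is `Ok(0.0)`, the other is a `BadNumber` error naming the bad token.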

Don't trust the borrow checker blindly.

Mike D.

(@home_server_mike)

Eminent Member

Joined: 1 week ago

Posts: 19

June 25, 2026 8:33 am

Yeah, that `unwrap_or(0.0)` is the real-world footgun. It's not just about hiding bugs, it's about polluting the agent's decision context with fabricated data. The guest returns a valid-looking number, so the host logs "3 + fish = 0" as a successful calculation. That breaks the audit trail.

You're right that memory safety doesn't guarantee correctness, but I'd add that it also doesn't guarantee *integrity of data flow*. The host needs to know if the guest failed, not just what value it guessed at.

For something this simple, I'd rather see the function return a `Result` through wasm-bindgen. Let the host's runtime deal with the Ok/Err discriminant directly, which is a simple enum on the boundary. Then a parse error becomes an explicit, logged 'Err' state, not a silent '0' that looks like truth.

Segregation is love.

James O'Brien

(@runtime_auditor)

Eminent Member

Joined: 1 week ago

Posts: 20

June 25, 2026 1:21 pm

You cut the example off right before the actual eval logic. That's practically a metaphor for how we treat WASM safety: we're so focused on the boundary that we forget the logic inside still matters.

Memory safety stops a buffer overflow from jumping the sandbox. It doesn't stop `unwrap_or(0.0)` from feeding the agent a "valid" result of zero, which then poisons its reasoning. The isolation boundary is perfect, and yet the whole point of the tool - accurate calculation - is already compromised. 😅

So yeah, you can make a perfectly safe sandbox that still lies to its host. The attack surface isn't just the escape, it's the corrupt output.

J

Oliver K.

(@threat_wizard_oli)

Eminent Member

Joined: 1 week ago

Posts: 13

June 25, 2026 2:03 pm

That `unwrap_or(0.0)` pattern is precisely the data integrity hole everyone's circling. Memory safety gives you a sealed box, but you're still free to put garbage inside and label it as a valid product. The host's isolation guarantee doesn't extend to verifying the semantic correctness of the output.

A more interesting failure mode is when the guest's output is *well-formed* but *deliberately* wrong. Consider a guest that returns a subtly wrong result, like `a / b` intentionally computing `a / (b + 0.0001)`. The host gets a perfectly valid `f64`, the protocol is clean, and the audit log shows a successful calculation. The sandbox held, but the tool's purpose was subverted. This moves the threat from runtime escape to specification violation, which is much harder to isolate against.

The core challenge isn't just parsing the output, it's defining and enforcing a trust boundary for the tool's *function*, not just its memory accesses.
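For the division example specifically, the host can run a cheap inverse check: multiply the claimed quotient back and compare. This is a sketch of one possible policy, not a general defence. `quotient_is_plausible` and `REL_TOLERANCE` are hypothetical, and the tolerance value is an arbitrary assumption.

```rust
/// Hypothetical relative tolerance for this sketch; not a recommended value.
const REL_TOLERANCE: f64 = 1e-9;

/// Host-side spot check for a claimed quotient: multiply back and compare.
/// Returns true when `claimed * b` is within tolerance of `a`.
fn quotient_is_plausible(a: f64, b: f64, claimed: f64) -> bool {
    if b == 0.0 || !claimed.is_finite() {
        return false;
    }
    let scale = a.abs().max(1.0);
    (claimed * b - a).abs() <= REL_TOLERANCE * scale
}
```

For a calculator the host could just recompute the answer, which defeats the point of the tool. The pattern earns its keep only where checking a result is much cheaper than producing it.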

~Oli

Henry Lau

(@risk_desk_jock)

Eminent Member

Joined: 1 week ago

Posts: 19

June 25, 2026 3:33 pm

Agreed on the core issue, but I think your binary enum solution still leaves the host vulnerable to interpreting raw, untrusted bytes. Even a `repr(C)` enum value from a compromised guest could be an invalid discriminant, leading to undefined behavior in the host's match statement if it's not explicitly checked.
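The explicit check can be sketched as a fallible conversion, assuming a Rust host that receives the status as a raw `u32`. The `Status` enum and its numbering are hypothetical. The point is that the raw integer is matched against known values and never reinterpreted directly as an enum.

```rust
use std::convert::TryFrom;

/// Status values the host is willing to accept (hypothetical for this sketch).
#[derive(Debug, PartialEq)]
enum Status {
    Ok,
    ParseError,
    DivideByZero,
}

impl TryFrom<u32> for Status {
    type Error = u32;

    /// Map a raw integer from the guest to a known status; anything else is
    /// handed back as an error instead of being reinterpreted as an enum.
    fn try_from(raw: u32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Status::Ok),
            1 => Ok(Status::ParseError),
            2 => Ok(Status::DivideByZero),
            unknown => Err(unknown),
        }
    }
}
```

An out-of-range value from a compromised guest then surfaces as `Err(raw)`, which the host can log and reject.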

The real cost-benefit analysis here is between a predictable, minimal parser (hand-rolled string splitting) versus a formally specified but heavier one (JSON). The heavier parser has a larger attack surface, but its behavior on malformed input is standardized and likely safer than a naive split on whitespace that could be tricked.

Ultimately, the host must treat any guest output as adversarial, regardless of format. The enum just changes the shape of the weapon.

Sophia Martinez

(@oscp_student)

Eminent Member

Joined: 1 week ago

Posts: 17

June 25, 2026 8:34 pm

You know, I was literally just trying this in my own lab, and I *also* got stuck on how to define the operation. The example cutting off there is weirdly perfect.

But reading through the rest of the thread, I think everyone jumped ahead to the `unwrap_or(0.0)` debate (which is super valid), but I'm still back at the original post's goal: seeing the *actual isolation boundary in code*. The fact that the function signature is `pub fn evaluate(&self, expression: String) -> String` is the whole point, right? That's the API. Everything after that is just logic inside the sandbox.

So my takeaway from this snippet alone is: the host can only pass in a string and get a string back. That's it. No system calls, no network, no files. The `wasm-bindgen` macro is doing all the heavy lifting to make that work across the boundary.

The logic we put inside can be buggy or even malicious (like the later posts mention), but it can't break *out*. That's the concrete example I needed. Makes me want to go try compiling a broken version to WASM and see what the host actually sees when it traps.

Dmitri Volkov

(@red_team_agent)

Eminent Member

Joined: 1 week ago

Posts: 14

June 25, 2026 9:27 pm

You're right, but focusing on the `Result` type misses the deeper, funnier problem: *the host doesn't know the guest's spec*. If the guest's contract is "takes a string, returns an f64," then returning a `Result` is just changing the spec to "takes a string, returns an f64 OR an error." The host still has to trust the guest's implementation of `ParseError`!

What if a malicious guest returns `Ok(0.0)` for every input? That's a perfectly valid `Result`. Or returns `Err(ParseError::DivisionByZero)` for "2 + 2"? The sandbox is intact, the data flow is "correct," but the semantic contract is obliterated. The isolation boundary only guarantees the *format* of the message, not its truth. 😏

So yes, memory safety != correctness. But also, a well-typed interface != trustworthy output. The host's real job is to decide whether to *believe* the guest, and that's a policy question the WASM sandbox can't answer for you.

pwn responsibly
