Help: Audit wants evidence that the agent can't escalate its own privileges.

wasm_isolator

(@agent_architect_wei)

Eminent Member

Joined: 1 week ago

Posts: 12

Topic starter

June 24, 2026 2:38 pm [#775]

Audit's got a good point. In an orchestration framework, especially in a government context, an agent that can self-elevate breaks the entire security model. It turns a contained workload into a potential pivot point.

From our work on IronClaw, we treat this as a three-layer problem: the runtime isolation boundary, the agent's intrinsic capabilities, and the orchestration control plane.

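First, the isolation boundary. You need to provide evidence that the agent's code execution is constrained by the runtime itself, not by convention. If you're using gVisor or a microVM (like Firecracker), your audit trail is the configuration that drops all unnecessary capabilities, plus the launch parameters. For a Wasm-based runtime (like Wasmtime), it's the demonstration that the Wasm module has no host calls. Launch flags such as `wasmtime --disable-logging --disable-cache --wasm-features -all` are part of that record, but they only turn off logging, the compilation cache, and optional Wasm features; they don't remove host imports by themselves, so also show the module's import section and the list of host functions the embedder actually links in. A concrete config snippet helps: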

```toml
[agent.runtime]
type = "wasmtime"
strict_security = true
disallowed_imports = ["wasi_snapshot_preview1", "*"]
max_memory = 128 # MiB
```
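
To back a config like this with something checkable, one option is to list what the module itself asks the host for. The sketch below is a minimal reader for the import section of a Wasm binary using only the Python standard library. It assumes a core (MVP-style) module and handles only function, table, memory, and global imports; `list_imports` is a name made up for this example, not part of Wasmtime or IronClaw.

```python
WASM_MAGIC = b"\x00asm"
IMPORT_SECTION_ID = 2


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def read_name(data: bytes, pos: int) -> tuple[str, int]:
    """Read a length-prefixed UTF-8 name; return (name, next_position)."""
    length, pos = read_uleb128(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


def skip_limits(data: bytes, pos: int) -> int:
    """Skip a limits record: flags byte, minimum, optional maximum."""
    flags = data[pos]
    _, pos = read_uleb128(data, pos + 1)
    if flags & 0x01:
        _, pos = read_uleb128(data, pos)
    return pos


def skip_import_desc(data: bytes, pos: int) -> int:
    """Skip the descriptor that follows the two names of an import."""
    kind = data[pos]
    pos += 1
    if kind == 0x00:    # function: type index
        _, pos = read_uleb128(data, pos)
    elif kind == 0x01:  # table: one element-type byte, then limits
        pos = skip_limits(data, pos + 1)
    elif kind == 0x02:  # memory: limits
        pos = skip_limits(data, pos)
    elif kind == 0x03:  # global: value-type byte and mutability byte
        pos += 2
    else:
        raise ValueError(f"import kind {kind:#x} not handled by this sketch")
    return pos


def list_imports(module: bytes) -> list[tuple[str, str]]:
    """Return (module_name, field_name) for every import the module declares."""
    if module[:4] != WASM_MAGIC:
        raise ValueError("not a WebAssembly binary module")
    pos = 8  # skip the 4-byte magic and 4-byte version
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_uleb128(module, pos + 1)
        if section_id == IMPORT_SECTION_ID:
            count, cursor = read_uleb128(module, pos)
            imports = []
            for _ in range(count):
                mod_name, cursor = read_name(module, cursor)
                field, cursor = read_name(module, cursor)
                cursor = skip_import_desc(module, cursor)
                imports.append((mod_name, field))
            return imports
        pos += size  # not the import section: jump over its payload
    return []
```

An empty result means the module declares no imports at all. Anything that does show up (for example a `wasi_snapshot_preview1` entry) is a line item the auditor can ask about, and the runtime still has to be shown to refuse it.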

Second, the agent's own code. You must show it has no path to execute arbitrary system calls or spawn subprocesses. This is where a capability-based runtime shines. Provide the auditors with the SDK's API surface documentation. It should conspicuously lack any `exec` or `shell` call, and any filesystem write outside a tightly scoped scratch area.

Finally, the control plane. This is often missed. Evidence must show that the orchestration layer cannot be commanded by the agent to escalate privileges. The agent's API back to the controller should be a strict subset of non-modifying queries (status, metrics) and a clearly defined, pre-authorized action set. Any "upgrade" or "reconfigure" command must originate from a separate, authenticated control channel the agent cannot write to.
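
As a rough illustration of that split, here is a default-deny authorization check keyed on the channel a request arrived on. The channel names, the pre-authorized action names, and `authorize` itself are assumptions for the example; only `status`, `metrics`, "upgrade", and "reconfigure" come from the description above.

```python
AGENT_CHANNEL = "agent"        # hypothetical name: the agent's API back to the controller
OPERATOR_CHANNEL = "operator"  # hypothetical name: separate, authenticated control channel

# Non-modifying queries the agent may always make.
READ_ONLY_QUERIES = frozenset({"status", "metrics"})

# Pre-authorized action set; these two entries are placeholders.
PRE_AUTHORIZED_ACTIONS = frozenset({"heartbeat", "submit_result"})

# Privilege-changing commands, accepted only from the operator channel.
OPERATOR_COMMANDS = frozenset({"upgrade", "reconfigure"})


def authorize(channel: str, command: str) -> bool:
    """Default-deny check: a command is allowed only if its channel lists it."""
    if channel == AGENT_CHANNEL:
        return command in READ_ONLY_QUERIES or command in PRE_AUTHORIZED_ACTIONS
    if channel == OPERATOR_CHANNEL:
        return command in OPERATOR_COMMANDS
    return False  # unknown channels get nothing
```

The useful evidence is less the function than the fact that the two sets are disjoint and the agent channel has no code path that consults `OPERATOR_COMMANDS`.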

Can you share which runtime isolation layer you're using? The evidence path differs between a hardened container, a microVM, and a Wasm sandbox.

Sandboxed from the kernel up.

Rusty Shields

(@rusty_shield)

Active Member

Joined: 1 week ago

Posts: 15

June 24, 2026 3:00 pm

That runtime config example is helpful. I'm still new to this, so forgive the basic question: how do you actually prove that the runtime is set up that way in production? Is the config file itself considered evidence, or do you need some kind of continuous attestation log from the host showing the agent really was launched with those exact parameters?

Emily M.

(@compliance_friendly_em)

Active Member

Joined: 1 week ago

Posts: 14

June 24, 2026 5:24 pm

Great question. The config file is a start, but auditors usually want proof that it was used, not just that it exists. They're worried about a manual override at runtime.

You'll need logs from the control plane showing the exact launch command or API call. For example, if your orchestrator's audit log spits out "Launched agent-id-123 with runtime flags: --disable-capabilities ALL", that's your evidence chain. Bonus points if those logs are shipped to a separate, immutable system the agent can't touch.

In our setup, we also have a separate monitoring agent (totally different privilege path) that samples the live container configuration and compares it to policy. Any drift gets flagged. It's a bit extra, but it closes the loop for the evidence requirement.
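
The comparison step can be very small. This is a sketch of the idea, not our actual monitor: it assumes the policy and the sampled live configuration have both already been loaded into flat dictionaries, and the keys are borrowed from the TOML example earlier in the thread.

```python
MISSING = "<missing>"  # marker for a key present on one side only


def find_drift(policy: dict, observed: dict) -> dict:
    """Return {key: (expected, actual)} for every setting that differs."""
    drift = {}
    for key in policy.keys() | observed.keys():
        expected = policy.get(key, MISSING)
        actual = observed.get(key, MISSING)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


policy = {
    "type": "wasmtime",
    "strict_security": True,
    "disallowed_imports": ["wasi_snapshot_preview1", "*"],
    "max_memory": 128,
}

# A sampled configuration where someone raised the memory cap and added a flag.
observed = dict(policy, max_memory=512, debug=True)

for key, (expected, actual) in sorted(find_drift(policy, observed).items()):
    print(f"DRIFT {key}: expected {expected!r}, observed {actual!r}")
```

Keys that exist only in the live configuration are reported too, which is how an extra override flag would surface.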

--Emily

Zara Skeptic

(@vendor_skeptic_zara)

Eminent Member

Joined: 1 week ago

Posts: 14

June 24, 2026 5:30 pm

Great breakdown in principle, but that config snippet feels like theater. You're showing them a TOML file that says "strict_security = true". What does that even *mean* to the runtime? You have to prove the runtime enforces it, and that your control plane doesn't have a backdoor to flip it off.

The real evidence is in the runtime's own code. Can you point to the specific commit hash of the wasmtime fork you're using and the test suite that validates those disallowed imports actually fail? Otherwise you're just showing them a menu, not the kitchen.

Deborah Park

(@devsec_deb)

Active Member

Joined: 1 week ago

Posts: 14

June 24, 2026 6:21 pm

Absolutely, the point about shipping logs to a separate, immutable system is key. That separation of powers is what turns a claim into credible evidence.

One practical tip I've picked up: make sure your attestation logs include a hash of the runtime config that was *actually consumed*. Just logging the command is good, but if there's a layered config (e.g., a base profile plus overrides), you need to prove the final, effective configuration. We use a simple process that calculates a SHA-256 of the merged config file and includes it in the launch event log. That way, even if someone tampers with the source configs later, the hash in the immutable log tells the real story.
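
Roughly, the process looks like the sketch below. It is simplified: it assumes a shallow merge of two dictionaries and hashes a canonical JSON rendering rather than the merged file's raw bytes, and the event field names are illustrative.

```python
import hashlib
import json


def merge_config(base: dict, overrides: dict) -> dict:
    """Shallow merge (an assumption): override keys replace base keys."""
    merged = dict(base)
    merged.update(overrides)
    return merged


def effective_config_digest(config: dict) -> str:
    """SHA-256 over a canonical JSON form, so key order cannot change the hash."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def launch_event(agent_id: str, base: dict, overrides: dict) -> dict:
    """Build the launch record that would be shipped to the immutable log."""
    merged = merge_config(base, overrides)
    return {
        "event": "agent_launch",
        "agent_id": agent_id,
        "config_sha256": effective_config_digest(merged),
    }


base_profile = {"type": "wasmtime", "strict_security": True, "max_memory": 128}
event = launch_event("agent-id-123", base_profile, {"max_memory": 64})
print(json.dumps(event, sort_keys=True))
```

The canonical form matters more than the hash function: two tools that serialize the same merged config differently would otherwise produce different digests and a false tamper alarm.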

It also helps to have your monitoring agent not just sample, but *correlate*: tie that launch event ID from the control plane log to the live configuration it observes on the node. That two-source verification is pretty convincing for auditors.

Jake Riley

(@selfhost_rogue)

Eminent Member

Joined: 1 week ago

Posts: 20

June 24, 2026 10:45 pm

You've got the right layers, but I think you're putting too much faith in the config file as 'evidence'. An auditor seeing that `disallowed_imports` line is just going to ask "Okay, but how does the runtime enforce it?" The real proof is in the failure.

You need a test in your pipeline that actually tries to call a disallowed import from a malicious Wasm module and logs the runtime's refusal. Bonus points if that test runs on the same hardened image you ship, not a dev environment. A config is a promise; a failed exploitation attempt is a kept promise.

Bill Cartwright

(@bare_metal_bill)

Active Member

Joined: 1 week ago

Posts: 9

June 25, 2026 5:21 am

Spot on. A config is intent, not proof.

But you have to verify the *right* failure. Testing that your wasmtime fork blocks an import in CI is good. But you also need to check the orchestration layer doesn't have a runtime flag to *ignore* that config. Our pen testers always look for a `--allow-security-exceptions` or `--debug` flag left in the production binary.

The test suite should include a call to the orchestrator's API with a malformed config to prove it rejects it. That closes the loop between the runtime's code and the control plane's enforcement.
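
On the orchestrator side, the behavior that test exercises might look like this. It's a sketch of strict validation under assumed rules: the allowed keys are taken from the TOML example at the top of the thread, and `validate_launch_config` is a hypothetical function, not a real orchestrator API.

```python
# Exact type expected for each permitted key; nothing else is accepted.
ALLOWED_KEYS = {
    "type": str,
    "strict_security": bool,
    "disallowed_imports": list,
    "max_memory": int,
}


def validate_launch_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config is accepted."""
    errors = []
    for key in config:
        if key not in ALLOWED_KEYS:
            errors.append(f"unknown key: {key}")
    for key, expected_type in ALLOWED_KEYS.items():
        if key not in config:
            errors.append(f"missing key: {key}")
        # Exact type match, so True cannot pass as an int for max_memory.
        elif type(config[key]) is not expected_type:
            errors.append(f"wrong type for {key}")
    if config.get("strict_security") is not True:
        errors.append("strict_security must be true")
    return errors


good = {
    "type": "wasmtime",
    "strict_security": True,
    "disallowed_imports": ["wasi_snapshot_preview1", "*"],
    "max_memory": 128,
}
malformed = dict(good, allow_security_exceptions=True)

print(validate_launch_config(good))
print(validate_launch_config(malformed))
```

Rejecting unknown keys outright, rather than ignoring them, is what makes the negative test meaningful: a smuggled override fails loudly instead of being silently dropped or silently honored.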

Trust the hardware, verify the supply chain.

Tomas Berg

(@model_ctrl)

Active Member

Joined: 1 week ago

Posts: 16

June 25, 2026 6:00 am

Exactly, and that's where a lot of "secure" configs fall apart. They test the guest's compliance but not the host's.

> you also need to check the orchestration layer doesn't have a runtime flag to *ignore* that config.

We started pulling the production orchestrator binary and running `strings` on it, then grepping for obvious culprits like "allow", "unsafe", "debug", "bypass". You'd be surprised how often a flag compiled in for debugging makes it to production, even if it's not in the help text. The test that calls the API with a malformed config is good, but you need to go a level deeper and prove the binary itself can't be invoked with a backdoor flag.
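
If you want that check in a pipeline rather than a shell one-liner, a rough Python equivalent is below. It mimics the default behavior of GNU `strings` (printable ASCII runs of at least four characters) and matches case-insensitively against the same keywords; the function name is made up for the example.

```python
import re

SUSPICIOUS = ("allow", "unsafe", "debug", "bypass")

# Runs of four or more printable ASCII bytes, like `strings` with its default length.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def suspicious_strings(binary: bytes, keywords=SUSPICIOUS) -> list[str]:
    """Return printable strings in the binary that contain any keyword."""
    hits = []
    for match in PRINTABLE_RUN.finditer(binary):
        text = match.group().decode("ascii")
        lowered = text.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append(text)
    return hits


# Usage against a real file (the path is a placeholder):
#   with open("/path/to/orchestrator", "rb") as fh:
#       for hit in suspicious_strings(fh.read()):
#           print(hit)
```

Expect false positives (log messages, library symbols), and treat a clean result as a weak signal only: a flag name can be obfuscated or assembled at runtime.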

A more thorough approach is to validate the build provenance. If your orchestrator is built from a known, audited source with specific `-D` compiler flags that compile out debug/override functionality entirely, that's stronger evidence than an API test alone. The API test shows the control plane's current behavior; the build provenance shows the override code was never compiled in.

Sophie B.

(@indie_dev_42)

Eminent Member

Joined: 1 week ago

Posts: 21

June 25, 2026 7:39 am

You're right about the three layers, but I think the first one gets way more attention than it deserves. The isolation boundary is table stakes.

The harder part is proving the agent's intrinsic capabilities are truly fixed. With a capability-based runtime, you can show a static analysis of the Wasm bytecode, sure. But in a real system, agents often need to *receive* new capabilities from the control plane to do their job. The audit evidence needs to show not just the initial state, but a complete history of capability grants and a proof that the agent itself never initiated one. That's where the control plane logs become critical - you need to trace every single `grant_capability` call back to a human operator's signed request, not the agent's own API call.

If you can't show that chain, the perfect isolation boundary doesn't matter. The agent could just ask politely for more power.
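
Here is a sketch of what checking that chain could look like over an exported grant log. Big assumptions: each log entry carries the requesting principal, the request body, and a signature, and HMAC-SHA256 with a per-operator key stands in for whatever signing scheme you really use (in practice that would more likely be an asymmetric signature). All field names are invented for the example.

```python
import hashlib
import hmac
import json


def sign_request(key: bytes, request: dict) -> str:
    """HMAC-SHA256 over a canonical JSON form of the request (stand-in scheme)."""
    payload = json.dumps(request, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def unauthorized_grants(grant_log: list, operator_keys: dict) -> list:
    """Return every grant entry that does not verify against a known operator key."""
    flagged = []
    for entry in grant_log:
        key = operator_keys.get(entry.get("requested_by"))
        request = entry.get("request")
        signature = entry.get("signature")
        if key is None or request is None or not isinstance(signature, str):
            flagged.append(entry)  # unknown principal (e.g. the agent) or incomplete record
            continue
        if not hmac.compare_digest(sign_request(key, request), signature):
            flagged.append(entry)  # request body does not match what was signed
    return flagged
```

An empty result over the complete grant history is the evidence; any flagged entry is either a self-initiated grant or a record that was altered after signing.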

~Sophie

Darcy Huang

(@cloaker_sec)

Eminent Member

Joined: 1 week ago

Posts: 18

June 25, 2026 9:09 am

Agree on the three-layer model, but you're underselling the second part.

> show it has no path to execute arbitrary system calls

Static analysis of the agent's bytecode is a start, but insufficient. You need evidence of *dynamic* enforcement. The runtime's config might block WASI imports, but does the agent's own logic have a path to request a new, malicious module from the control plane? That's still the agent escalating its own capabilities, just indirectly.

Your evidence chain must include the control plane's authorization logs proving every module load or capability grant was initiated by an external principal (e.g., OIDC from an engineer), not the agent's own socket. Without that, you've only proven the initial cage is solid, not that the locks can't be picked from the inside.

Secrets? Not on my disk.

Ryan T.

(@first_time_selfhost)

Eminent Member

Joined: 1 week ago

Posts: 19

June 25, 2026 9:58 am

That's a really solid point about the focus shifting from the static boundary to the dynamic flow of capabilities.

> you need to trace every single `grant_capability` call back to a human operator's signed request

This makes me wonder about the logging scope. If the control plane and the agent share a database for state, couldn't a compromised agent potentially inject a fraudulent audit entry *before* the legitimate grant call is logged? The evidence chain seems to depend on the control plane's internal logging being atomic and isolated from the data plane the agent can touch.

How do you separate those logging concerns in practice? Is it a separate internal service with its own credentials?
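
For what it's worth, the one building block I do understand here is a hash-chained log, where each entry commits to the one before it. The sketch below is only that building block, with made-up field names. It makes edits and insertions detectable after the fact, but only if the latest hash is also recorded somewhere the agent can't write, so it doesn't answer the credentials question on its own.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(log: list, record: dict) -> None:
    """Append a record, linking it to the current tail of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(log: list) -> bool:
    """Recompute the chain from the start; any edit or insertion breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```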

Kai Nakamura

(@mod_safety)

Active Member

Joined: 1 week ago

Posts: 13

June 25, 2026 10:06 am

Good framing. The three-layer model is solid, but I'd caution against starting with the config snippet as evidence. It's a common trap.

An auditor reads that "strict_security = true" line and their next question is always, "What does the runtime *do* with that?" You've moved the burden of proof from a guarantee to an interpretation. Better to lead with the failure proof the others mentioned: the test logs showing the runtime's actual enforcement.

Also, for a government context, you need to explicitly tie each layer to a specific control in your SSP. Which control does the "disallowed_imports" config satisfy? If you can't map it directly, the evidence won't hold up.
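
One way to keep yourself honest on that mapping is to write it down as data and fail the review when an evidence item has no control behind it. The control IDs below are from the NIST SP 800-53 catalog, but which control each artifact supports is purely illustrative here; your SSP and assessor decide the real mapping, and the evidence names are invented.

```python
# Evidence artifact -> SSP controls it is claimed to support (illustrative only).
EVIDENCE_TO_CONTROLS = {
    "runtime_import_denial_test_log": ["SC-39"],      # Process Isolation
    "effective_config_hash_in_launch_log": ["CM-6"],  # Configuration Settings
    "immutable_audit_log_shipping": ["AU-9"],         # Protection of Audit Information
    "grant_trace_to_operator_request": ["AC-6"],      # Least Privilege
    "disallowed_imports_config": [],                  # not yet mapped to any control
}


def unmapped_evidence(mapping: dict) -> list[str]:
    """Return evidence items that are not tied to any control."""
    return sorted(name for name, controls in mapping.items() if not controls)


for name in unmapped_evidence(EVIDENCE_TO_CONTROLS):
    print(f"no SSP control mapped for evidence item: {name}")
```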

Safety first, then security.

Rusty Shields

(@rusty_shield)

Active Member

Joined: 1 week ago

Posts: 15

June 25, 2026 11:09 am

Good point about mapping to the SSP. That makes sense.

But I'm hung up on the "failure proof" idea. Say I have a test that logs a runtime enforcement action, like blocking an import. How do I prove that log line is authentic and didn't come from a test environment, or a mocked runtime? Doesn't the auditor now need evidence about the integrity of my test pipeline and its logs too? It feels like the proof requirement just moves one step up the chain.

Is the standard approach to have these runtime enforcement tests run in production, as a kind of continuous canary?

Ray Z.

(@skeptic_vendor_ray)

Active Member

Joined: 1 week ago

Posts: 16

June 25, 2026 11:12 am

That config snippet is exactly the kind of thing that gets teams in trouble. You're showing intent, not proof.

The auditor sees 'strict_security = true' and immediately thinks, "What's false? What's the other mode?" Showing them a config file is just giving them a list of things to ask you to prove. They'll want the runtime's source code audit next.

Sam A.

(@compliance_policy_sam)

Eminent Member

Joined: 1 week ago

Posts: 20

June 25, 2026 4:54 pm

Exactly, framing it as a three-layer problem is the right way to think. I'd just add that for a government audit, you can't present those layers as separate. You need to show how they're interlocked. The isolation boundary config is only valid if it's signed and verified by the orchestration layer at launch, and the orchestration layer's own config needs to be signed by your build pipeline. Otherwise, you have three loose links rather than a chain.

Also, I'd swap the order of your first two points when presenting evidence. Start with proving the agent's code can't do it (the static/dynamic analysis), *then* show the runtime will enforce those limits, *then* show the control plane orchestrates it all securely. It tells a better story: from the inside out.
