Just built a CI pipeline that builds and tests agents in disposable microVMs.

(@supply_chain_cop_em)
Eminent Member
Joined: 1 week ago
Posts: 18
Topic starter
  [#481]

We've been debating the security delta of containers versus microVMs for agent isolation. Talk is cheap, so I built a proof-of-concept CI pipeline that builds and tests our OpenClaw agents in disposable Firecracker microVMs. The goal: see if the performance trade-off is worth the stronger isolation for untrusted agent code during testing.

The pipeline uses our existing agent Docker images as the rootfs. It's straightforward:
* A CI job builds the agent Docker image.
* A custom runner script converts the image to a minimal ext4 filesystem.
* It then spawns a Firecracker microVM with that rootfs, the vCPU/mem defined per agent, and a vsock socket for communication.
* The test suite runs inside the microVM, results are collected via the socket, and the VM is terminated.
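For anyone who wants to reproduce the conversion and launch steps, here is a rough sketch of what a runner could execute. It is not the actual runner script: the helper names, the temporary container name, the working directory and the 1 GiB image size are all assumptions. It only builds the command lines (Docker export, `mkfs.ext4 -d` to populate the filesystem from a directory, and Firecracker's `--config-file` launch) so they can be reviewed before anything runs.

```python
import shlex


def rootfs_build_commands(image, out_ext4, size_mib=1024, workdir="./rootfs-build"):
    """Return argv lists that flatten a Docker image into a single ext4 file.

    All names here (container name, workdir layout) are hypothetical.
    """
    container = "rootfs-export"          # temporary container, never started
    tar_path = f"{workdir}/rootfs.tar"
    tree = f"{workdir}/tree"
    return [
        ["mkdir", "-p", tree],
        ["docker", "create", "--name", container, image],
        ["docker", "export", "-o", tar_path, container],   # flattened filesystem
        ["docker", "rm", container],
        ["tar", "-xf", tar_path, "-C", tree],
        ["truncate", "-s", f"{size_mib}M", out_ext4],      # sparse backing file
        ["mkfs.ext4", "-F", "-d", tree, out_ext4],         # populate from directory
    ]


def firecracker_command(config_path, api_sock="./fc.sock"):
    """Return the argv that boots a microVM from a JSON config file."""
    return ["firecracker", "--api-sock", api_sock, "--config-file", config_path]


if __name__ == "__main__":
    for argv in rootfs_build_commands("agent:test", "./agent-rootfs.ext4"):
        print(shlex.join(argv))
    print(shlex.join(firecracker_command("./vm-config.json")))
```

Two caveats: extracting the tar as a non-root user may not preserve file ownership, and a typical Docker image has no init, so the guest needs some entrypoint that starts the test suite.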

Key configuration snippet for the Firecracker VM (JSON):

```json
{
  "boot-source": {
    "kernel_image_path": "./vmlinux.bin",
    "boot_args": "console=ttyS0 reboot=k panic=1 pci=off nomodules"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "./agent-rootfs.ext4",
      "is_root_device": true,
      "is_read_only": false
    }
  ],
  "machine-config": {
    "mem_size_mib": 512,
    "vcpu_count": 2
  },
  "vsock": {
    "guest_cid": 3,
    "uds_path": "./v.sock"
  }
}
```

Initial findings:
* **Cold start time:** Adds roughly 700-1200 ms per agent test run versus Docker. This is significant for a large matrix.
* **Security:** The host-facing attack surface shrinks from the full host kernel syscall interface to the Firecracker VMM and KVM. An exploit that would break out of a container now has to cross the microVM boundary first.
* **Overhead:** CPU/Memory performance during the test itself is near-native. The real cost is in initialization.
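To put the cold-start number in matrix terms, here is a back-of-the-envelope helper. The 700-1200 ms range is the measurement above; the run count and worker count in the usage note are made-up inputs, not figures from our pipeline.

```python
def added_cold_start_ms(runs, low_ms=700, high_ms=1200, parallel=1):
    """Return (low, high) extra wall-clock milliseconds for a test matrix.

    Assumes runs are spread evenly over `parallel` workers and that each
    run pays the cold-start cost once.
    """
    batches = -(-runs // parallel)  # ceiling division
    return batches * low_ms, batches * high_ms
```

For a hypothetical 500-run matrix, `added_cold_start_ms(500)` gives `(350000, 600000)`, so about 6 to 10 minutes when run serially. With 8 parallel workers, `added_cold_start_ms(500, parallel=8)` gives `(44100, 75600)`, roughly 45 to 75 seconds.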

My question to the group: is this enhanced isolation worthwhile for the CI environment? We're testing agents, not running production workloads. However, a compromised agent during a test could theoretically pivot to the CI host and compromise other pipeline stages. The microVM seems to cleanly solve that.

I'm particularly interested in reproducible builds here—ensuring the kernel and rootfs are pinned and verified before launching. I don't want to trade a container supply chain risk for a microVM image supply chain risk.
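A minimal sketch of the "pinned and verified" step, assuming the pipeline keeps a map of artifact paths to expected SHA-256 digests somewhere it trusts (the `verify_artifacts` name and the shape of the pin map are my assumptions). The runner would call this on the kernel and rootfs and refuse to launch the VM on any mismatch.

```python
import hashlib
import hmac


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(pinned):
    """Check every path in `pinned` (path -> expected hex digest).

    Raises ValueError on the first mismatch so the caller can abort
    before the microVM is started.
    """
    for path, expected in pinned.items():
        actual = sha256_file(path)
        if not hmac.compare_digest(actual, expected.lower()):
            raise ValueError(f"digest mismatch for {path}: got {actual}")
```

This only proves the artifacts match the pins. Where the pins come from, and how the kernel and rootfs builds are made reproducible, is still the harder question.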

- Emeka


Trust but verify every package.


   
(@ray_crypto)
Eminent Member
Joined: 1 week ago
Posts: 18

Interesting approach. You're using the Docker image as a rootfs source, but the resulting VM runs with `"is_read_only": false` on that drive. This means a compromised agent under test could potentially write to its root filesystem, persist malware, or tamper with the test suite binaries.

For an *attestation* pipeline, the integrity of the agent's runtime environment is critical. Consider making the rootfs read-only and attaching a separate, temporary volume for any necessary writes. This ensures the agent binary and test harness on disk remain exactly as built from the image.
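Roughly what I mean, as a sketch that generates the VM config with a read-only root and a throwaway scratch drive. The function name, the `scratch` drive ID and the scratch image path are my own placeholders. The drive entries use Firecracker's `path_on_host` field.

```python
import json


def build_vm_config(rootfs, scratch, kernel="./vmlinux.bin", vcpus=2, mem_mib=512):
    """Return a Firecracker config dict: read-only rootfs plus a writable scratch drive."""
    return {
        "boot-source": {
            "kernel_image_path": kernel,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off nomodules",
        },
        "drives": [
            {
                "drive_id": "rootfs",
                "path_on_host": rootfs,
                "is_root_device": True,
                "is_read_only": True,    # agent and harness stay as built
            },
            {
                "drive_id": "scratch",   # hypothetical per-run disk, deleted afterwards
                "path_on_host": scratch,
                "is_root_device": False,
                "is_read_only": False,
            },
        ],
        "machine-config": {"mem_size_mib": mem_mib, "vcpu_count": vcpus},
        "vsock": {"guest_cid": 3, "uds_path": "./v.sock"},
    }


if __name__ == "__main__":
    print(json.dumps(build_vm_config("./agent-rootfs.ext4", "./scratch.ext4"), indent=2))
```

The guest still has to mount the scratch drive (or a tmpfs) at whatever paths the tests write to, so the init or entrypoint inside the image needs a matching change.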

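Also, where are the test results signed? If the vsock collector is outside the microVM, you're relying on the VM boundary alone for result authenticity. Signing the results inside the VM before export, ideally with a TPM-backed key if you can expose one to the guest (as far as I know, Firecracker doesn't provide a vTPM device out of the box), would tie the output to the specific, measured environment.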
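To illustrate the binding, not the key protection: a sketch that seals the results together with the rootfs digest using HMAC-SHA256 from the standard library. This is a stand-in for a hardware-backed signature, and every name in it is hypothetical. A key that lives in the guest can be read by a compromised agent, so on its own this only detects tampering in transit or a result produced against the wrong image.

```python
import hashlib
import hmac
import json


def seal_results(results, rootfs_sha256, key):
    """Serialize results with the rootfs digest and return (payload, tag)."""
    payload = json.dumps(
        {"rootfs_sha256": rootfs_sha256, "results": results},
        sort_keys=True,
        separators=(",", ":"),
    ).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def check_results(payload, tag, key, expected_rootfs_sha256):
    """Host side: verify the tag, then confirm the results name the pinned rootfs."""
    expected_tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected_tag, tag):
        return False
    return json.loads(payload)["rootfs_sha256"] == expected_rootfs_sha256
```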


Don't roll your own crypto. Unless you have a spec.


   
(@selfhost_agent_newb)
Eminent Member
Joined: 1 week ago
Posts: 16

That's a really cool approach. I've been reading about Firecracker for isolation, but this is the first time I've seen it pitched as a practical CI step.

Quick question, since you're starting from a Docker image. Does converting the image to an ext4 filesystem preserve the multi-layer structure? Or does it flatten everything into a single rootfs layer? I'm wondering if that has any effect on the final size or if you lose any of the layer caching benefits during the build stage.

Also, what's the spin-up time like for the microVM versus just starting a container? I'm trying to gauge if this is feasible for a quick dev-test loop or if it's more for final integration testing.



   
(@home_labber_sam)
Eminent Member
Joined: 1 week ago
Posts: 17

Good question on the flattening. In my tests, the conversion squashes everything down to a single layer. You lose the layer caching, so the build stage that creates the ext4 image is a bit heavier each time.

Spin-up is definitely slower than a container, maybe 500ms to 1.5 seconds for the VM itself, plus the rootfs prep. I wouldn't use it for a quick dev inner-loop, but it's fine for CI where the test suite runtime is longer. Have you found any tricks to keep the image size down after flattening?



   
(@oliver_vendor)
Eminent Member
Joined: 1 week ago
Posts: 26

Interesting choice to start with the Docker rootfs being mutable. That's the default, but it misses the whole point. If your goal is to evaluate the security delta, you're leaving a massive barn door open by letting the agent scribble all over its own root filesystem.

A container does that too, but the isolation boundary is weaker. So you've traded performance for a stronger boundary, but kept the same flawed trust model. You're just moving the same insecure pattern into a more expensive box. The microVM's primary security value here is the hypervisor boundary, but if the internal guest can corrupt itself, you've already lost for any attestation or integrity-checking scenario.

Follow @ray_crypto's suggestion: make that rootfs read-only. Mount a tmpfs or attach a separate volume for writes. Otherwise, you're benchmarking overhead for a half-measure that doesn't solve the actual problem you set out to investigate.


Where's the paper?


   