What is the actual memory overhead for IronClaw's extra isolation?

(@junior_dev_harden)
Active Member
Joined: 1 week ago
Posts: 13
Topic starter
  [#852]

Hello everyone. I’ve been reading through the docs and past threads on the three runtimes, and I’m trying to solidify my understanding of the practical costs involved.

The common guidance is that IronClaw provides the strongest isolation by running each tenant’s agents in dedicated, lightweight VMs, while NemoClaw uses process isolation and NanoClaw uses a shared runtime with boundaries enforced at the framework level. For my use case—a multi-tenant internal tool where some tenants process highly sensitive data—IronClaw seems like the right fit from a threat modeling perspective. However, I need to justify the infrastructure footprint.

My question is about the **actual, observed memory overhead** for that extra isolation layer. From the documentation, I gather:

* The base memory for the management layer itself is relatively small (~100MB).
* Each tenant's VM has a minimal, stripped-down OS image.
* The advertised "agent memory" is what you allocate to the tenant's workload *inside* the VM.

But in practice, what's the fixed overhead *per tenant VM*? Is it in the range of 50-100MB per isolated environment, or does it vary significantly based on the host system? I'm particularly interested in:

* Real-world numbers from smaller deployments (e.g., 10-20 concurrent tenants).
* Whether this overhead scales linearly as you add tenants.
* Any "gotchas" like memory ballooning or caching behavior that might affect these numbers.

I'm coming from a background where we'd use containers, so I'm trying to map the VM overhead I'm familiar with to IronClaw's optimized model. Any data points or performance testing you could share would be incredibly helpful for my comparison matrix.

—marcus



   
(@homelab_secure_ray)
Active Member
Joined: 1 week ago
Posts: 17
 

Great question, and you're right to focus on the per-VM overhead - that's the real cost multiplier. From my homelab cluster (running on Proxmox), I see a fairly consistent **70-90MB RSS overhead per tenant VM** for the stripped-down IronClaw guest kernel and minimal userspace. That's after the initial setup and before the tenant's agent memory is even accounted for.

The variability you mentioned depends mostly on the host's kernel version and your VMM (QEMU vs. something like Firecracker, both of which sit on top of KVM). Host kernels with memory deduplication enabled (KSM, or the out-of-tree UKSM patch) can shave 10-15% off that if you have many identical VMs. The workload type inside the VM doesn't seem to affect the base overhead much; it's really just the price of the isolated kernel.
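
If you want to check whether KSM is actually merging anything on your host, its sysfs counters are the place to look. Below is a minimal sketch, assuming a Linux host with KSM built in. The helper names `ksm_read_counter` and `ksm_saved_kib` are made up for illustration, and the savings figure is only the usual approximation (sharing pages times page size), not an exact accounting.

```c
#include <stdio.h>

/* Read one numeric KSM counter from a sysfs file.
 * Returns -1 if the file is missing or unreadable (e.g. KSM not built in). */
long ksm_read_counter(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Approximate memory saved by KSM in KiB: each "sharing" page is a
 * duplicate that no longer needs its own physical page. */
long ksm_saved_kib(long pages_sharing, long page_size_bytes)
{
    if (pages_sharing < 0 || page_size_bytes <= 0)
        return -1;
    return pages_sharing * (page_size_bytes / 1024);
}
```

On a real host you would pass `/sys/kernel/mm/ksm/pages_sharing` as the path and `sysconf(_SC_PAGESIZE)` (from `<unistd.h>`) as the page size, then divide the result by your tenant count to see what deduplication is worth per VM.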

So for a 10-tenant setup, you're looking at roughly 700-900MB of extra RAM (call it just under a gigabyte) for the isolation layer alone. Worth it for your sensitive data tenants, but you might consider mixing runtimes: put the high-sensitivity workloads on IronClaw and the rest on NemoClaw to save resources. I've done that split in my own setup. Happy to share the monitoring config I use to track those overheads if you'd like.


Secure your home lab like your job depends on it.


   
(@contrarian_tom_old)
Active Member
Joined: 1 week ago
Posts: 15
 

Yeah, @homelab_secure_ray's 70-90MB is about right for a generic build. But you can cut that almost in half if you actually build your own kernel instead of using the "minimal" image they ship. Strip out every module you don't need for your specific hardware and disable features you won't use. My base guests idle around 40MB.

But that's the real overhead everyone ignores - your time.


Keep it simple.


   
(@newb_maya_self)
Active Member
Joined: 1 week ago
Posts: 13
 

Oh, I was wondering about this exact thing! The docs talk about the VMs being "lightweight" but I never found a real number.

So if I'm reading this right, the 70-90MB is *just* for the empty VM? And then you have to add the agent's memory on top of that? That seems like a lot if you want to host many small agents.

Does the management layer's memory (the ~100MB you mentioned) stay fixed no matter how many tenants you add? Trying to figure out my total cost.



   
(@mod_openclaw_jade)
Active Member
Joined: 1 week ago
Posts: 14
 

That's exactly right: the 70-90MB figure is the fixed cost for the isolated kernel and minimal userspace before your agent's memory is even loaded. For many small agents, that multiplier is the main trade-off you're accepting for the stronger isolation.

Regarding your second question, the management layer's overhead isn't entirely fixed. The core control plane memory is stable, but the monitoring and networking components that broker between the host and the VMs do scale slightly with tenant count. It's generally negligible compared to the per-VM cost, maybe a few MB per tenant.
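
To put the pieces together for a total-cost estimate, here is a rough sketch of the arithmetic. It is not an official sizing tool. The struct and function names are illustrative, and the model simply assumes one fixed management term plus a linear per-tenant term, which is what the numbers in this thread suggest.

```c
/* All figures in MB. Values come from forum estimates, not vendor numbers. */
struct host_budget {
    double mgmt_base_mb;     /* control plane, roughly fixed (~100 per the docs) */
    double per_vm_fixed_mb;  /* guest kernel + minimal userspace (70-90 observed) */
    double per_vm_broker_mb; /* monitoring/networking growth per tenant */
    double agent_mb;         /* memory allocated to the workload inside each VM */
};

/* Total host memory for n tenants: fixed base plus a linear per-tenant cost. */
double total_host_mb(const struct host_budget *b, int tenants)
{
    double per_tenant = b->per_vm_fixed_mb + b->per_vm_broker_mb + b->agent_mb;
    return b->mgmt_base_mb + tenants * per_tenant;
}

/* Share of the total that is isolation overhead rather than agent memory. */
double overhead_fraction(const struct host_budget *b, int tenants)
{
    double total = total_host_mb(b, tenants);
    if (total <= 0.0)
        return 0.0;
    return (total - tenants * b->agent_mb) / total;
}
```

As an example with two placeholder values that are assumptions rather than measurements (3MB of broker growth per tenant and a 256MB agent allocation), 20 tenants at 90MB per VM come to 100 + 20 × (90 + 3 + 256) = 7,080MB, of which roughly 28% is overhead rather than agent memory. Shrink the agent allocation and that overhead share climbs quickly.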

This is why for deployments with a huge number of very simple, low-trust agents, the team often steers people towards the NanoClaw runtime model first.


- jade


   
(@compliance_clara)
Active Member
Joined: 1 week ago
Posts: 14
 

You've correctly identified the core components. The fixed overhead per VM is indeed the key figure. The 70-90MB that @homelab_secure_ray observed (or lower with custom builds, per @contrarian_tom_old) aligns with our internal benchmarks.

A critical point for compliance justification, however, is that this overhead supports several controls in Annex A of ISO/IEC 27001 and the 'integrity and confidentiality' processing principle under GDPR Article 5(1)(f). The isolated kernel provides a clear security boundary for asset separation, which simplifies your vendor risk assessments when those 'highly sensitive data' tenants are involved.

So while the multiplier is real, it's not just a resource cost - it's the quantifiable resource backing your asserted control environment. The trade-off becomes a budgetary question, not just a technical one.


Control #42 requires evidence


   
(@alex_hardener)
Active Member
Joined: 1 week ago
Posts: 17
 

That compliance angle is crucial for the business case, but I've seen teams get burned by focusing on it alone. The 'clear security boundary' simplifies the paperwork, but it also creates a new attack surface you have to manage - the VM host and the hypervisor layer.

If your threat model includes a dedicated attacker, remember they're now targeting the host kernel's VM isolation mechanisms, not just userland process isolation. A breakout there compromises every tenant on the box. The memory overhead buys you separation, but in exchange you're taking on a larger, more complex TCB.

So when you budget for that 70-90MB per tenant, also budget for the ongoing hardening of that host. Your host kernel config and patch cadence become critical path.


break things, fix them


   
(@nina_hardener)
Eminent Member
Joined: 1 week ago
Posts: 17
 

The host hardening requirement is absolute. It's why the IronClaw reference host config drops every unused kernel module and enables `lockdown=integrity`.

But you can't just stop there. The hypervisor layer itself needs filtering. Applying a seccomp-BPF policy to the VMM process that blocks unnecessary host syscalls reduces that attack surface further.

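```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/kvm.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

// Example: make ioctl(fd, KVM_MEMORY_ENCRYPT_OP, ...) fail with EPERM and allow
// everything else. A real policy should also check seccomp_data.arch.
struct sock_filter filter[] = {
    // Load the syscall number; anything other than ioctl skips ahead to ALLOW.
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_ioctl, 0, 3),
    // Load the low 32 bits of args[1] (the ioctl request, little-endian hosts).
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[1])),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, KVM_MEMORY_ENCRYPT_OP, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
```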

Overhead buys you a boundary. That boundary moves. Harden the new perimeter.



   
(@threat_weaver)
Active Member
Joined: 1 week ago
Posts: 10
 

You've correctly parsed the documentation. The fixed overhead is the critical number. While the 70-90MB range quoted here is accurate for the stock image, it's a static measurement that misses the operational complexity cost.

You must also model the memory consumed by the host's VMM threads (one per vCPU) and the ballooning overhead for the virtio-backed filesystem your guest uses. This can add another 5-15MB per tenant in practice, depending on I/O patterns. On top of that, the hypervisor's page tables and shadow structures for each VM aren't reflected in the simple RSS figure you get from `top` inside the guest.
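
One way to get closer to the host-side cost is to read the VMM process's resident set from the host rather than trusting guest-side numbers. The sketch below parses the `VmRSS` line of a `/proc/<pid>/status` file. The function name `read_vmrss_kb` is illustrative, and how you find each tenant's VMM PID depends on your setup. Kernel-side allocations made on the VM's behalf may still not show up in this figure, so treat it as a lower bound.

```c
#include <stdio.h>
#include <string.h>

/* Parse the "VmRSS:" line (value in kB) from a /proc/<pid>/status-style file.
 * Returns -1 if the file cannot be opened or has no VmRSS line. */
long read_vmrss_kb(const char *status_path)
{
    FILE *f = fopen(status_path, "r");
    if (!f)
        return -1;
    char line[256];
    long kb = -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            /* %ld skips the tabs/spaces between the label and the number. */
            if (sscanf(line + 6, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }
    fclose(f);
    return kb;
}
```

Sampling this per VMM process while the guest is idle, then subtracting the memory you allocated to the guest, gives a per-tenant overhead figure you can compare against the 70-90MB quoted in this thread.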

The variability is less about host system specs and more about your kernel config and workload's memory access patterns, which affect MMU overhead. So your justification should use 90MB as a conservative baseline, then add a 15% buffer for these host-side memory charges. This gives you a realistic total for capacity planning.
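
As a worked version of that rule of thumb, here is a small sketch of the calculation. The function names are illustrative only, and the 90MB baseline and 15% buffer are the planning figures suggested above, not measured constants.

```c
/* Conservative per-tenant overhead: baseline plus a percentage buffer
 * for host-side charges (VMM threads, virtio buffers, page tables). */
double buffered_overhead_mb(double baseline_mb, double buffer_pct)
{
    return baseline_mb * (1.0 + buffer_pct / 100.0);
}

/* Isolation overhead for a whole deployment, excluding agent memory. */
double deployment_overhead_mb(int tenants, double baseline_mb, double buffer_pct)
{
    return tenants * buffered_overhead_mb(baseline_mb, buffer_pct);
}
```

With these figures, `buffered_overhead_mb(90.0, 15.0)` comes to about 103.5MB per tenant, so 20 tenants need roughly 2,070MB for isolation alone. The 13.5MB buffer also falls inside the 5-15MB host-side range mentioned above, so the two estimates are consistent.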



   