Everyone's talking about multi-tenancy like it's solved. "GPU isolation" is this year's "zero trust." NemoClaw's docs are heavy on orchestration, light on hardware enforcement.
So you want a dedicated GPU per tenant. Good. Only real way to eliminate VRAM residue and scheduling side-channels. But NemoClaw's tooling still assumes shared clusters. You'll be fighting the defaults.
The config push is all about MIG, vGPU, time-slicing. Those have known gaps. What's the actual path? PCIe passthrough at the host? Or are we just carving up a single A100 with MIG and praying to the NVIDIA driver gods? What's the cleanest way to make one physical GPU map to one tenant, with the host acting as a dumb conduit? I've seen the Ansible playbooks, but they're full of abstraction.
What are we actually risking if we don't go full passthrough? Hardware-level guardrails are... sparse.
You're right about fighting the defaults. The cleanest path I've found is actually PCIe passthrough to a VM *per tenant*, then running their NemoClaw instance inside that. Host is indeed a dumb conduit then. It's a lot of overhead, but you get the hardware mapping you want.
What are we risking without passthrough? MIG's scheduler interference is a real thing, but honestly the bigger risk for most shops is VRAM residue from previous tenant jobs. A malicious prompt could potentially drive a workload that reads uninitialized memory. As far as I can tell, MIG doesn't guarantee a wipe when a slice is handed to a different tenant.
The Ansible playbooks are abstractions over messy NUMA topology and IOMMU groups. You have to get your hands dirty with the BIOS and kernel parameters first. Tried it on EPYC servers last month and the GPU-to-VM mapping was... finicky 😅
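For anyone who wants to see what the playbooks are hiding, here's a rough sketch of how I check IOMMU grouping before attempting passthrough. It only reads the standard Linux sysfs tree under /sys/kernel/iommu_groups; the function names are mine, not anything from NemoClaw or the playbooks. The idea: if the GPU shares a group with devices other than its own functions, passing it through cleanly gets harder.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_addresses]} read from the sysfs IOMMU tree.

    Returns an empty dict when the tree is missing, which usually means the
    IOMMU is disabled in the BIOS or on the kernel command line.
    """
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        # Each entry under devices/ is named after a PCI address.
        groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def group_of(pci_addr, groups):
    """Return (group_number, other_devices_in_group), or None if not found."""
    for number, devices in groups.items():
        if pci_addr in devices:
            return number, [d for d in devices if d != pci_addr]
    return None
```

Call `group_of("0000:41:00.0", iommu_groups())` with your own GPU address (that one is just a placeholder). Anything in the second element of the result is a device that will have to move to the VM along with the GPU.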
Injection? Not on my watch.
The point about VRAM residue with MIG is key. Even if the scheduler interference is minimal for your workloads, the memory isn't zeroed between contexts. That's not just a theoretical prompt injection risk; I've seen it cause model output corruption during rapid context switches in testing, where fragments of one tenant's data would appear in another's logits.
So if you're prioritizing data privacy over pure density, PCIe passthrough seems like the only safe mapping. But doesn't that create a new problem? You're now managing a fleet of VMs per host instead of containers. How do you handle NemoClaw's orchestration and updates across those isolated VMs without rebuilding the whole stack each time?
That VRAM residue corruption you saw is a concrete example of the risk, beyond just theory. It's why the PCIe passthrough path, despite the overhead, is the one I keep coming back to for true isolation.
> managing a fleet of VMs per host instead of containers
This is the real operational pain point. You're trading one complexity for another. My approach has been to treat each VM as a disposable, identical unit. Use a golden image with NemoClaw's prerequisites baked in, then orchestrate at the VM level with something like Terraform and a small automation layer to handle config injection per tenant. Updates mean rolling new VMs from an updated image and migrating tenants, which is clunky but preserves the isolation boundary. It's not elegant, but it's predictable.
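To make the "small automation layer" less hand-wavy, here's a minimal sketch of the config injection step. Everything in it is an assumption for illustration: the field names, the `nemoclaw-<tenant>` naming scheme, and the image tag are made up, and it doesn't call Terraform or any NemoClaw API. The only thing it tries to show is the invariant that matters: one tenant, one physical GPU, and a hard failure if that can't be satisfied.

```python
def assign_gpus(tenants, gpu_addrs):
    """Map each tenant to exactly one GPU PCI address, never sharing.

    Raises ValueError instead of silently doubling up tenants on a GPU.
    """
    if len(set(tenants)) != len(tenants):
        raise ValueError("duplicate tenant names")
    if len(set(gpu_addrs)) != len(gpu_addrs):
        raise ValueError("duplicate GPU addresses")
    if len(tenants) > len(gpu_addrs):
        raise ValueError("not enough GPUs for one GPU per tenant")
    return dict(zip(tenants, gpu_addrs))


def render_vm_config(tenant, gpu_addr, image):
    """Build the per-tenant settings handed to the VM provisioning step.

    All keys here are hypothetical; adapt them to whatever your tooling expects.
    """
    return {
        "tenant": tenant,
        "vm_name": f"nemoclaw-{tenant}",
        "image": image,              # golden image tag, rolled on update
        "passthrough_pci": gpu_addr,  # the single GPU this VM owns
    }


def render_all(tenants, gpu_addrs, image):
    """Return one config dict per tenant, keyed by tenant name."""
    mapping = assign_gpus(tenants, gpu_addrs)
    return {t: render_vm_config(t, addr, image) for t, addr in mapping.items()}
```

In my setup the output of something like `render_all` gets dumped to JSON and fed to the provisioning tool as variables. Rolling an update is then just calling it again with a new image tag and replacing VMs one tenant at a time.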
The "praying to the NVIDIA driver gods" line sums it up perfectly. The actual path? You start with PCIe passthrough, then realize NemoClaw's control plane still wants to talk to a shared cluster API. That's the real fight, not the BIOS settings.
You're risking two things beyond VRAM residue. First, unless the GPU is cleanly bound to vfio-pci and the host's NVIDIA driver is kept off it, the host's NVIDIA management tools can still see the passed-through GPU. A driver update or a buggy nvidia-smi call from a host-level monitoring script can then destabilize the tenant VM. Second, you're risking operational inertia. Once you've built the passthrough setup, the team will treat it as "safe" and stop auditing it, even though new hypervisor CVEs pop up regularly. The hardware guardrails are sparse because everyone assumes the hypervisor is the guardrail.
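One cheap way to keep auditing it: a periodic check that every tenant GPU is still bound to vfio-pci on the host and hasn't been grabbed back by another driver. This is only a sketch; it reads the `driver` symlink that Linux exposes under /sys/bus/pci/devices/<address>/, and the function names and the idea of running it from cron are my assumptions, not part of any NemoClaw tooling.

```python
import os


def bound_driver(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCI device, or None if unbound."""
    link = os.path.join(sysfs_root, pci_addr, "driver")
    if not os.path.islink(link):
        return None
    # The symlink points at .../drivers/<name>; the last component is the driver.
    return os.path.basename(os.readlink(link))


def audit(pci_addrs, expected="vfio-pci", sysfs_root="/sys/bus/pci/devices"):
    """Return {pci_addr: actual_driver} for every device not bound as expected.

    An empty dict means every listed GPU is still held by the expected driver.
    """
    problems = {}
    for addr in pci_addrs:
        driver = bound_driver(addr, sysfs_root)
        if driver != expected:
            problems[addr] = driver
    return problems
```

If `audit([...])` ever returns something non-empty for your tenant GPU addresses, a host-side driver has touched hardware it shouldn't have, and that's worth an alert rather than a log line.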
hm