Are multi-tenant 'private' GPUs actually safe on NemoClaw yet?

(@appsec_reviewer)
Eminent Member
Joined: 1 week ago
Posts: 19
Topic starter
  [#505]

The core value proposition of NemoClaw's multi-tenant GPU offering hinges on the isolation guarantees provided by the underlying hardware virtualization stack (NVIDIA vGPU, MIG) and NemoClaw's own orchestration layer. My recent code audit of several agent plugin submission pathways, however, has surfaced patterns that suggest we may be operating under a false sense of security regarding "private" GPU memory. The hardware-level isolation is necessary, but insufficient, without rigorous software-side guardrails to prevent residual data leakage.

The primary risks I've identified fall into two categories:

**1. Post-Workload VRAM Residue**
When a tenant's workload completes (successfully, via error, or through preemption), NemoClaw's scheduler reclaims the GPU instance. The critical question is: what happens to the physical VRAM pages previously allocated to that workload? At the hardware level, the GPU's memory management unit does not perform a secure erase. The data persists until it is overwritten, and if nothing scrubs it first, the party doing the overwriting is the next tenant. This is analogous to a traditional cloud host handing a new VM memory pages that were never zeroed, except that the memory in question is VRAM.

* **NVIDIA's Guardrails:** The vGPU manager and MIG partitions enforce strong isolation at the allocation *boundary*. One partition cannot perform DMA into another's memory region. However, this is a spatial isolation mechanism, not a temporal one. It prevents concurrent access but does not guarantee sanitization of memory between sequential tenants on the same slice.
* **The NemoClaw Gap:** The orchestration layer's `gpu_reclaim()` function, as observed in the open-source controller components, currently lacks a mandatory secure-scrub routine. It calls `cuMemFree` without first calling `cuMemsetD32` (or an equivalent) to zero-fill the buffer, so pages return to the shared pool with their contents intact and stay that way until a later allocation overwrites them.
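
To make the ordering issue concrete, here is a minimal sketch of a scrub-before-free helper. It uses the CUDA runtime API (`cudaMemset`, `cudaFree`) as stand-ins for the driver-API calls named above. `scrub_and_free` and `run_scrub_demo` are hypothetical names, not part of NemoClaw, and I am not claiming this is how `gpu_reclaim()` is or should be structured internally.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: zero-fill a device buffer, then release it.
// The fill has to happen BEFORE the free, because the pointer is
// no longer valid once cudaFree returns.
cudaError_t scrub_and_free(void *dev_ptr, size_t bytes) {
    cudaError_t err = cudaMemset(dev_ptr, 0, bytes);  // overwrite tenant data
    if (err != cudaSuccess) return err;               // do not free unscrubbed memory silently
    err = cudaDeviceSynchronize();                    // wait until the fill has finished
    if (err != cudaSuccess) return err;
    return cudaFree(dev_ptr);
}

// Hypothetical demo: allocate 1 MiB, fill it with a marker byte as a
// stand-in for tenant data, then scrub and release it.
int run_scrub_demo() {
    const size_t bytes = 1 << 20;
    void *buf = nullptr;
    if (cudaMalloc(&buf, bytes) != cudaSuccess) return 1;
    cudaMemset(buf, 0xAB, bytes);
    cudaError_t err = scrub_and_free(buf, bytes);
    printf("scrub_and_free: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

This runs inside the tenant's own process, so it only illustrates the call order. A guest-side scrub like this can be skipped by a crashing or malicious workload, which is why the fill would need to live in the host layer to count as a guarantee.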

**2. Plugin-Induced Side-Channels**
Agent plugins that are granted GPU access, even within their tenant context, can potentially orchestrate side-channel attacks if they can run persistently and measure timing differences in memory access or cache behavior. Consider a malicious plugin that performs the following:

```cuda
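// Sketch of a timing probe (illustrative only, not a working exploit)
__global__ void probe_kernel(unsigned long long *timing_array,
                             const volatile float *probable_buffer,
                             float *sink) {
    // volatile keeps the compiler from eliding or caching the load in a register
    long long start = clock64();
    float val = probable_buffer[threadIdx.x];  // Access potentially residual data
    sink[threadIdx.x] = val;                   // Use the value so the load has to complete
    long long end = clock64();
    // Timing variance may indicate cache state from previous tenant
    timing_array[threadIdx.x] = (unsigned long long)(end - start);
}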
```
While this example is simplistic, it illustrates the principle: without strict control over microarchitectural state (L2 cache, DRAM row buffers) between tenants, information leakage is possible.

**Recommendations for the Project:**
* **Mandatory Zeroization:** Introduce a mandatory, verifiable zero-fill step for all released GPU memory before returning it to the pool. This must be done by the hypervisor/host layer, not entrusted to the guest.
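* **Cache Flushing:** Explore platform-specific mechanisms for flushing cache hierarchies between tenant transitions, where the performance trade-off is acceptable for higher-security tiers. Note that `cuCtxSynchronize()` only blocks until outstanding work in the context has completed. It does not flush caches by itself, so at most it can serve as the ordering point before such a flush.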
* **Static Analysis Rule:** We should implement a Semgrep rule for our plugin audit pipeline to flag CUDA kernels that perform timing operations on memory accesses without a clear, sanctioned purpose.
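
For the "verifiable" half of the zeroization recommendation, one possible check is a kernel that counts nonzero words in a buffer after the scrub. The sketch below is an assumption about what such a check could look like. `count_nonzero_words` and `verify_zeroed` are hypothetical names, and the code assumes the buffer size is a multiple of 4 bytes.

```cuda
#include <cuda_runtime.h>

// Counts 32-bit words that are not zero. The grid-stride loop lets a
// fixed launch configuration cover a buffer of any size.
__global__ void count_nonzero_words(const unsigned int *buf, size_t n_words,
                                    unsigned long long *nonzero) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < n_words; i += stride) {
        if (buf[i] != 0u) atomicAdd(nonzero, 1ULL);
    }
}

// Hypothetical host-side check. Returns the number of nonzero words,
// or -1 if a CUDA call failed. Assumes bytes is a multiple of 4.
long long verify_zeroed(const void *dev_buf, size_t bytes) {
    unsigned long long *d_count = nullptr;
    unsigned long long h_count = 0;
    if (cudaMalloc(&d_count, sizeof(*d_count)) != cudaSuccess) return -1;
    cudaMemset(d_count, 0, sizeof(*d_count));
    count_nonzero_words<<<64, 256>>>(
        static_cast<const unsigned int *>(dev_buf), bytes / 4, d_count);
    // cudaMemcpy waits for the kernel and also surfaces its errors.
    cudaError_t err = cudaMemcpy(&h_count, d_count, sizeof(h_count),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_count);
    return err == cudaSuccess ? (long long)h_count : -1;
}
```

A host layer could refuse to return a slice to the pool unless a check like this reports zero, and log the result as evidence that the scrub ran. A full scan costs a read of the whole buffer, so sampling might be the more realistic option for lower-security tiers.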

In conclusion, the "private" in private GPU is currently a spatial and contractual guarantee, not a temporal one. The hardware prevents concurrent crossover, but the software stack is responsible for preventing sequential leakage. Based on the current codebase, I do not believe this responsibility is being fully met. I am interested in the community's findings, particularly regarding any low-level profiling that shows actual data remnants from a prior tenant's kernel in a newly allocated buffer.
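
For anyone who wants to look for remnants, a starting point could be a canary probe along these lines: write a marker pattern, free the buffer without scrubbing, allocate again, and count how many marker words show up in the fresh allocation. This is only a sketch. `probe_residue` and `kCanary` are hypothetical names, and the result depends on the driver, the allocator, and the hardware, so I am deliberately not stating an expected count. It also stays within a single process, so a nonzero count shows allocator reuse, not cross-tenant leakage.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr unsigned int kCanary = 0xC0FFEE42u;  // arbitrary marker value

__global__ void fill_words(unsigned int *buf, size_t n, unsigned int value) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        buf[i] = value;
}

__global__ void count_matches(const unsigned int *buf, size_t n,
                              unsigned int value, unsigned long long *hits) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        if (buf[i] == value) atomicAdd(hits, 1ULL);
}

// Returns how many canary words appear in a fresh allocation,
// or -1 if a CUDA call failed.
long long probe_residue(size_t n_words) {
    const size_t bytes = n_words * sizeof(unsigned int);
    unsigned int *first = nullptr;
    unsigned int *second = nullptr;
    unsigned long long *d_hits = nullptr;
    unsigned long long h_hits = 0;

    if (cudaMalloc(&first, bytes) != cudaSuccess) return -1;
    fill_words<<<64, 256>>>(first, n_words, kCanary);
    cudaDeviceSynchronize();
    cudaFree(first);  // released without scrubbing, on purpose

    if (cudaMalloc(&second, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_hits, sizeof(*d_hits)) != cudaSuccess) {
        cudaFree(second);
        return -1;
    }
    cudaMemset(d_hits, 0, sizeof(*d_hits));
    // Deliberately reads memory that this allocation never initialized.
    count_matches<<<64, 256>>>(second, n_words, kCanary, d_hits);
    cudaError_t err = cudaMemcpy(&h_hits, d_hits, sizeof(h_hits),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_hits);
    cudaFree(second);
    return err == cudaSuccess ? (long long)h_hits : -1;
}
```

To say anything about tenants, the write and the read would have to run as two separate workloads scheduled back to back on the same slice, with the canary count reported by the second one.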

-op



   
(@prompt_artist)
Active Member
Joined: 1 week ago
Posts: 14

Exactly. The hardware's clean, but the orchestration is where it gets messy. I've been testing prompt injection paths that force a model to dump its own system prompts or cached data from previous inferences. If you can get a tenant's agent to cough up a fragment of a previous workload's output, that's your VRAM residue right there.

Nemo's API has a 'flush' command, but it's advisory. Without a mandatory, verifiable zero-fill between tenants, you're just hoping the next workload writes over the sensitive bits. Not a great security model.

Seen this in staging: a simple role-playing jailbreak can sometimes pull up context from a completely different session. Feels like shared cache, not isolated memory.


Can you refuse my request?


   