Troubleshooting: High 'GPU Memory Used' reported after all agents are stopped

(@ghost_wrangler)
Eminent Member
Joined: 1 week ago
Posts: 20
Topic starter
  [#857]

We've been profiling NemoClaw's resource reclamation in our staging environment and observed a consistent pattern: `nvidia-smi` reports significant GPU memory in use (2-4 GB per GPU) even after all tenant agents have been cleanly stopped and their containers removed. The `nvidia-smi` process list is empty.

This suggests the GPU VRAM is not being fully released back to the driver. Given our focus on attestation and hardening, this is a multi-faceted concern:
* **Isolation Gap:** Could this residual allocation provide a side-channel or data residue risk for the next workload scheduled on the same GPU?
* **Operational Impact:** It reduces available VRAM for the next tenant, potentially causing unnecessary scheduling delays or failures.

Our initial troubleshooting points to the CUDA driver's memory caching behavior, but we need to verify what is actually happening at the NemoClaw layer. We executed the following to stop all workloads:

```bash
clawctl agent list --all-tenants | grep -v ID | awk '{print $1}' | xargs -I {} clawctl agent stop --force {}
clawctl container prune --all-tenants
```

Despite this, `nvidia-smi` continues to report used memory. A system reboot or driver reload clears it, but neither is a viable operational solution.
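For anyone who wants to reproduce the check, here is a minimal sketch of how the residual could be flagged after teardown. It assumes input in the form produced by `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits`; the function name `flag_residual_vram` and the 512 MiB default threshold are illustrative choices, not NemoClaw values.

```bash
#!/usr/bin/env bash
# Flag GPUs whose used memory exceeds a threshold (in MiB).
# Reads "index, memory.used" lines on stdin, one GPU per line.
# The 512 MiB default is an arbitrary illustration, not a NemoClaw setting.
flag_residual_vram() {
  local threshold_mib="${1:-512}"
  awk -F', *' -v t="$threshold_mib" '
    $2 + 0 > t { printf "GPU %s: %d MiB still in use\n", $1, $2; found = 1 }
    END { exit (found ? 1 : 0) }   # non-zero exit if any GPU was flagged
  '
}
```

Typical use would be to pipe the `nvidia-smi` query above into `flag_residual_vram 512` right after the stop and prune commands; a non-zero exit status means at least one GPU is still above the threshold.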

**Key Questions for the Group:**

1. Does NemoClaw's GPU scheduler invoke `cudaDeviceReset()` or an equivalent on the context it creates for each tenant's agent/workload? Or does it rely solely on container teardown?
2. Which hardware-level guardrails from NVIDIA (MIG, i.e. Multi-Instance GPU, or time-slicing) does NemoClaw leverage? The documentation states that memory is "cleared" on instance termination. Is this true zeroization or merely pointer deallocation?
3. Has anyone instrumented the driver to track allocation ownership? We suspect that, with the default driver caching behavior, `CUDA_VISIBLE_DEVICES` scoping plus container removal may not trigger a full reset of the GPU's memory state.

This isn't just about reclaiming megabytes; it's about verifying the integrity of the isolation boundary. If the hardware doesn't guarantee erasure, then the software layer must enforce it.



   
(@grace_audit)
Active Member
Joined: 1 week ago
Posts: 11

Your isolation concern is valid, but the data residue risk is likely low for structured VRAM. The cache is typically zeroed buffers, not plaintext client data. The real compliance issue is with operational controls and audit trails.

Your troubleshooting misses a key layer. Have you validated the NemoClaw control plane's own CUDA context? A persistent management process, like the scheduler or telemetry collector, can hold a context open. That context allocates pinned memory for DMA operations that isn't tied to a user container. The command `clawctl system status --verbose` should show its PID.
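A way to cross-check this independently of `clawctl` is to look for host processes that hold an NVIDIA device node open. The following is a minimal sketch, assuming a Linux host with the usual `/dev/nvidia*` device nodes and root privileges to read other processes' file descriptors; the function name `gpu_device_holders` is made up for illustration.

```bash
#!/usr/bin/env bash
# List PIDs that hold an open handle on any /dev/nvidia* device node.
# proc_root defaults to /proc; it is a parameter only so the function
# can be exercised against a fake directory tree.
gpu_device_holders() {
  local proc_root="${1:-/proc}"
  local fd target pid
  for fd in "$proc_root"/[0-9]*/fd/*; do
    # Unreadable or vanished entries make readlink fail; skip them.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      /dev/nvidia*)
        pid=${fd#"$proc_root"/}   # strip the proc root prefix
        pid=${pid%%/*}            # keep only the PID component
        printf '%s\n' "$pid"
        ;;
    esac
  done | sort -un
}
```

Run as root on the GPU host with no arguments. Any PID it prints after all agents are stopped is a candidate for the process keeping a context alive.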

For attestation, you need to document this as a known behavior in your hardening guide and justify it as an acceptable, documented deviation if you can't reclaim it. An auditor will ask why your stated 'clean slate' reclamation procedure doesn't match the physical evidence from nvidia-smi.


-- grace


   
(@tom_skeptic)
Active Member
Joined: 1 week ago
Posts: 11

"Cache is typically zeroed buffers" is a big assumption. Depends entirely on the allocator's free routine. Has anyone actually dumped that memory to check, or are we just trusting the vendor's docs?

Control plane context is a good guess. But if the scheduler holds onto that much pinned memory between workloads, that's a design flaw. It should allocate on demand and release. Otherwise you're just reserving GPU memory for internal use, which they never advertise.

Auditors will see that deviation and ask for the threat model. "Acceptable behavior" without a PoC showing the memory contents is just hand-waving.


PoC or it didn't happen


   
(@containers_first)
Eminent Member
Joined: 1 week ago
Posts: 15

They're right about the vendor docs. The allocator free routine is key, and NVIDIA's isn't open source. But dumping that memory to prove it's zeroed is unrealistic in production; you'd need a kernel module.

The design flaw argument misses the point. That "reserved" memory isn't for the workload, it's for the control plane's own ops. It's a fixed overhead, like any system daemon. If they didn't allocate it upfront, you'd get latency spikes when it does need it.


namespace your agents, not your worries


   
(@arch_sec_lead)
Eminent Member
Joined: 1 week ago
Posts: 18

Good initial troubleshooting. That pattern is well-known within the platform team and you've hit the right two concerns.

You can verify the driver caching theory by running the CUDA device reset call (`clawctl gpu reset --device `) on a test GPU, which will force a full context destruction and flush the cache. If the memory clears, that's your culprit. The audit trail for that reset event is crucial, as it's a privileged control plane operation.
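To make the before/after comparison repeatable, something like the sketch below could be used. It assumes two snapshot files, each captured with `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits`, one taken before the reset and one after; the function name `vram_delta` is hypothetical.

```bash
#!/usr/bin/env bash
# Print per-GPU memory freed between two snapshots.
# Each snapshot file holds "index, memory.used" lines (values in MiB).
vram_delta() {
  local before="$1" after="$2"
  awk -F', *' '
    NR == FNR { used[$1] = $2; next }   # first file: remember "before" values
    $1 in used {
      printf "GPU %s: %d -> %d MiB (freed %d)\n", $1, used[$1], $2, used[$1] - $2
    }
  ' "$before" "$after"
}
```

If the "freed" figure on the test GPU roughly matches the residual reported earlier, that supports the caching theory; if nothing is freed, the memory is being held elsewhere. Keeping both snapshot files alongside the reset event also gives the audit trail something concrete to point to.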

On isolation, while the memory is likely zeroed allocator cache, the side-channel potential from allocation patterns alone is why we document a hardware-based scheduling boundary in our attestation package. The next workload should never land on the same physical GPU as a previous tenant from a different trust zone without a full node reboot.


--ca


   