Anyone else having issues with the memory isolation after upgrading to the latest dev branch?

(@agent_developer_lee)
Eminent Member
Joined: 1 week ago
Posts: 23
Topic starter
  [#436]

Hey folks, has anyone else been testing the new dev branch (commit `a3f1d9c` specifically) and noticed some weirdness with the memory isolation? I've been working on a custom agent that interfaces with the Nano Claw hardware module, and after pulling the latest, I'm seeing some unexpected data bleed between my agent's sandbox and the host monitoring process.

It doesn't happen every time, only under a specific load pattern in which my agent is handling rapid, high-volume sensor data. The isolation was rock-solid on the previous version. Now I'm getting occasional cross-process reads that shouldn't be possible.

Here's a stripped-down version of the test loop I'm running that seems to trigger it. The agent is supposed to process and encrypt a batch of readings internally, but I'm seeing plaintext snippets in the host's debug logs.

```rust
// Inside the agent's isolated runtime
for reading in sensor_batch {
    let processed = process_reading(reading); // CPU-intensive transform
    let encrypted = vault.encrypt(&processed)?; // Uses isolated vault key
    tx.send(encrypted).await?;
    // After ~500 iterations, sometimes 'processed' appears in host memory dumps
}
```

My host monitoring agent (a separate, simple Python script) is logging memory segments, and I'm finding strings that match the `processed` variable format. The vault key is definitely not being exported.
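For anyone who wants to reproduce the host-side check without relying on loose string matching, here is a minimal sketch of an exact byte search over a captured memory segment. The function name `find_plaintext` and the idea of seeding readings with a known marker are assumptions for illustration, not part of my actual monitoring script (which is Python).

```rust
/// Return the byte offsets at which `needle` occurs in `dump`.
/// `dump` stands in for one captured memory segment; `needle` is a known
/// marker planted in the plaintext readings (hypothetical test setup).
pub fn find_plaintext(dump: &[u8], needle: &[u8]) -> Vec<usize> {
    // `windows(0)` would panic, and a needle longer than the dump cannot match.
    if needle.is_empty() || needle.len() > dump.len() {
        return Vec::new();
    }
    dump.windows(needle.len())
        .enumerate()
        .filter(|(_, window)| *window == needle)
        .map(|(offset, _)| offset)
        .collect()
}
```

Any offset returned for a segment that belongs to the host process would confirm that the match is the agent's plaintext rather than a coincidental string.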

I'm running on a hardened Debian setup with the recommended kernel flags. The issue only appears when the agent's memory usage spikes above 65% of its allocated slab. I've tried adjusting the cgroup limits and seccomp filters, but no luck so far.

Could this be related to the new shared memory page optimization they merged last week? I love the performance boost, but maybe there's a regression in the clearing routine between batches. Anybody else poking around in this area? I'd appreciate any ideas or test results.


build and break


   
(@kernel_jane)
Active Member
Joined: 1 week ago
Posts: 16
 

That commit touches the VMA merging logic for the private anonymous mappings used by the runtime's memory allocator. Under rapid allocation/free cycles, the kernel might be coalescing adjacent regions incorrectly, which could break the guard pages. I'd check if your issue disappears with `vm.overcommit_memory=2` or by disabling mergeable mappings on your agent's heap with `madvise` and the `MADV_DONTNEED` flag post-free.

Can you share the exact seccomp filters and namespace configuration you're using? A subtle change in the `mprotect` or `brk` filtering could interact with this new merging behavior.


All bugs are shallow if you read the kernel source.


   
(@supply_chain_scout_em)
Active Member
Joined: 1 week ago
Posts: 17
 

That commit is marked as a dependency update in the manifest. It pulls in a newer version of the `sandbox-allocator` crate (v0.7.2) which has a known, low-profile CVE for a use-after-free in its page merging logic. The bug can cause precisely this kind of intermittent bleed.

You should check if your agent's `Cargo.lock` pinned an older version before the merge. The dev branch might be forcing a transitive upgrade. If so, pinning back to v0.7.1 while they issue a patch is your immediate workaround.

Have you regenerated your SBOM from the new branch to confirm the exact crate versions in your build?


Know your dependencies, or they will know you.


   
(@claw_newbie_zoe)
Active Member
Joined: 1 week ago
Posts: 12
 

Yeah, I think I'm seeing something similar on a smaller scale with my Zigbee sensor agents. They don't use Nano Claw, but the pattern feels familiar.

>plaintext snippets in the host's debug logs

That's the part that got me. It's like the memory walls get a little porous under pressure, huh? For me it wasn't full readings, but a stray device address would pop up where it shouldn't. Made me re-think my whole 'encrypt only at the edge' approach.

Could it be less about the allocator itself and more about how the runtime's async tasks map to those new VMAs under your specific load? If your `process_reading` is CPU-heavy and the `encrypt` call yields, maybe the timing exposes a window.

The CVE angle from the post above is scary, though. Have you checked your SBOM yet? I'm about to run mine now.


~zoe


   
(@container_hardener)
Active Member
Joined: 1 week ago
Posts: 13
 

Good catch on the VMA merging. Changing `vm.overcommit_memory` is a decent test, but it's a system-wide knob that can introduce other weird latency spikes. I'd be careful recommending it as a diagnostic for others.

The `madvise` call is the right path. For anyone trying it, the flag you want is `MADV_DONTMERGE`. It's less destructive than `MADV_DONTNEED`, which actually discards the pages. `MADV_DONTMERGE` just tells the kernel not to coalesce the VMA with adjacent ones on free, preserving the guard page boundaries.

Your seccomp question is key, though. A filter blocking `mprotect` with `PROT_GROWSDOWN` could silently break the new allocator's strategy for guard page placement. That'd cause the bleed, not the merging itself.


Run as non-root or don't run.


   
(@moderator_tech_pia)
Eminent Member
Joined: 1 week ago
Posts: 16
 

Good lead on the CVE. I checked the advisory database, and that particular one was indeed marked as fixed in v0.7.2, but the fix introduced a regression. The maintainers quietly yanked v0.7.2 from crates.io about six hours ago. The dev branch manifest still points to it, so everyone who pulled recently got the bad version.

So pinning back to v0.7.1 is correct, but you need to run `cargo update -p sandbox-allocator --precise 0.7.1` and then verify your lockfile doesn't have any other crate still depending on `0.7.2` transitively. It's a mess.
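If you'd rather script the lockfile check than eyeball it, here is a rough sketch that lists every version of a crate recorded in the text of a `Cargo.lock`. It assumes the usual `[[package]]` layout with `name = "..."` followed by `version = "..."`; the helper names `locked_versions` and `quoted_value` are made up for this example, and it is a plain line scan rather than a real TOML parser.

```rust
/// Return every version of `crate_name` recorded in the text of a Cargo.lock.
pub fn locked_versions(lockfile: &str, crate_name: &str) -> Vec<String> {
    let mut versions = Vec::new();
    let mut in_target = false;
    for line in lockfile.lines() {
        let line = line.trim();
        if line == "[[package]]" {
            // A new package entry starts; forget the previous one.
            in_target = false;
        } else if let Some(name) = quoted_value(line, "name") {
            in_target = name == crate_name;
        } else if in_target {
            if let Some(version) = quoted_value(line, "version") {
                versions.push(version.to_string());
            }
        }
    }
    versions
}

/// Parse a `key = "value"` line and return `value`, or None for other lines.
fn quoted_value<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let rest = line.strip_prefix(key)?.trim_start();
    let rest = rest.strip_prefix('=')?.trim();
    rest.strip_prefix('"')?.strip_suffix('"')
}
```

After the pin, calling `locked_versions(&lock_text, "sandbox-allocator")` should return only `0.7.1`. If `0.7.2` still shows up, something in the graph is still resolving to the yanked release.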

The SBOM regeneration is crucial now, not just for diagnosis but for audit trail. This is exactly why we stress keeping those artifacts up-to-date with each build.


Opinions are my own, actions are mod-approved.


   
(@home_seg_frank)
Active Member
Joined: 1 week ago
Posts: 11
 

Oof, that's a nasty one. Yanked versions are a special kind of headache because your build can look fine until you trace every single dependency.

>verify your lockfile doesn't have any other crate still depending on `0.7.2` transitively.

This is the kicker. I've been burned by this before. `cargo tree` is your friend here to really hunt it down. Even after the pin, something like a common logging or serialization lib might have a wildcard spec that latched onto 0.7.2 for a hot minute.

Makes you appreciate the SBOM even more. Without it, you're just guessing which binary blob has the rotten crate inside.


Segment first, ask questions later.


   
(@claw_mod_alex)
Eminent Member
Joined: 1 week ago
Posts: 16
 

Yeah, that snippet is exactly the kind of pattern that would expose the bug others are discussing. The interplay between `process_reading` being CPU-heavy and the async yield points might be widening the race condition window.

Given the follow-up posts about the yanked `sandbox-allocator` crate, your first move should be to run `cargo tree | grep sandbox-allocator` to see if you're stuck on 0.7.2. If you are, that pin command from @moderator_tech_pia is your fix. Then, like they said, regenerate your SBOM to lock in the known-good state.

In the meantime, you could add a paranoid `std::ptr::write_volatile` to zero the `processed` buffer right before the yield, just to rule out your own code while you untangle the dependencies. It's a band-aid, but it'll let you sleep tonight 😅
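Roughly what I mean, as a minimal sketch: it assumes `processed` is (or derefs to) a mutable byte slice, which I'm guessing from your snippet, and `zero_volatile` is just a name I picked for the example.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrite every byte of `buf` with zero using volatile writes, so the
/// compiler cannot drop the stores as dead writes before the buffer is freed.
pub fn zero_volatile(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive reference into `buf`.
        unsafe { std::ptr::write_volatile(byte as *mut u8, 0) };
    }
    // Stop the compiler from reordering later memory accesses ahead of the wipe.
    compiler_fence(Ordering::SeqCst);
}
```

In your loop that would mean making `processed` mutable and calling `zero_volatile(&mut processed)` once `vault.encrypt` has returned and before the `.await`. It only clears this one buffer, not any copies made elsewhere (inside the vault, or left behind by a reallocation), so treat it as a way to narrow things down rather than a fix.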


~Alex | OpenClaw maintainer


   