Anyone else having trouble with Aider's sandbox not blocking file system writes outside /tmp?

Laura Chen

(@ai_risk_manager)

June 22, 2026 1:34 pm [#312]

Hey everyone. I've been stress-testing various AI coding assistant sandboxes against prompt injection, specifically looking for escapes that could lead to unauthorized file system writes. Aider's sandbox is often mentioned as a robust defense, but I'm seeing some concerning behavior in my tests.

My understanding was that its Docker-based sandbox should restrict writes to `/tmp` and block attempts elsewhere. However, using a few known jailbreak patterns (e.g., instructing the model to encode payloads or use indirect system calls), I've managed to get it to append to files in the project directory. It doesn't happen with every prompt, which makes it feel like a race condition or a logic flaw in the rule set.

Here's what I'm observing:
* The sandbox correctly blocks direct `echo "text" > ../app.py` type commands.
* However, multi-step instructions that leverage the tool's own code-editing functions sometimes bypass the path validation. It seems the context of a "code edit" can be weaponized.
* The escape isn't consistent, which from a threat modeling perspective is almost worse: an intermittent failure means the control can't be relied on.
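To make the "logic flaw in the rule set" idea concrete, here is a minimal sketch of how a path check can look correct and still be bypassable. This is not Aider's code; `naive_is_write_allowed` and `is_write_allowed` are hypothetical names, and the assumption is simply that some layer decides whether a target path falls inside an allowed root. A string-prefix check accepts `..` traversal, while resolving the path first does not.

```python
import os
import tempfile
from pathlib import Path


def naive_is_write_allowed(target: str, allowed_root: str) -> bool:
    """Hypothetical flawed check: compares raw strings only."""
    return target.startswith(allowed_root)


def is_write_allowed(target: str, allowed_root: str) -> bool:
    """Hypothetical stricter check: resolve '..' and symlinks, then compare."""
    root = Path(allowed_root).resolve()
    resolved = Path(target).resolve()  # works for paths that do not exist yet
    return resolved == root or root in resolved.parents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as allowed:
        inside = os.path.join(allowed, "notes.txt")
        traversal = os.path.join(allowed, "..", "app.py")
        for candidate in (inside, traversal):
            print(
                candidate,
                "naive:", naive_is_write_allowed(candidate, allowed),
                "resolved:", is_write_allowed(candidate, allowed),
            )
```

If an edit path goes through a different validator than a shell path does, only one of them needs to look like the naive version for the boundary to leak.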

I'm using the standard setup. Has anyone else replicated this or dug into the actual containment rules? I'm less interested in the specific exploit and more in the **evaluation methodology**. How are we testing these sandboxes beyond just running a few naive prompts?

From a CISO lens, this is exactly why vendor demos aren't enough. We need:
* Benchmarks that simulate a determined, adaptive attacker, not just a static list of bad prompts.
* Clear definitions of the "security boundary" (is it just the filesystem, or does it include network, env vars, etc.?).
* Tests for consistency: does the defense hold under repeated, varied attacks, or does it degrade?
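As a starting point for the consistency test, here is a rough sketch of a harness that repeats an attack many times and reports how often anything outside the allowed directory changed. Everything in it is an assumption for illustration: the `tmp`/`project` layout, the function names, and especially `flaky_attack`, which is a deterministic stand-in for "run the assistant under test with an injected prompt". It detects escapes by hashing the workspace before and after each trial rather than trusting the tool's own logs.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def snapshot(root: Path) -> Dict[str, str]:
    """Map each file under root (relative POSIX path) to a SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_outside(before: Dict[str, str], after: Dict[str, str],
                    allowed_dir: str) -> List[str]:
    """Files created, modified, or deleted that are not under allowed_dir."""
    touched = {p for p in before.keys() | after.keys()
               if before.get(p) != after.get(p)}
    return sorted(p for p in touched if not p.startswith(allowed_dir + "/"))


def escape_rate(attack: Callable[[Path, int], None], trials: int) -> float:
    """Run attack in a fresh workspace per trial; return the fraction that escaped."""
    escapes = 0
    for trial in range(trials):
        with tempfile.TemporaryDirectory() as d:
            root = Path(d)
            (root / "tmp").mkdir()
            (root / "project").mkdir()
            (root / "project" / "app.py").write_text("print('hi')\n")
            before = snapshot(root)
            attack(root, trial)
            if changed_outside(before, snapshot(root), "tmp"):
                escapes += 1
    return escapes / trials


def flaky_attack(root: Path, trial: int) -> None:
    """Hypothetical stand-in: always writes to tmp, escapes on every 4th trial."""
    (root / "tmp" / "scratch.txt").write_text("ok")
    if trial % 4 == 0:
        with open(root / "project" / "app.py", "a") as f:
            f.write("# injected\n")


if __name__ == "__main__":
    print(escape_rate(flaky_attack, 20))  # 5 of 20 trials escape -> 0.25
```

Reporting a rate over many varied trials, instead of a single pass/fail, is what lets you compare sandboxes and spot a defense that degrades. A real harness would swap `flaky_attack` for a call into the tool under test and would also need to watch network and environment access if those are inside the declared boundary.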

If the sandbox can be tricked into writing a file, even occasionally, that's a potential RCE or data exfiltration path in a CI/CD pipeline scenario. Curious if the OpenClaw community has any standardized tests for this class of vulnerability yet.

YMMV.

Risk is not a number, it's a conversation.
