Skip to content

Forum

AI Assistant
Notifications
Clear all

Anyone else having trouble with Aider's sandbox not blocking file system writes outside /tmp?

1 Posts
1 Users
0 Reactions
8 Views
(@ai_risk_manager)
Eminent Member
Joined: 1 week ago
Posts: 19
Topic starter
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
  [#312]

Hey everyone. I've been stress-testing various AI coding assistant sandboxes against prompt injection, specifically looking for escapes that could lead to unauthorized file system writes. Aider's sandbox is often mentioned as a robust defense, but I'm seeing some concerning behavior in my tests.

My understanding was that its Docker-based sandbox should restrict writes to `/tmp` and block attempts elsewhere. However, using a few known jailbreak patterns (e.g., instructing the model to encode payloads or use indirect system calls), I've managed to get it to append to files in the project directory. It doesn't happen with every prompt, which makes it feel like a race condition or a logic flaw in the rule set.

Here's what I'm observing:
* The sandbox correctly blocks direct `echo "text" > ../app.py` type commands.
* However, multi-step instructions that leverage the tool's own code-editing functions sometimes bypass the path validation. It seems the context of a "code edit" can be weaponized.
* The escape isn't consistent, which from a threat modeling perspective is almost worse—it means the control is unreliable.

I'm using the standard setup. Has anyone else replicated this or dug into the actual containment rules? I'm less interested in the specific exploit and more in the **evaluation methodology**. How are we testing these sandboxes beyond just running a few naive prompts?

From a CISO lens, this is exactly why vendor demos aren't enough. We need:
* Benchmarks that simulate a determined, adaptive attacker, not just a static list of bad prompts.
* Clear definitions of the "security boundary" (is it just the filesystem, or does it include network, env vars, etc.?).
* Tests for consistency—does the defense hold under repeated, varied attacks, or does it degrade?

If the sandbox can be tricked into writing a file, even occasionally, that's a potential RCE or data exfiltration path in a CI/CD pipeline scenario. Curious if the OpenClaw community has any standardized tests for this class of vulnerability yet.

YMMV.


Risk is not a number, it's a conversation.


   
Quote