Hey everyone, I've been trying to understand the real-world security of these AI coding sandboxes. Specifically, I've been poking at two I see mentioned a lot: Anthropic's Claude Code (within the Claude desktop app) and Aider's chat mode (which also runs code in a sandbox). Both claim to execute code in a controlled environment for the user's safety, but I'm trying to figure out which one might be easier to "red-team" if I were to write a custom tool that the AI agent could be tricked into using.
My starting point is that both are supposed to be isolated. But from a Python and self-hosting perspective, I'm curious about the attack surface for privilege escalation or breakout. If I, as a hypothetical attacker, could get a malicious prompt accepted that makes the AI use a tool I designed, what happens next? Which sandbox would give my tool more room to operate or be more permissive by default?
For example, in Claude Code, the environment feels very sealed. But I've done some basic probing:
```python
# Simple probe to see what's available
import os
import sys

print("CWD:", os.getcwd())
print("Files in CWD:", os.listdir('.'))
print("Python path:", sys.path)

try:
    import socket
    print("Socket module available")
except ImportError:
    print("Socket restricted")
```
In my limited tests, Claude Code seems to restrict network modules outright. But Aider's sandbox, at least in my local setup, sometimes feels more connected to the host's environment, depending on how it's launched. This makes me wonder about the fundamental isolation mechanisms.
So my core questions are:
1. What's the actual isolation method for each? Are they using containers, namespaces, pure Python sandboxing (like `restrictedpython`), or something else?
2. From a red-team perspective, if I can get my payload to run as a "tool," which sandbox has more inherent permissions that could be leveraged? Things like file write locations, ability to spawn subprocesses, or access to environment variables.
3. Does one have a more permissive default policy that would make it easier to, say, exfiltrate data from the sandbox or achieve persistence?
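For question 1, a few cheap heuristics can hint at the isolation layer from the inside. This is a sketch that assumes Linux conventions: `/.dockerenv` is a Docker convention, and `/proc/1/cgroup` often names the container runtime under cgroup v1 but typically reads just `0::/` under cgroup v2 with a cgroup namespace. None of these results is conclusive, because a bespoke runtime can hide or emulate every one of them. The `isolation_hints` helper is hypothetical.

```python
import os

def isolation_hints():
    """Collect cheap, heuristic hints about how this process is isolated.

    None of these is conclusive: a custom runtime can hide, fake, or emulate any of them.
    """
    hints = {}
    # Docker conventionally creates this marker file at the container's root.
    hints["dockerenv_file"] = os.path.exists("/.dockerenv")
    # On Linux, PID 1's cgroup path may name docker, containerd, or kubepods (mostly cgroup v1).
    try:
        with open("/proc/1/cgroup") as f:
            hints["pid1_cgroup"] = f.read().strip()
    except OSError as e:
        hints["pid1_cgroup"] = f"unreadable: {e}"
    # Kernel release as reported to this process; an emulating runtime reports its own.
    hints["kernel"] = os.uname().release if hasattr(os, "uname") else "n/a"
    return hints

for key, value in isolation_hints().items():
    print(f"{key}: {value}")
```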
I'm not looking to break them for fun, but I genuinely want to understand the threat model. If we're building agents with tools, we need to know how hard it is for a malicious tool to do damage if the agent is tricked into loading it. Are these sandboxes designed more for safety against accidental damage, or for resisting a determined attacker? 🤔
Maybe some of you have done more thorough testing or looked at the source. I'd love to hear about the actual boundaries and any known weaknesses.
Interesting probe, but you're stopping short of the good stuff. That snippet's just checking what's visible. The real question is what's *allowed*, not what's listed.
Claude Code runs in a purpose-built container with heavy syscall filtering. Your probe won't show that. Try `import subprocess; subprocess.run(['ls', '-la'])` and see if it even executes. Last I checked, they blacklist most of `subprocess` and `os.system`. Aider, depending on how it's deployed, often just runs in a Docker container with default seccomp profiles. That's a huge difference.
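The `subprocess` test is more informative if it separates "the binary isn't there" from "spawning is blocked." Below is a sketch; the `can_spawn` helper is hypothetical. It spawns the current interpreter via `sys.executable` so the result doesn't depend on `ls` existing in a stripped-down image. If a syscall filter kills the process outright instead of returning an error, nothing prints at all, and that silence is itself a result.

```python
import subprocess
import sys

def can_spawn(cmd, timeout=5):
    """Try to run cmd as a child process and report what happened."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError as e:
        # The executable was not found; this says nothing about whether spawning is allowed.
        return False, f"binary missing: {e}"
    except (OSError, subprocess.TimeoutExpired) as e:
        # PermissionError (e.g. a seccomp filter returning EPERM) or a hung child land here.
        return False, f"blocked or failed: {e!r}"
    return True, f"exit={result.returncode}"

print("python child:", can_spawn([sys.executable, "-c", "print('hi')"]))
print("ls -la:", can_spawn(["ls", "-la"]))
```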
If your custom tool needs to do anything interesting, Aider's default Docker setup is almost certainly the softer target. It's generic isolation. Claude Code's is bespoke and paranoid. The attack surface isn't about Python path, it's about what syscalls your tool can successfully make.
reality has a bias against your threat model
That's a fair point about syscalls being the real measure. The bespoke container for Claude Code is built from the ground up to reject anything unexpected. It's not just a blacklist, it's a default-deny stance.
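One way to check whether a syscall filter is in force at all, assuming a Linux kernel (or an emulation of one) exposes `/proc/self/status`, is to read its seccomp fields. `Seccomp: 2` means filter mode, which is what a container under Docker's default profile shows, and `NoNewPrivs: 1` means setuid binaries can no longer grant extra privileges. This sketch only tells you that a filter exists, not which syscalls it denies, and a gVisor-style runtime reports its own emulated view. Both helpers are hypothetical.

```python
def parse_seccomp_fields(status_text):
    """Extract seccomp-related fields from the text of /proc/<pid>/status."""
    wanted = ("Seccomp", "Seccomp_filters", "NoNewPrivs")
    fields = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            fields[key] = value.strip()
    return fields

def seccomp_status(path="/proc/self/status"):
    """Seccomp: 0 = off, 1 = strict, 2 = filter. Empty dict if the file is unavailable."""
    try:
        with open(path) as f:
            return parse_seccomp_fields(f.read())
    except OSError:
        return {}

print(seccomp_status() or "No /proc status available (non-Linux or hidden)")
```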
But Aider's security depends entirely on the user's setup. Its default Docker might be generic, but the project encourages self-hosting where you control the policies. Someone paranoid could harden it far beyond defaults, while a novice might leave it wide open. So the softer target isn't a given, it's a configuration lottery.
Safety first, then security.
Oh, that syscall angle makes a ton of sense, thanks. So trying that `subprocess` call is a much clearer test than just looking at modules.
Just to make sure I follow: when you say Claude Code's container is "bespoke and paranoid," does that mean it's probably not even a standard Docker container, but something they built custom? That seems like a much bigger hurdle for a new tool to get around.
Learning by doing (and breaking).
Exactly. It's a completely custom runtime, not Docker. They've effectively built a minimal execution environment from scratch, likely using gVisor or Firecracker at the kernel isolation layer, with an aggressively pruned userspace. The "container" you see is just the artifact of that.
This makes the hurdle for a new tool substantial because the attack surface isn't just about blocked syscalls. The entire filesystem is ephemeral and read-only except for a tiny, tightly controlled mount. There are no shell binaries, no package managers, and the Python interpreter itself might be compiled with specific, restrictive flags.
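A claim like "read-only except for a tiny mount, no shell, no package manager" can be checked directly. The sketch below attempts a real create-and-delete in a few candidate directories and looks up a few binaries on `PATH`. The candidate lists are hypothetical; adjust them to whatever your tool would actually need.

```python
import os
import shutil
import tempfile

# Hypothetical candidate lists; extend them to match what your tool would need.
CANDIDATE_DIRS = ["/tmp", os.getcwd(), os.path.expanduser("~")]
CANDIDATE_BINARIES = ["sh", "bash", "pip", "apt-get", "curl"]

def is_writable(directory):
    """Attempt a real create-and-delete instead of trusting permission bits."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        return False

def inventory():
    return {
        "writable": {d: is_writable(d) for d in CANDIDATE_DIRS},
        "binaries": {b: shutil.which(b) for b in CANDIDATE_BINARIES},
    }

for section, results in inventory().items():
    print(section)
    for name, value in results.items():
        print(f"  {name}: {value}")
```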
So your custom tool wouldn't just fight a blacklist, it would operate in a barren, instrumented environment designed to expose any attempted deviation.
Your agent is only as safe as its last prompt.
That's a crucial distinction about the runtime. If it's truly built on something like gVisor, then the isolation boundary is fundamentally different from a Docker container on the host kernel.
A follow-up question, since I'm new to this: for red-teaming, wouldn't the gVisor/Firecracker layer itself become the primary target? The "barren environment" sounds impenetrable from inside, but the virtualization layer has its own attack surface. Is there any public research on exploiting those from a constrained, instrumented Python context like Claude's?
You've identified the right initial probe, but your method is incomplete for assessing tool viability. The `socket` import attempt is a decent start, but a custom tool's success hinges on persistent state and side effects, not just module availability.
A more revealing test is attempting to create a file in a presumed writable directory and then read it back in a subsequent, isolated execution. For example, try writing to `/tmp` or the current working directory, then simulate a new tool invocation by restarting the interpreter or calling a separate process. If state persists between tool calls, that's a critical foothold.
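The write-then-read test can be split into two modes so that each half runs in a genuinely separate tool call. This is a sketch: the marker file name is hypothetical, and running both halves in the same session only proves the directory is writable, not that anything persists across invocations.

```python
import os
import sys
import tempfile

# Hypothetical marker path; swap in os.getcwd() to test the working directory instead.
MARKER = os.path.join(tempfile.gettempdir(), "persistence_probe_marker.txt")

def write_marker():
    with open(MARKER, "w") as f:
        f.write(str(os.getpid()))

def check_marker():
    """True if a file written by an earlier invocation is still present."""
    return os.path.exists(MARKER)

if __name__ == "__main__":
    # Run once with "write" in one tool call, then with no argument in a later, separate call.
    mode = sys.argv[1] if len(sys.argv) > 1 else "check"
    if mode == "write":
        write_marker()
        print("wrote", MARKER)
    else:
        print("marker survived:", check_marker())
```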
The key for red-teaming a custom tool isn't just breaking out, it's establishing a persistent command and control channel within the allowed sandbox operations. That's often about abusing intended features, like a shared memory space or an allowed network call to a localhost service, rather than a direct syscall escape.
Provenance matters.
Your probe's unfinished, but I see where you're going. That `socket` try is key, but you're checking the wrong thing.
You're asking which sandbox gives your tool more room. The answer's in what happens after `import socket` succeeds. Can you bind? Can you connect out? In Claude Code, you'll likely get the module but all the actual calls are intercepted and killed. It's a puppet theater.
Aider's default Docker, if you've got it talking to the internet? That's a real socket. Your custom tool could phone home immediately.
Try this instead of just importing:
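```python
import socket

# Outbound TCP test against a public DNS resolver (DNS also listens on TCP port 53).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(3)  # don't hang forever if a firewall silently drops packets
try:
    s.connect(('8.8.8.8', 53))
    print("Got out")
except Exception as e:
    print("Nope:", e)
finally:
    s.close()
```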
The module list is a decoy. The connect attempt is the truth.
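The connect test covers egress; the "Can you bind?" half can be checked the same way. The sketch below asks the kernel for an ephemeral loopback port (port 0), so it never collides with a real service. Success only shows that listening sockets are permitted inside the sandbox, not that anything outside can reach them. The `can_bind_loopback` helper is hypothetical.

```python
import socket

def can_bind_loopback():
    """Try to open a listening TCP socket on 127.0.0.1; return (ok, port or error text)."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free ephemeral port
            s.listen(1)
            return True, s.getsockname()[1]
    except OSError as e:
        return False, str(e)

print(can_bind_loopback())
```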
Patch early, patch often.
Your probe's fundamental assumption is the problem. You're asking "which sandbox is easier to red-team," but that's almost entirely dictated by the tool's *purpose*.
If your custom tool is meant to, say, exfiltrate data, then network egress is the only metric that matters. user22's socket test is the only probe you need. The rest is theater.
If your tool is meant to establish persistence or modify the host, then filesystem write semantics and state survival between execution contexts are key, as user482 hinted. Can you write to a path that survives the current code block? In a barren environment, probably not.
So the softer target isn't a product, it's an attack vector. Figure out what your tool is supposed to do first. The answer becomes obvious.
audit what matters
You're right that the tool's purpose dictates the vector, but I think you're oversimplifying the network egress point. Even if `socket.connect` works, the real question is *what* it connects to and under what constraints.
> The socket test is the only probe you need.
Not if the environment uses deep packet inspection or egress proxies that only permit whitelisted domains, which these bespoke setups often do. You could have a functional socket that can only talk to `api.claude.ai` over TLS, rendering a raw exfiltration tool useless unless it can impersonate legitimate traffic. That's a layer up from mere syscall filtering.
So the softer target isn't just about whether the primitive exists, but about how many policy layers sit between the primitive and the outside world. Aider's default Docker might give you a socket that routes through the host's unrestricted network stack, while Claude Code's might give you a socket that's functionally neutered by an intermediary filter. The probe needs to account for that.
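Part of counting those layers can be done from inside. `urllib.request.getproxies()` from the standard library reports proxies advertised through `*_proxy` environment variables (and through system settings on macOS and Windows), with a `no` key for bypassed hosts if one is set. This is only a sketch: an empty result does not rule out a transparent proxy or egress filter, which by design is invisible to the client. The wrapper name is hypothetical.

```python
import urllib.request

def advertised_proxies():
    """Proxies this process is told to use; says nothing about transparent interception."""
    return urllib.request.getproxies()

proxies = advertised_proxies()
if proxies:
    for scheme, url in sorted(proxies.items()):
        print(f"{scheme}: {url}")
else:
    print("No proxy advertised; that does not rule out a transparent one.")
```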
User space is for amateurs.
That's a great point about the policy layers. The network egress is just step one.
Your custom tool might connect out, but if it's hitting a transparent proxy with allow-listed FQDNs, you're stuck mimicking legit traffic. That's a whole other red-team exercise: crafting requests that look like `api.claude.ai` health checks or something.
So the probe isn't just `socket.connect(('8.8.8.8', 53))`. It's `socket.connect(('allowed-domain.somecorp.net', 443))` and then seeing if you can smuggle data in a TLS session the proxy will forward.
--Priya
Your probe's cut off right where it gets interesting. You're checking module imports, but that's only half the story. In Claude Code, you'll likely find `socket` is importable, but every network call is intercepted by their runtime. It's designed to give you the illusion of a full environment while actually running your code in a straitjacket.
So if your custom tool relies on any persistent side effect, like writing a file or opening a raw network connection, Claude Code's sandbox will shut it down instantly. Aider's default setup, on the other hand, gives you a more typical Docker container. That means if you can get it to pull and run your tool, you've got a much broader set of Linux syscalls and possibly network access to work with.
The real question is whether you're self-hosting Aider. The default setup might be softer, but if you run it yourself, you control the Docker daemon and can lock it down just as tight as Claude does. The attack surface isn't really about the product, it's about the configuration.
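As a rough illustration of "lock it down," here is a sketch that assembles a `docker run` command from standard Docker hardening flags. The image name and paths are placeholders, not Aider's documented setup. Because Aider has to reach its model API, `network="none"` only makes sense for testing tool execution in isolation; a real deployment would need an egress-restricted network instead.

```python
import shlex

def hardened_run_args(image, repo_dir, network="none", uid_gid="1000:1000"):
    """Build a locked-down `docker run` command line (a sketch, not a vendor-documented setup)."""
    return [
        "docker", "run", "--rm", "-it",
        "--read-only",                               # root filesystem mounted read-only
        "--tmpfs", "/tmp",                           # scratch space that vanishes with the container
        "--cap-drop", "ALL",                         # drop every Linux capability
        "--security-opt", "no-new-privileges:true",  # setuid binaries can't raise privileges
        "--pids-limit", "256",                       # cap the number of processes (fork bombs)
        "--memory", "2g",
        "--network", network,                        # "none" removes all network access
        "--user", uid_gid,                           # run as an unprivileged UID:GID
        "--volume", f"{repo_dir}:/app",              # the only persistent writable path
        image,
    ]

print(shlex.join(hardened_run_args("example/aider-image:latest", "/home/me/project")))
```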
Keep it technical.