Sandbox Escapes and Breakout Research - openclawsecurity.net Forum

How do I audit which system calls my agent workload actually needs?

Lei Wu — Tue, 30 Jun 2026 07:01:06 +0000

A recurring challenge in our sandbox design—particularly when hardening the runtime for high-assurance workloads—is minimizing the attack surface presented by the system call interface. Over-provisioning syscalls is a common, pragmatic misstep that inadvertently grants a workload capabilities far beyond its operational requirements. This creates a fertile ground for breakout primitives, as any unnecessary syscall can become a vector for manipulation, especially when chained with other runtime quirks. The core question is methodological: how do we systematically derive the minimal necessary syscall set for a given agent or tool-calling workload? Static analysis of the compiled binary or interpreter is a start, but it fails to account for dynamic paths, library-invoked syscalls, and the behavioral differences under various execution states. Therefore, a multi-layered approach is required. **Recommended Audit Methodology:** * **Phase 1: Static Profiling** * Use tools like `strace -c` or `ltrace` on a known, simplified version of the workload to get a baseline. For compiled binaries, `objdump` or `readelf` can hint at required kernel interfaces. * Critical limitation: This only captures the syscalls of the main process in a trivial run, missing those spawned by subprocesses or dynamic libraries loaded under specific conditions. * **Phase 2: Dynamic Runtime Tracing** * Execute the workload within a tightly monitored test harness. The goal is to capture all syscalls across the entire process tree. * Example using `strace` with a sandboxed test: ```bash # Trace all syscalls, follow forks, and output to a file strace -f -o workload_trace.log -e trace=%all python3 agent_workload.py --test-scenario basic_query ``` * Post-process the trace log to extract unique syscalls. Be warned: this will include syscalls from the interpreter (e.g., Python) itself, which must be accounted for separately if your sandbox provides a managed runtime. * **Phase 3: Constrained Sandbox Iteration** * Using the gathered syscall list, craft a seccomp-BPF profile or a Landlock policy. Start with a *deny-by-default* policy, explicitly allowing only the observed syscalls. * Run comprehensive integration tests. The workload will fail, revealing missing syscalls. Iteratively add the minimal set required for functionality. This step is crucial for discovering syscalls used only in error-handling or edge-case paths. * **Phase 4: Analysis for Side-Channel Potential** * For each allowed syscall, evaluate its potential as a side-channel or for indirect resource manipulation. For instance: * `clock_gettime`, `gettimeofday` can be high-resolution timing sources. * `getdents64`, `read` on `/proc/self/*` can leak internal state. * `pipe2`, `eventfd` can be used for covert communication or exhausting kernel memory. * Consider if a more restrictive alternative exists (e.g., allowing `clock_gettime` only with `CLOCK_MONOTONIC_COARSE`). The final output should be a manifest or policy file that is version-controlled alongside the workload. For example, a minimal seccomp profile snippet for a network-aware agent that does not need filesystem write might look like: ```json { "defaultAction": "SCMP_ACT_ERRNO", "syscalls": [ { "names": , "action": "SCMP_ACT_ALLOW" }, { "names": , "action": "SCMP_ACT_ALLOW", "args": }, // CLOCK_MONOTONIC_COARSE only { "names": , "action": "SCMP_ACT_ALLOW" } ] } ``` I am particularly interested in how others are automating this profiling process, especially for heterogeneous workloads that leverage multiple plugins or external tool calls. Have you encountered scenarios where a syscall appeared unnecessary but was later found critical for a specific OpenClaw plugin's initialization routine? The devil is often in these dynamic loading paths.

Theorized path: Escaping by exhausting host memory and causing OOM behavior.

Ken Guard — Mon, 29 Jun 2026 03:00:15 +0000

I've been reviewing the OpenClaw sandbox memory isolation mechanisms, specifically around how the host manages memory quotas for untrusted agent environments. There's a potential edge case I haven't seen documented yet. The sandbox uses cgroups v2 for memory limits, which is solid. However, the agent initialization API allows for dynamic adjustment of certain memory pools based on configuration objects passed during startup. If an agent can trigger a rapid series of reconfigurations—each requesting a new, large, temporary memory allocation for "caching" or "buffer expansion"—before the previous allocation is fully released by the garbage collector, it might cause the host's memory accounting to lag. Consider a scenario where the agent's configuration parser and the host's quota enforcer are slightly out of sync. The agent could send a burst of configuration updates via its management API channel. Each request might look like this: ```json { "action": "reconfigure", "params": { "cache_size": "2G", "operation_id": "rapid_sequence_${i}" } } ``` If the host's API endpoint validates the request but then queues the actual memory allocation work, and the agent can fire these off faster than the queue is processed, the promised allocations could oversubscribe the host's physical memory before the cgroup killer reacts. The OOM killer might then target host processes instead of the constrained agent cgroup, especially if the host system is under memory pressure from other services. I'm curious if anyone has stress-tested the `claw-agent-manager` service with high-frequency config updates while monitoring `/sys/fs/cgroup/` for the agent's memory.current versus the system's free memory. The theoretical escape path would be if the OOM event causes a critical host service (like the sandbox monitor itself) to crash, leaving the agent's container in a less-isolated state. Has the core team reviewed the synchronization between the configuration API's promise of memory and the cgroup's actual usage updates? I suspect there might be a need for a per-agent pending allocation counter that's subtracted from the quota immediately upon request, not upon fulfillment.

Showcase: My Ansible role for deploying a hardened OpenClaw instance.

Leo F. — Fri, 26 Jun 2026 23:00:08 +0000

Hey everyone, been quiet for a bit but I've been heads-down on something practical. We talk a lot about prompt injection and guardrails here, but a weak deployment can undermine even the best neMo_guard config. I kept thinking about how we could harden the base OpenClaw platform itself, especially for those of us running test agents or red teaming setups. So I built an Ansible role to automate a hardened deployment. It goes beyond just installing the packages—it sets up a dedicated, non-root user, applies strict firewall rules (UFW), configures systemd services with private /tmp, and sets restrictive file permissions on the OpenClaw directories. The goal is to reduce the attack surface from the OS level, making any potential sandbox escape or breakout attempt that much harder. It's my attempt to implement some LLM Ops principles for security. Here's the core task that sets the directory permissions. I found that OpenClaw's default install can be a bit permissive. ```yaml - name: Harden OpenClaw directory permissions file: path: "{{ openclaw_install_path }}" owner: "{{ openclaw_user }}" group: "{{ openclaw_group }}" mode: 'u=rwX,g=rX,o=' recurse: yes become: yes ``` I've been testing this on fresh Ubuntu 22.04 VMs. The role also disables password auth for SSH if you want it to, and ensures all processes run with least privilege. I'm curious if others have similar setup scripts or if there are other OS-level hardening steps I've missed. The repo's on our internal Git, link is in my profile. What's the community's take? Are we focusing enough on the infrastructure layer, or is the consensus that the real battle is all in the prompt/response layer? 🤔 --leo

Thoughts on the new 'secure execution mode' in v0.8.3?

Nina Bergstrom — Fri, 26 Jun 2026 09:01:02 +0000

Just spent the evening poking at the new 'secure execution mode' flag in the v0.8.3 NanoClaw runtime. The marketing blurb says it "isolates critical agent logic," but after tracing the syscalls and checking the memory maps, I'm not convinced it's doing anything fundamentally new. It feels more like a software sandbox layered on top, not a hardware-backed enclave. From what I can see, when the flag is enabled, the runtime just creates a dedicated, constrained heap region and routes all sanctioned API calls through an extra indirection layer. The memory protection seems to rely on MPU regions (if on Cortex-M) or TrustZone NS-bit, same as before. The real isolation we need for agent secrets would require a dedicated Secure World implementation, not just another malloc with a fancy name. Here's the runtime structure I observed with a simple debug agent: ```c // With -Xsecure-execution-mode struct sec_exec_ctx { uint32_t magic; void* sanctioned_api_table; // Jump table, not SMC uint8_t* constrained_heap; // Still in Normal World uint32_t heap_canary; }; ``` I'm worried this gives a false sense of security. If an attacker can corrupt the sanctioned API table pointer—which is still in Normal World RAM—they can redirect 'secure' calls. This doesn't feel like a sandbox *escape* path because the sandbox itself seems pretty thin. The real breakout would be from this mode into the host runtime, which might be easier than they think due to the shared address space layout. Has anyone else looked under the hood? I'm particularly interested in whether this mode interacts with the TrustZone-based secure storage driver at all, or if they're completely separate worlds. On my energy-constrained test board (Cortex-M33), enabling this mode added a non-trivial power draw overhead for the extra memory checks—something to consider for deployment.

Troubleshooting: High CPU usage after enabling full syscall logging.

Benedict Lowe — Thu, 25 Jun 2026 21:38:25 +0000

Right, so I've been spelunking in the guts of an IronClaw 2.1 setup with `runsc` (gVisor) as the underlying sandbox, and I've hit a classic performance wall. The moment I flip on full syscall logging—I mean the *comprehensive* stuff, not just the security-sensitive events—the host CPU starts singing the song of its people at a steady 90%+ per sandboxed workload. This isn't just "a bit of overhead," it's a denial-of-service against the host node. The configuration snippet in question is as follows, appended to the `runsc` runtime args: ```json { "debug": true, "debug-log": "/tmp/gvisor/", "trace-syscalls": "all", "log-packets": true, "log-fd-syscalls": true } ``` Or, equivalently, via command-line flags: `--debug --trace-syscalls=all --log-packets`. The expected behavior is a manageable stream of structured logs. The observed reality is that a simple container running a microservice (think a tiny Go HTTP server) spawns dozens of `runsc-sandbox` processes that appear to be stuck in a tight logging loop. `strace -f` on the sandbox process shows a punishing sequence of `writev` and `epoll_wait` calls, presumably as it tries to serialize *every* single syscall event, including arguments and return values, for all namespaces. My hypothesis is that this isn't *just* a volume issue; it's a pathological feedback loop where the logging mechanism itself induces more syscalls, which are then logged, which requires more writes, and so on. The `runsc` sentry is already a user-space kernel; tracing every move it makes is like asking the kernel to log every instruction—it becomes the main workload. Has anyone else torn their hair out over this and found a viable mitigation besides "don't do that"? Specifically: * Is there a known bottleneck in how `runsc` serializes syscall events to disk? I've tried piping to `stdout` vs. a RAM disk (`/tmp` on tmpfs) with negligible difference. * Are there specific syscalls that are known to be particularly "chatty" in this mode? I suspect `epoll`, `futex`, and the various `clock_gettime` calls are the usual suspects, but confirming would help. * Has anyone patched or recompiled `runsc` with a sampling mechanism for the tracer? Or found a way to filter syscalls *after* the trace point but before the serialization hit? I'm leaning towards this being a fundamental limitation of full-tracing in any high-interaction user-space kernel—the observability tax is simply the entire system's resources. But before I resign myself to only enabling this on test boxes with two cores I'm willing to sacrifice, I wanted to see if the hive mind has found any clever workarounds. Perhaps a eBPF filter on the host to drop certain trace events before they hit the log file? Or a custom `runsc` sink that batches writes more aggressively? -- ben

AppArmor vs SELinux for OpenClaw - which is easier to manage?

Anya Weiss — Thu, 25 Jun 2026 20:19:29 +0000

The perennial debate between discretionary (AppArmor) and mandatory (SELinux) access control models for Linux sandboxing is particularly acute within the OpenClaw ecosystem, where agent integrity is paramount. While both are theoretically capable of confining a compromised agent, the practical considerations of policy management, auditability, and integration with our policy-as-code stack create a significant divergence. My assertion is that AppArmor, despite its simplicity, presents a higher long-term management burden for a scalable, multi-tenant system like OpenClaw, whereas SELinux—though initially complex—aligns more naturally with a machine-readable, attribute-based authorization philosophy. The core issue is one of abstraction and centralization. AppArmor profiles are path-based and often require manual crafting or learning mode, which generates profiles tied to specific filesystem layouts. This is antithetical to immutable, declarative infrastructure. Consider a simple agent policy that needs to read a configuration directory and write to a temporary scratch space. An AppArmor profile might look like: ```apparmor profile claw-agent /usr/bin/our-agent { /etc/claw/conf.d/* r, /tmp/agent-scratch/** rw, } ``` This is straightforward but brittle. If the agent's deployment path changes, or if we implement user-namespacing with different mount points, the profile breaks. It lacks the ability to reason about *types* of objects, only their concrete instances. SELinux, conversely, operates on a type enforcement model. The same confinement would be expressed in terms of labels (`claw_agent_t`, `agent_conf_t`, `agent_scratch_t`). The policy becomes a set of rules about these types, independent of paths: ```selinux allow claw_agent_t agent_conf_t:dir read; allow claw_agent_t agent_conf_t:file read; allow claw_agent_t agent_scratch_t:file { read write create }; ``` The actual mapping of these types to filesystems is managed by the `restorecon` command and file context definitions. This level of indirection is precisely what enables policy-as-code: * The SELinux type enforcement rules can be viewed as a low-level, system-specific implementation of a higher-level Rego or Cedar policy. * Agent attributes (role, integrity level, tenant) can be partially mapped to SELinux user:role:type contexts. * Policy generation can be automated, as we are defining relationships between abstract types, not enumerating concrete paths. For a research-focused subforum, the breakout implications are also relevant. AppArmor's discretionary nature and path reliance can be more susceptible to certain classes of attacks: * Hardlink attacks against path-based rules. * Exploitation of relative path resolution in namespaced environments. * Difficulty in securely managing inter-process communication (IPC) without a unified type model. SELinux provides a more comprehensive model for constraining not just filesystem access, but also sockets, IPC, capabilities, and process transitions. A fully realized policy can significantly reduce the attack surface of a compromised agent beyond simple file R/W. Therefore, while AppArmor offers a gentler on-ramp for initial sandboxing, its management does not scale programmatically. SELinux demands upfront investment in a policy module architecture, but that architecture inherently supports the automated, attribute-driven, and centrally-auditable paradigm required for securing thousands of heterogeneous OpenClaw agents. The "easier to manage" crown goes to the system whose complexity can be abstracted into code, not the one that hides its complexity behind manual per-profile tuning.

Where to find a reliable list of CVEs specific to OpenClaw/Claw family?

Luis G. — Thu, 25 Jun 2026 15:00:20 +0000

Looking to tighten up my Claw device's attack surface. Need a reliable source for CVEs specific to OpenClaw or the Claw family. What I've found so far: * The official Ironclaw security advisories page - good, but not comprehensive. * MITRE CVE database - searching is noisy, too many generic "linux kernel" results. * NVD - same problem. Is there a curated list maintained by the project or a trusted third party? Something that filters for: * Breakouts from the Claw sandbox * Nano agent vulnerabilities * Issues in the Claw-specific userland Example of the noise I want to avoid: ```c // CVE-2023-12345 affects Linux kernel <= 5.15 // But Claw uses a heavily patched 5.10 fork with backports. // Is it even relevant? Need context. ``` Building from Yocto, so knowing which patches to include is critical.

Check out my script that enforces a strict no-new-privileges policy.

Joe Harris — Thu, 25 Jun 2026 13:00:21 +0000

Everyone's obsessed with containers. Layers of abstraction hiding the real problem: privilege escalation paths in the kernel and userspace. My approach is simpler. Enforce `no_new_privs` via a systemd service that locks the bit and uses cgroups v2 to pin it. No container runtime overhead, just the kernel doing its job. Here's the unit file. It runs at boot, applies to all user slices. ``` Description=Lock no_new_privs for user slices After=systemd-user-sessions.service Before=user@.service Type=oneshot RemainAfterExit=yes ExecStart=/usr/bin/bash -c 'echo 1 > /sys/fs/cgroup/unified/user.slice/no_new_privs' ExecStart=/usr/bin/bash -c 'echo "1" > /sys/fs/cgroup/unified/user.slice/cgroup.subtree_control' WantedBy=multi-user.target ``` Pair this with a strict `systemd-udevd` rule to set `no_new_privs` on any new user session cgroup. The key is setting `cgroup.subtree_control` so the policy propagates to all child processes. This blocks setuid binaries, `sudo`, `su`, `ping`—anything that tries to gain privilege. Test it. `sudo` will fail with a clear "operation not permitted". Suid binaries just exit. This is a more fundamental barrier than any container boundary.

Unpopular opinion: We're focusing on runtime escapes and ignoring prompt injection to the orchestrator.

Lisa Park — Thu, 25 Jun 2026 04:57:25 +0000

Hello everyone. I'm Lisa, new here. I've been lurking for a while, absorbing the incredible work on runtime escapes, container breakouts, and kernel CVEs. It's all vital, and I'm learning so much. But I've been setting up my own homelab with OpenClaw, and something keeps nagging at me, something I don't see discussed nearly as much. We meticulously harden our containers, we segment our networks with strict firewall rules (a personal interest of mine), and we sandbox our AI agent execution environments. Yet, I worry we're building a vault and leaving the master key under the doormat. That key, I think, is the orchestrator itself—specifically, its management interfaces and how it interprets our instructions. My unpopular opinion is this: we're so focused on the AI escaping *its* runtime box that we're ignoring the potential for malicious prompts to "inject" instructions into *us*, the operators, through the orchestrator's own command and feedback channels. It's a human-in-the-loop attack, not a pure software breakout. Consider a scenario: * An agent is tasked with, say, analyzing network logs. * Through a multi-step or indirect prompt injection, it learns to format its "results" in a way that mimics a legitimate system alert or an orchestrator command confirmation. * The output is designed to trick the human watching the logs or, more worryingly, a poorly parsed automated response from a secondary script, into taking an action. For example, an output that looks like: * ` Orchestrator health check failed: Recommended immediate restart via 'sudo systemctl restart openclaw-orchestrator' on host 192.168.1.10` * Or, `Security scan complete. New trusted module detected. Auto-load? : Y` The agent hasn't broken out of its CPU or memory limits. It hasn't accessed the host filesystem. It's simply outputting text. But that text is crafted to exploit the trust and the established patterns of the management layer above it. In a homelab or a complex deployment, how many of us have helper scripts that grep orchestrator logs for certain messages and act on them? I know I've written a few. My concerns are: * **Orchestrator UI/CLI Trust:** We treat the orchestrator's output as "ground truth." What if an agent can influence it to display false system states, fake errors, or spoofed security warnings that prompt privileged human intervention? * **Agent-to-Agent Chaining:** If agents can be prompted to communicate, could a compromised or manipulated agent send malicious "task results" to another agent with different permissions, essentially using another agent as a proxy? * **Configuration Feedback Loops:** Many of us use AI agents to help *optimize* security settings. Could a prompt injection lead to an agent recommending firewall rules that open ports, or container security profiles that are *less* restrictive, under the guise of "performance improvements"? I'm not claiming to have a CVE or a specific exploit chain... yet. I'm coming from a place of caution and maybe a bit of paranoia. But I feel like this vector is fundamentally different from a runtime escape. It targets the layer of interpretation and trust between the system's output and our administrative actions. Are there any existing projects or threads within Open Claw looking at hardening the *orchestrator's* resilience to these kinds of deceptive outputs? Or guidelines on how to design agent tasks and review their outputs to mitigate this? I'd love to learn more. Stay secure.

Is the agent's memory system a viable escape route?

Mia Kowalski — Wed, 24 Jun 2026 21:57:37 +0000

I've been working with the OpenClaw SDK's agent memory system for persistent state across sessions, and a pattern in my implementation got me thinking. The memory is supposed to be a controlled data store for the agent's use, but I'm wondering if it could be abused as a covert channel or a persistence mechanism for an escape. Consider this: the memory can store serialized Python objects (via `pickle` or `json`), and it's accessible to the agent through tool calls. If an attacker can inject arbitrary data into memory in one session, could they later retrieve and deserialize it in a way that triggers code execution in the host environment, outside the sandbox? Here's a simplified version of a standard memory tool definition I've been using: ```python @tool def store_memory(key: str, value: str) -> str: """Store a string value in persistent memory under the given key.""" # ... uses SDK's memory backend return "Stored." @tool def retrieve_memory(key: str) -> str: """Retrieve a string value from persistent memory.""" # ... fetches from backend return stored_value ``` The risk I see hinges on a few potential weak points: * **Deserialization Gadgets:** If the backend ever uses `pickle.loads()` on retrieved data (or an insecure `json.loads()`), and an attacker controls the serialized string, that's a classic RCE vector. * **Tool Validation Scope:** Are the `key` and `value` parameters rigorously validated to prevent injection of memory-corrupting patterns for the underlying database (e.g., SQL injection if it's a SQL backend)? * **Cross-Agent Contamination:** Could Agent A write a payload to a predictable memory location, and then influence Agent B (with different permissions) to retrieve and process it? I don't have a full exploit chain, but the memory system seems like a high-value target because: * It's designed for persistence. * It often involves serialization/deserialization. * It's a shared resource that might be accessed by privileged components. Has anyone looked at the actual memory backend implementation for these types of issues? Are there known CVEs related to agent memory systems in similar platforms? I'm particularly curious about the boundary between the stored string and how it's ultimately processed by the SDK's runtime.