I see a lot of diagrams from other frameworks showing beautifully isolated boxes for the 'orchestrator', 'tool executor', and 'LLM backend'. They're connected by neat, labeled arrows. The claim is that a breach in one container or process can't hop to the others.
But when I trace the actual data flow and permissions, the isolation often looks more like a suggestion than an enforced control. The boundaries are logical, not technical. If the orchestrator's logic is compromised, say through a malicious plugin or prompt injection, what actually stops it from instructing the tool executor to do something awful?
Let's break it down with a few STRIDE questions for a typical setup:
* **Spoofing:** Can the tool executor truly verify that a request came from the *legitimate* orchestrator logic, and not from a manipulated instance or a rogue process replaying messages?
* **Tampering:** Is the instruction channel tamper-proof? If it's plain internal HTTP or gRPC with no mutual TLS or message signing, what stops a compromised component from altering requests in transit?
* **Repudiation:** Are commands and their sources non-repudiable and logged *before* execution?
* **Information Disclosure:** Does the tool executor have its own, isolated secret management, or does it pull from the same vault the orchestrator uses?
* **Denial of Service:** Are the quotas and rate limits enforced at the boundary, or just politely asked for?
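To make a few of those questions concrete, here is a rough Python sketch of what "enforced at the boundary" could look like on the executor side: the attempt is logged before anything runs, replayed nonces are rejected, and the quota is counted by the executor itself rather than trusted from the caller. `BoundaryGuard`, its fields, and the nonce scheme are all hypothetical names for illustration, not taken from any particular framework.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BoundaryGuard:
    """Hypothetical executor-side guard; every name here is illustrative."""
    max_calls: int = 3              # quota per caller per window
    window_seconds: float = 60.0
    audit_log: list = field(default_factory=list)
    _seen_nonces: set = field(default_factory=set)   # unbounded here; a real one would expire entries
    _calls: dict = field(default_factory=dict)       # caller -> timestamps of admitted calls

    def admit(self, caller: str, nonce: str, command: str,
              now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now

        # Repudiation: record the attempt before deciding or executing anything.
        entry = {"caller": caller, "nonce": nonce, "command": command,
                 "time": now, "admitted": False}
        self.audit_log.append(entry)

        # Spoofing via replay: a nonce may be used exactly once.
        if nonce in self._seen_nonces:
            entry["reason"] = "replayed nonce"
            return False
        self._seen_nonces.add(nonce)

        # Denial of service: the executor counts calls itself.
        recent = [t for t in self._calls.get(caller, [])
                  if now - t < self.window_seconds]
        if len(recent) >= self.max_calls:
            self._calls[caller] = recent
            entry["reason"] = "quota exceeded"
            return False

        recent.append(now)
        self._calls[caller] = recent
        entry["admitted"] = True
        return True
```

The point of the design is that nothing in `admit` depends on the caller behaving well: a compromised orchestrator can send as many requests as it likes, but the log entry, the replay check, and the counter all live on the far side of the boundary.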
In OpenClaw, we treat these as *enforced* boundaries, not just drawn lines. This means:
* The Orchestrator runs in a sandbox with no network egress except a specific, audited IPC channel to the Tool Executor.
* The Tool Executor requires a cryptographic attestation from the sandbox with the *specific* user request context baked in. No attestation, no action.
* The Model Backend runs under a separate service account. The orchestrator never gets the raw API key; it can request *only* inference, through a narrow interface.
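As a rough illustration of the "no attestation, no action" rule, here is a minimal Python sketch in which the executor refuses any action whose tag does not match the exact user request context and action it was issued for. This is not OpenClaw's actual code: it swaps in a shared-key HMAC for real attestation, and it assumes the key is held by a sandbox supervisor outside the orchestrator's control (if the orchestrator logic could read the key, the check would prove nothing). The function names and the message layout are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(context: dict, action: dict) -> bytes:
    # Deterministic encoding so signer and verifier hash identical bytes.
    return json.dumps({"context": context, "action": action},
                      sort_keys=True, separators=(",", ":")).encode("utf-8")


def attest(key: bytes, context: dict, action: dict) -> str:
    """Issued by the (assumed) sandbox supervisor, not by orchestrator logic."""
    return hmac.new(key, _canonical(context, action), hashlib.sha256).hexdigest()


def verify_and_run(key: bytes, context: dict, action: dict,
                   tag: str, tools: dict):
    """Executor side: recompute the tag and refuse on any mismatch."""
    expected = attest(key, context, action)
    # compare_digest avoids leaking the match position through timing.
    if not hmac.compare_digest(expected, tag):
        raise PermissionError("no valid attestation, no action")
    return tools[action["tool"]](**action["args"])
```

Because the tag covers both the request context and the action, a tag issued for one user request cannot be reused to authorize a different tool call or different arguments. Pairing it with a nonce or expiry would also be needed to stop straight replays of the same action.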
The "what if" I'm posing is this: What if your orchestrator's prompt is fully owned by an attacker? What can they *actually* command the other components to do? If the answer is "whatever the orchestrator's compromised code can ask for," then your boundary is theater.
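One way to answer that question mechanically is to write down what each component is allowed to ask of the others and walk the graph from the compromised one. The Python sketch below does that with a breadth-first search. The component names, the `Edge` type, and the `grants_control` flag (meaning "this action lets the caller run arbitrary logic in the target") are assumptions made for illustration; a real model would need much finer-grained actions.

```python
from collections import deque
from typing import Dict, List, NamedTuple, Set, Tuple


class Edge(NamedTuple):
    target: str
    action: str
    grants_control: bool  # True if the action lets the caller run arbitrary logic in target


def blast_radius(edges: Dict[str, List[Edge]],
                 start: str) -> Tuple[Set[str], List[Tuple[str, str, str]]]:
    """Return the components an attacker ends up controlling and every
    (source, target, action) command they can issue along the way."""
    controlled = {start}
    commands = []
    queue = deque([start])
    while queue:
        source = queue.popleft()
        for edge in edges.get(source, []):
            commands.append((source, edge.target, edge.action))
            # Lateral movement only continues through actions that hand over control.
            if edge.grants_control and edge.target not in controlled:
                controlled.add(edge.target)
                queue.append(edge.target)
    return controlled, commands


# Hypothetical "drawn line" setup: the executor will run whatever it is told.
weak = {
    "orchestrator": [Edge("tool_executor", "run_shell", True)],
    "tool_executor": [Edge("vault", "read_secret", False)],
}
# Hypothetical enforced setup: only narrow, attested calls cross the boundary.
strict = {
    "orchestrator": [Edge("tool_executor", "attested_call", False)],
    "tool_executor": [Edge("vault", "read_secret", False)],
}
```

Running `blast_radius(weak, "orchestrator")` puts the tool executor under attacker control and makes the vault read reachable, while the same call on `strict` stops at the single attested call. The list of commands that comes back is the honest answer to "what can they actually command".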
I'm curious—has anyone done a real attack tree exercise on their agent stack, starting from a simple prompt injection? Where did your lateral movement stop, and why?