I'm setting up a multi-agent system for a customer support pipeline. All agents live in the same environment and can access a shared workspace for documents and task status.
I'm trying to do a threat model for this, but the templates I find mostly focus on external users attacking the system. My worry is internal: what can a compromised or malicious agent do to the other agents?
* Can one agent poison the data another agent uses?
* Could a prompt injection against Agent A make it attack Agent B's process?
* How do you even draw the trust boundaries in a data-flow diagram for this?
I'm specifically thinking about shell injection risks and how an agent with tool access could escalate. Are there example STRIDE diagrams for inter-agent threats?
You're right to focus on internal threats. Treat each agent as an untrusted, potentially hostile principal. Your shared workspace is the biggest risk surface.
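For your STRIDE model, treat each agent's tool permissions as the trust boundaries. If Agent A can write to a file that Agent B reads, that is tampering; it also becomes spoofing if B cannot verify who wrote the file, and repudiation if writes are not logged. If an agent can execute shell commands, that is elevation of privilege: with weak isolation it can directly target other agents' processes.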
Draw your data-flow diagram with each agent in its own box, inside its own trust boundary, all connecting to the "shared workspace", which you label as a "high-risk trust zone". Then model attacks along every data flow that crosses a boundary.
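To make the mapping above concrete, here is a minimal sketch that enumerates inter-agent threats from a permission table. The agent names (`triage`, `responder`), the paths, and the `enumerate_threats` helper are all hypothetical, and the STRIDE labels are a simplification: a real model would also consider spoofing, repudiation, information disclosure, and denial of service for each flow.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Agent:
    """One agent and its workspace permissions (hypothetical model)."""
    name: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]
    can_exec_shell: bool = False


def enumerate_threats(agents: List[Agent]) -> List[Tuple[str, str]]:
    """List (STRIDE category, description) pairs for inter-agent flows."""
    threats = []
    # Any path one agent writes and another reads is a tampering channel.
    for writer, reader in permutations(agents, 2):
        for path in sorted(writer.writes & reader.reads):
            threats.append(
                ("Tampering", f"{writer.name} -> {path} -> {reader.name}")
            )
    # Shell access is treated as an elevation-of-privilege risk.
    for agent in agents:
        if agent.can_exec_shell:
            threats.append(
                ("Elevation of Privilege",
                 f"{agent.name} can run shell commands")
            )
    return threats


# Illustrative agents only; replace with your real permission table.
agents = [
    Agent("triage",
          reads=frozenset({"tickets/"}),
          writes=frozenset({"status.json"})),
    Agent("responder",
          reads=frozenset({"status.json"}),
          writes=frozenset({"drafts/"}),
          can_exec_shell=True),
]

for category, detail in enumerate_threats(agents):
    print(f"{category}: {detail}")
```

Each line it prints corresponds to one arrow in the diagram that crosses a trust boundary, so the output can double as a checklist when you draw the data flows.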
For shell injection, input sanitization alone is not enough. Run each agent under a separate Linux user ID, use seccomp to block process-manipulation syscalls (for example ptrace and process_vm_writev), and use AppArmor to restrict file writes to designated directories only. This limits lateral movement.
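If the agents run in Docker containers, drop the --privileged flag: it grants all capabilities and disables the default seccomp and AppArmor confinement, which would undo the controls above.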
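On the application side, a sketch of how an agent's tool wrapper might avoid shell injection in the first place: commands come from a fixed allowlist and arguments are passed as a list with no shell involved. The `run_tool` function and the `echo-ticket` tool name are hypothetical, and this complements the OS-level isolation rather than replacing it.

```python
import subprocess
import sys

# Hypothetical allowlist: tool name -> fixed argv prefix.
# The demo tool just prints its single argument back.
ALLOWED_COMMANDS = {
    "echo-ticket": [sys.executable, "-c", "import sys; print(sys.argv[1])"],
}


def run_tool(command: str, user_arg: str) -> str:
    """Run an allowlisted tool with one untrusted argument, without a shell."""
    if command not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowlisted: {command!r}")
    # The untrusted value is a single argv element, so characters such as
    # ';', '|' or '$(...)' are never interpreted by a shell.
    argv = ALLOWED_COMMANDS[command] + [user_arg]
    result = subprocess.run(
        argv,
        shell=False,
        capture_output=True,
        text=True,
        timeout=5,
        check=True,
    )
    return result.stdout.strip()


# The injection attempt is echoed back as literal text, not executed.
print(run_tool("echo-ticket", "42; rm -rf /tmp/x"))
```

On POSIX with Python 3.9 or later, `subprocess.run` also accepts a `user=` argument, which is one way to launch a tool under a separate user ID, provided the parent process has the privileges to switch users.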