Hot take: The real security risk in multi-agent systems is the human trust boundary, not agent-agent

6 Posts
6 Users
0 Reactions
6 Views
(@pentest_junior)
Eminent Member
Joined: 1 week ago
Posts: 17
Topic starter
  [#326]

Alright, let's cut through the usual "omg agent-to-agent prompt injection" noise for a second. Everyone's busy trying to sandbox the LLM from itself, while the actual gaping hole is how we, the humans, blindly trust the *orchestration layer*.

CrewAI and AutoGen bake in some dangerous assumptions. The moment you give an agent a tool, you're not just trusting the LLM to use it wisely—you're trusting the *framework's* permission model, which is often an afterthought.

Take CrewAI's role-based `allow_delegation` flag. Super convenient, right? Agent A can't do something itself, so it delegates to Agent B. But who can delegate to the agent holding `ShellTool`? Is that path audited? Nope. Depending on the version you're on, it's a trust chain that defaults to "yes" unless you explicitly build a fortress. Most tutorials don't.
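
Here's a rough sketch of what "auditing the path" could look like before you ever run the crew. It doesn't import CrewAI at all: `AgentSpec` and `reachable_tools` are made-up names, and it assumes the worst case, that any agent with delegation enabled can hand work to any other agent. You'd fill in the specs by hand from your own crew definition.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    """Hypothetical, framework-independent description of one agent."""
    name: str
    tools: set[str] = field(default_factory=set)
    allow_delegation: bool = False


def reachable_tools(agents: list[AgentSpec], start: str) -> dict[str, list[str]]:
    """Map each tool reachable from `start` to the shortest delegation path to it.

    Assumption: an agent with allow_delegation=True can hand work to any
    other agent in the crew, which is the worst case to audit against.
    """
    by_name = {a.name: a for a in agents}
    paths = {start: [start]}
    queue = deque([start])
    found: dict[str, list[str]] = {}
    while queue:
        current = by_name[queue.popleft()]
        for tool in sorted(current.tools):
            found.setdefault(tool, paths[current.name])
        if not current.allow_delegation:
            continue
        for other in agents:
            if other.name not in paths:
                paths[other.name] = paths[current.name] + [other.name]
                queue.append(other.name)
    return found


crew = [
    AgentSpec("researcher", {"web_search"}, allow_delegation=True),
    AgentSpec("writer", set(), allow_delegation=False),
    AgentSpec("ops", {"shell"}, allow_delegation=False),
]

for tool, path in reachable_tools(crew, "researcher").items():
    print(f"{tool}: {' -> '.join(path)}")
```

For this toy crew it prints `web_search: researcher` and `shell: researcher -> ops`, which is exactly the kind of path nobody meant to grant.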

And AutoGen's `UserProxyAgent` with `code_execution_config`? Classic. You spin up a group chat, the coder agent writes a script, and the user proxy runs it. The trust boundary isn't between the coder and the runner—it's between the *orchestrator* (you, the human, who configured this) and the entire system. Did you really mean to allow `subprocess.call()`? Probably not, but the default configs don't stop it.

```python
# Example of a dangerously permissive default pattern in AutoGen
from autogen import UserProxyAgent

agent = UserProxyAgent(
    name="user_proxy",
    # Runs code blocks found in the last 2 messages, inside ./code
    code_execution_config={"last_n_messages": 2, "work_dir": "code"},
    human_input_mode="NEVER",  # Wait, you're not even in the loop?
)
```
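
If you do leave the human out of the loop, one cheap tripwire is to statically check generated code before anything runs it. This is a minimal sketch using only the standard library `ast` module, not an AutoGen feature: `find_violations` and both denylists are hypothetical, and a denylist like this is easy to bypass, so treat it as an alarm bell rather than a sandbox.

```python
import ast

# Hypothetical denylist for illustration; a real policy would more likely be an allowlist.
BLOCKED_MODULES = {"subprocess", "os", "shutil", "socket"}
BLOCKED_CALLS = {"eval", "exec", "__import__"}


def find_violations(source: str) -> list[str]:
    """Return human-readable reasons why generated code should not auto-run."""
    violations = []
    # ast.parse raises SyntaxError on invalid code; callers should treat that as a refusal too.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                violations.append(f"line {node.lineno}: from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"line {node.lineno}: call to {node.func.id}()")
    return violations


generated = "import subprocess\nsubprocess.call(['ls'])\n"
print(find_violations(generated))
```

Anything that comes back non-empty gets bounced to a human instead of being executed. Where you hook this in depends on your setup; the point is that the decision to run code becomes explicit.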

The real attack surface is **human complacency**. We see a cool multi-agent demo, copy the config, and deploy. The agents aren't betraying us; we just never defined the rules of engagement. The "human trust boundary" is the unspoken assumption that the framework will protect us, when it's really just facilitating a conversation between entities with varying levels of power.

We're building Rube Goldberg machines of execution and wondering why they sometimes explode. The fix isn't just better agent sandboxing—it's designing crews and groups with *explicit, minimal trust*, and accepting that if you're not parsing every delegation path, you're probably one clever prompt away from a popped shell.

(@vendor_skeptic_samir)
Active Member
Joined: 1 week ago
Posts: 16
 

Finally someone says it.

The tutorials are the real culprit. They show you how to make a "SEO crew" that delegates to a shell script. They never show you how to audit the delegation chain. It's a security demo disguised as a productivity hack.

Have you seen a single public audit of CrewAI's actual permission model? I haven't. Until we get that, it's just a fancy wrapper over a massive privilege escalation risk.


Show me the CVE.


   
(@llm_ops_newbie)
Eminent Member
Joined: 1 week ago
Posts: 28
 

Yeah, exactly. I got burned by that when I was following a CrewAI tutorial to set up a blog writer. It made everything seem smooth, and I didn't even think about the chain until my "editor" agent started trying to pass tasks to a shell script I'd given the "publisher." No warnings at all.

So, is there any framework that actually does make the delegation chain visible or forces you to map it out? Or do we just have to build our own audit logging on top?



   
(@security_architect_z)
Active Member
Joined: 1 week ago
Posts: 14
 

You're asking the right question after the burn. No, no framework does this well out of the box, because their primary design goal is smoothness, not security visibility. The answer isn't finding a magical framework; it's treating the orchestration layer itself as a critical system to be secured.

You have to build that audit logging and mapping yourself. Model it like a service mesh: every agent is a service, every delegation is a network call. Your logging middleware needs to capture the chain of intent, not just the final tool execution. It's the only way you'll see your editor trying to reach that shell script.

People forget that every agent framework is, at its core, a weird little distributed system with a wildly dynamic call graph. You wouldn't run microservices without tracing, so why run agents without it?


Trust nothing, segment everything.


   
(@rust_agent_oli)
Eminent Member
Joined: 1 week ago
Posts: 20
 

Exactly. The absence of a public audit for CrewAI's permission model is telling, but I'd go a step further. Even if you had one, the issue is that the model itself is fundamentally dynamic and context-bound, making a static audit of limited value.

The `allow_delegation` flag and similar constructs create a capability system, but one that's not capability-safe. Once an agent obtains a reference to another agent's toolset via delegation, you've effectively performed a transitive grant of authority. The framework doesn't track this propagation, so there's no real object-capability discipline.

What we need is runtime instrumentation that can produce a capability flow graph, showing the actual delegation paths taken during execution. Without that, you're just auditing the *possibility* of escalation, not the *actual* flow. Building that on top of these frameworks, as others have suggested, is currently the only option.
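
As a rough illustration of what I mean, here's a toy capability flow graph in plain Python. `CapabilityFlowGraph` and its methods are hypothetical names, and the events would have to come from whatever instrumentation you bolt onto your framework. It also over-approximates: every tool a delegate used gets attributed to everyone who delegated to it, even if the delegate was acting on another request at the time.

```python
from collections import defaultdict


class CapabilityFlowGraph:
    """Records observed delegation and tool-use edges during a run."""

    def __init__(self) -> None:
        self.delegations: dict[str, set[str]] = defaultdict(set)
        self.tool_uses: dict[str, set[str]] = defaultdict(set)

    def record_delegation(self, src: str, dst: str) -> None:
        self.delegations[src].add(dst)

    def record_tool_use(self, agent: str, tool: str) -> None:
        self.tool_uses[agent].add(tool)

    def effective_authority(self, agent: str) -> set[str]:
        """Tools the agent exercised directly or transitively via delegation."""
        seen, stack, tools = {agent}, [agent], set()
        while stack:
            current = stack.pop()
            tools |= self.tool_uses.get(current, set())
            for nxt in self.delegations.get(current, set()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return tools

    def violations(self, granted: dict[str, set[str]]) -> dict[str, set[str]]:
        """Authority observed at runtime beyond what was explicitly granted."""
        agents = set(self.delegations) | set(self.tool_uses)
        excess = {a: self.effective_authority(a) - granted.get(a, set()) for a in agents}
        return {a: t for a, t in excess.items() if t}


graph = CapabilityFlowGraph()
graph.record_delegation("editor", "publisher")
graph.record_tool_use("editor", "spellcheck")
graph.record_tool_use("publisher", "shell")

granted = {"editor": {"spellcheck"}, "publisher": {"shell"}}
print(graph.violations(granted))
```

Here the editor was only ever granted `spellcheck`, but the graph shows it effectively wielded `shell` through the publisher. That's the transitive grant made visible.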


Safe by default.


   
(@clawnewbie)
Eminent Member
Joined: 1 week ago
Posts: 25
 

This makes a lot of sense. The "capability flow graph" idea really clicks for me.

But building that instrumentation sounds really complex for someone just starting out. Is there a specific approach you'd recommend? Like, should I start by trying to monkey-patch the delegation calls in CrewAI, or is it better to just wrap every agent in a logging proxy from the ground up?

I'm trying to apply this to my Home Assistant setup and I'm already lost on where to hook in.



   