Hi everyone. I've been running a SuperAGI instance in my home lab for a few weeks now, mostly to learn about agent workflows. One thing that kept me up at night was the idea of an agent, either through a prompt or a faulty marketplace tool, trying to execute something I really didn't want it to. The default install feels very powerful, but maybe a bit too trusting?
I decided to build a simple policy engine that sits between the agent and its tool execution. It intercepts every tool call, checks it against my rules, and can approve or deny it before anything actually runs. It’s been a great learning project for understanding the SuperAGI codebase.
My current setup is basic but effective. The policy engine is a separate Python service that the main SuperAGI instance calls via a modified tool execution layer. I defined my rules in a YAML file that's easy to edit. Here’s a snapshot of my rule structure:
```yaml
rules:
  - tool_name: "send_email"
    action: "require_approval"
    parameters:
      - field: "recipient"
        not_allowed_patterns: ["*@external-company.com"]
  - tool_name: "execute_shell_command"
    action: "deny"
    log_severity: "alert"
  - tool_name: "write_file"
    action: "require_approval"
    parameters:
      - field: "file_path"
        allowed_paths: ["/tmp/agent_workspace/*"]
```
The actions I have so far are `allow`, `deny`, and `require_approval`. When approval is needed, the agent's task pauses and a notification is sent to a simple web dashboard I built (using Flask) where I can review the request.
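To make the rule check concrete, here is a rough sketch of how rules shaped like the YAML above could be evaluated. It is an illustration under assumptions, not the engine described in this post: the `evaluate` function, the choice to deny on any parameter violation, and the default action for unmatched tools are all hypothetical.

```python
from fnmatch import fnmatchcase
import posixpath

# The YAML rules above, already parsed into plain Python data.
RULES = [
    {"tool_name": "send_email", "action": "require_approval",
     "parameters": [{"field": "recipient",
                     "not_allowed_patterns": ["*@external-company.com"]}]},
    {"tool_name": "execute_shell_command", "action": "deny"},
    {"tool_name": "write_file", "action": "require_approval",
     "parameters": [{"field": "file_path",
                     "allowed_paths": ["/tmp/agent_workspace/*"]}]},
]


def evaluate(tool_name, params, rules=RULES, default_action="deny"):
    """Return 'allow', 'deny', or 'require_approval' for one tool call."""
    for rule in rules:
        if rule["tool_name"] != tool_name:
            continue
        for check in rule.get("parameters", []):
            value = str(params.get(check["field"], ""))
            # Assumption: a parameter that violates a constraint is denied outright.
            if any(fnmatchcase(value, p)
                   for p in check.get("not_allowed_patterns", [])):
                return "deny"
            allowed = check.get("allowed_paths")
            if allowed is not None:
                # Normalize first so "../" segments cannot escape the allowed prefix.
                path = posixpath.normpath(value)
                if not any(fnmatchcase(path, p) for p in allowed):
                    return "deny"
        return rule["action"]
    return default_action  # assumption: tools with no matching rule are denied


print(evaluate("write_file", {"file_path": "/tmp/agent_workspace/notes.txt"}))
```

Normalizing the path before matching matters because a glob like `/tmp/agent_workspace/*` would otherwise also match `/tmp/agent_workspace/../../etc/passwd`.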
Some early lessons and questions for the group:
* The biggest challenge was cleanly intercepting the tool calls without forking the whole SuperAGI project. I ended up wrapping the tool execution class.
* It adds a bit of latency, but for my learning/experimental scale, it's fine.
* I'm now thinking about where to store the audit log of these decisions. The default SQLite? Something else?
* Has anyone else tackled something like this? I'm especially curious about what kinds of rules would be useful for securing marketplace tools, which seem like a potential blind spot.
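For readers wondering what "wrapping the tool execution class" from the first bullet might look like, here is a minimal sketch. `ToolExecutor` is a hypothetical stand-in, not SuperAGI's real class or method signature, and `check` is an assumed callable that returns the policy decision.

```python
class PolicyDenied(Exception):
    """Raised when the policy check refuses a tool call."""


class ToolExecutor:
    """Hypothetical stand-in for the runtime's executor (not SuperAGI's API)."""

    def execute(self, tool_name, params):
        return f"ran {tool_name}"


class GuardedExecutor:
    """Wraps an executor and consults a policy callable before every call."""

    def __init__(self, inner, check):
        self._inner = inner
        self._check = check  # callable(tool_name, params) -> decision string

    def execute(self, tool_name, params):
        decision = self._check(tool_name, params)
        if decision != "allow":
            # Nothing runs unless the policy explicitly allows it.
            raise PolicyDenied(f"{tool_name}: {decision}")
        return self._inner.execute(tool_name, params)


guarded = GuardedExecutor(
    ToolExecutor(),
    lambda name, params: "allow" if name == "read_file" else "deny",
)
print(guarded.execute("read_file", {"file_path": "/tmp/agent_workspace/a.txt"}))
```

Composition like this keeps the change outside the wrapped class, which is one way to avoid a full fork, though the wrapper still has to be wired in wherever the runtime constructs its executor.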
This has made me feel a lot more comfortable letting agents run for longer periods. I'm still very much a beginner, so any thoughts or suggestions on improving this approach would be greatly appreciated.
Intercepting the call and checking against a static YAML list is a start, but you've just moved the trust boundary. Now you have to trust that your interception point is correct, that the Python service itself can't be bypassed, and that your rule parsing logic is sound.
You're missing system-level hardening for the underlying environment. What stops an approved `write_file` tool from writing a shell script and then a separate, approved `read_file` tool from executing it? Your policy only sees the individual, atomic tool calls.
You need to layer in proper isolation at the container or sandbox level. Run the entire agent runtime in a container with a strict seccomp profile, no capabilities, and an AppArmor profile that denies execve of new binaries. That way, even if your policy engine misses something, the damage is contained.
Also, you should be logging to an immutable audit trail outside the agent's control.
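One cheap building block for such a trail is a hash chain, sketched below. To be clear about the swap: this gives tamper evidence, not true immutability, and it only helps if the log (or at least the latest hash) is stored somewhere the agent cannot write. The function names and entry layout are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def append_entry(log, decision):
    """Append a decision dict, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = json.dumps(decision, sort_keys=True)
    digest = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    log.append({"prev": prev_hash, "decision": decision, "hash": digest})


def verify(log):
    """Return True if the chain is intact (tail truncation is not detected)."""
    prev_hash = GENESIS
    for entry in log:
        body = json.dumps(entry["decision"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"tool": "write_file", "action": "require_approval"})
append_entry(log, {"tool": "execute_shell_command", "action": "deny"})
print(verify(log))
```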
Least privilege, always.
That's so cool! I'm just getting into SuperAGI and this exact worry is why I haven't given it real tasks yet. I love the idea of intercepting the calls.
Question: how do you actually stop the agent if you deny the action? Like, does it just get an error message and try something else, or does the whole agent run get paused? I'm worried mine would just retry in a loop 😅
Hey user278, congrats on getting this working! That exact worry is what pushed me to start modding my own agents too. I love the YAML rule structure you shared, it's super clear.
One thing I ran into with a similar interceptor was around the "require_approval" action. How are you handling the actual approval step? In my setup, I hooked it into a Home Assistant dashboard, so I get a notification and can approve/deny from my phone. Without that, I found the agent would just time out waiting and the run would fail. Curious if you're doing something similar or if you have a manual review queue.
Also, for the `write_file` rule you cut off in the snippet - are you doing path-based restrictions? That's saved my bacon a couple times, limiting writes to only a specific scratch directory.
If it's not broken, break it for security.
The Home Assistant dashboard integration is a really clever way to handle the approval loop, I like that a lot. My initial approach was just a simple webhook that logs to a text file for me to check manually, but obviously that's not sustainable for anything time-sensitive.
You're right about path-based restrictions, that was one of the first things I added after my prototype. I have a default rule that blocks any write_file outside of a designated scratch directory, and then specific allow rules for the few places I actually want it to go, like a specific config folder for my home automation stuff. It's still making me think about the chain of events, though. Like, what if it writes a Python script to the approved scratch directory? The engine might approve that write, and then later it could try to execute it. I'm still figuring out how to model that sequence as a policy, or if I need a different layer like user385 mentioned.
Path restrictions are a good first containment layer, but they're just that - a first layer. If an approved `write_file` can drop a `.py` or `.sh` file into the scratch directory, you haven't solved the core issue; you've just defined the arena.
The real problem is execution control. You need to fingerprint the agent's *behavioral chain* across multiple tool calls, not just authorize each one in isolation. My rule sets often flag on patterns like `write_file` to a permitted path, followed by a `read_file` or `execute_command` referencing that same path, even if each individual call would pass. It's about connecting the dots the policy engine can't see at call-time.
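A minimal sketch of that kind of cross-call check, assuming the interceptor sees every call in order. The `ChainWatcher` class, the `command` parameter name, and the set of exec-style tool names are hypothetical, and the substring match is deliberately crude.

```python
EXEC_TOOLS = {"execute_command", "execute_shell_command"}  # assumed tool names


class ChainWatcher:
    """Remembers paths written during a run and flags later calls that use them."""

    def __init__(self):
        self._written = set()

    def observe(self, tool_name, params):
        """Return 'flag' if a call references a path written earlier, else 'ok'."""
        if tool_name == "write_file":
            self._written.add(params.get("file_path"))
            return "ok"
        if tool_name == "read_file" and params.get("file_path") in self._written:
            return "flag"
        if tool_name in EXEC_TOOLS:
            command = str(params.get("command", ""))
            # Crude check: does the command text mention any written path?
            if any(path and path in command for path in self._written):
                return "flag"
        return "ok"


watcher = ChainWatcher()
watcher.observe("write_file", {"file_path": "/tmp/agent_workspace/run.sh"})
print(watcher.observe("execute_command",
                      {"command": "bash /tmp/agent_workspace/run.sh"}))
```

A substring match is easy to evade (relative paths, symlinks, shell expansion), so a check like this is better treated as a detection signal than as enforcement.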
fingerprint all things
Intercepting at the tool call level is an interesting hack, but you're just building a nicer-looking cage door while the walls are made of paper. You've said the default install feels "too trusting," but you're replacing one trust model with another that's arguably more dangerous because it gives you a false sense of security.
The real issue isn't the agent's intent, it's the process boundary. You've modified the SuperAGI codebase to call your Python service. What stops a future update, a plugin, or even a nested agent invocation from side-stepping that modified layer entirely? You're now responsible for the correctness of your patch across versions.
If you're going to the trouble of modifying the runtime, you should be injecting a syscall filter, not a policy service. Make the underlying OS enforce the rules you're trying to write in YAML. A seccomp-BPF filter that denies `execve` and `socket` syscalls does more to contain a rogue agent than any number of approved or denied tool calls ever could. (seccomp can't inspect path strings, so exempting a specific, known binary path needs a path-aware layer such as AppArmor on top.) Your policy engine can then be a logging layer on top of actual enforcement, not the enforcement itself.
Default deny or go home.
Hey user278, welcome. That's a fantastic learning project, and jumping into the codebase to add this is exactly how you get a real feel for these systems.
The `require_approval` action you've sketched is a key piece. How are you planning to surface that approval request to a human, and what's the agent's fallback behavior if approval is denied? In my own tinkering, that's where the real workflow friction shows up. Without a clear path, a denied action can leave the agent stuck in a loop or kill the task entirely.
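One possible way to keep a denied action from turning into a retry loop is to count repeated denials of the same call and pause the run past a threshold. This is only a sketch: `DenialTracker`, the threshold, and the returned labels are hypothetical, and how a pause is actually triggered depends on the runtime.

```python
import json
from collections import Counter


class DenialTracker:
    """Counts denials per identical call and signals when to pause the run."""

    def __init__(self, max_denials=3):
        self.max_denials = max_denials  # hypothetical threshold
        self._counts = Counter()

    def record_denial(self, tool_name, params):
        """Record one denial and return 'retry_allowed' or 'pause_run'."""
        # Serialize params so identical calls map to the same key.
        key = (tool_name, json.dumps(params, sort_keys=True, default=str))
        self._counts[key] += 1
        if self._counts[key] >= self.max_denials:
            return "pause_run"
        return "retry_allowed"


tracker = DenialTracker(max_denials=2)
print(tracker.record_denial("send_email", {"recipient": "a@example.com"}))
print(tracker.record_denial("send_email", {"recipient": "a@example.com"}))
```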
I'd also gently push on the "simple policy engine" point. Once you start adding regex patterns on parameters and chaining rules, the complexity can grow quickly. It's worth mapping out what happens when two rules conflict, or if a rule fails to parse. Keep an eye on that as your YAML grows.
Stay sharp.
Good on you for hacking on the codebase directly. That's how you actually learn these systems, not just theorize about them.
>The policy engine is a separate Python service that the main SuperAGI instance calls via a modified tool execution layer.
This is the part I'd scrutinize. You've introduced a network hop. What's your failure mode when that service is down or slow? Does the agent hang, or is there a default deny or default allow? You've essentially added a new single point of failure on the critical path.
Also, make sure you're passing the full context (tool params, maybe previous steps) to your service, not just the tool name. A rule like "allow `write_file` if the file extension is `.log`" is useless if you only send the tool name and not the `file_path` argument.
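A fail-closed wrapper is one way to answer the "what if the service is down" question. In this sketch `check` is a hypothetical client callable that receives the full parameters, enforces its own timeout, and raises on failure; anything other than a recognized decision is treated as a denial.

```python
VALID_DECISIONS = {"allow", "deny", "require_approval"}


def decide_fail_closed(check, tool_name, params):
    """Ask the policy service; treat any failure as a denial.

    Returns a (decision, reason) tuple so the caller can log why.
    """
    try:
        decision = check(tool_name, params)
    except Exception as exc:  # service down, timed out, or misbehaving
        return "deny", f"policy service error: {exc!r}"
    if decision not in VALID_DECISIONS:
        return "deny", f"unexpected decision: {decision!r}"
    return decision, "ok"


def unreachable_service(tool_name, params):
    """Simulates a policy service that refuses connections."""
    raise ConnectionError("connection refused")


print(decide_fail_closed(unreachable_service, "write_file",
                         {"file_path": "/tmp/agent_workspace/app.log"}))
```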
ship it or break it.
You're right about behavioral chaining, but tracking state across tool calls introduces a new problem: rule explosion. If you have N tools, you can't realistically define patterns for every possible N-length sequence.
The simpler fix is to remove the underlying capability entirely. If you're worried about `write_file` followed by `execute_command` on the same path, block `execve` at the syscall level. A seccomp filter can't tell a new file from an old one, but it can deny `execve` outright, so nothing the agent writes can be launched as a new process, regardless of how it got there. That cuts the chain at its root without needing complex stateful rules.
Love the YAML structure, that's a really clean way to start. It makes the rules human-readable which is half the battle.
One thing I'd add to your example is a default fallback rule. In my engine, the last rule is always a catch-all that logs and denies any tool that wasn't explicitly matched. Saved me from missing things when I added new tools and forgot to update the policy.
Also, echoing what user55 said about the network hop: have you thought about making the service a local Unix socket instead of TCP? It cuts down the latency and removes a whole class of network config issues. I can share the systemd socket unit file I use if you want.
Automate the boring parts.
You say your setup is "basic but effective." I'll believe the basic part. You've bolted a policy check onto the side. You haven't published any benchmarks for the latency overhead this introduces on every single tool call. If your agent makes 50 calls for a task, you're adding 50 network round trips. That's a real performance cost you're glossing over.
> "feels very powerful, but maybe a bit too trusting"
And your approach is better? You trust your own patch, your Python service, your YAML parser, and the network link between them. Where's the audit of that chain? You just moved the trust, you didn't reduce it.
Show the latency numbers per call with your service running, or it's just security theater that also makes your agent slower.
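Getting those numbers does not take much. Here is a small timing harness, sketched under the assumption that the policy check is reachable as a plain callable; `measure_overhead` and the sample calls are hypothetical, and real results depend entirely on the actual service and transport.

```python
import statistics
import time


def measure_overhead(check, calls, repeats=5):
    """Time check(tool_name, params) for each call; summarize in milliseconds."""
    samples = []
    for _ in range(repeats):
        for tool_name, params in calls:
            start = time.perf_counter()
            check(tool_name, params)
            samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "count": len(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


sample_calls = [
    ("read_file", {"file_path": "/tmp/agent_workspace/a.txt"}),
    ("write_file", {"file_path": "/tmp/agent_workspace/b.txt"}),
]
# Stand-in check; swap in the real client call to measure the network hop.
print(measure_overhead(lambda name, params: "allow", sample_calls))
```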
Numbers or it didn't happen.
Your rule snippet cuts off, but based on the pattern I'd ask: what's your fallback for an unmatched tool? Default deny or allow? If it's not explicit, you missed the most important rule.
Don't use YAML. It's a time bomb for edge cases and unexpected types. Use something that enforces a schema, like JSON with a validation layer, or write the rules directly as Python dictionaries. Your engine will fail on a malformed regex or a missing field long before any agent does.
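Whatever format holds the rules, a validation pass at load time catches these failures early. The sketch below checks a parsed rules document against the three actions mentioned in the thread; `validate_rules` and its error messages are hypothetical.

```python
VALID_ACTIONS = {"allow", "deny", "require_approval"}


def validate_rules(doc):
    """Return a list of problems in a parsed rules document (empty if none)."""
    rules = doc.get("rules") if isinstance(doc, dict) else None
    if not isinstance(rules, list):
        return ["top-level 'rules' must be a list"]
    problems = []
    for i, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rule {i}: must be a mapping")
            continue
        if not isinstance(rule.get("tool_name"), str):
            problems.append(f"rule {i}: 'tool_name' must be a string")
        action = rule.get("action")
        if not isinstance(action, str) or action not in VALID_ACTIONS:
            problems.append(f"rule {i}: unknown action {action!r}")
        params = rule.get("parameters", [])
        if not isinstance(params, list):
            problems.append(f"rule {i}: 'parameters' must be a list")
            continue
        for j, check in enumerate(params):
            if not isinstance(check, dict) or not isinstance(check.get("field"), str):
                problems.append(f"rule {i} parameter {j}: 'field' must be a string")
    return problems


# A value that arrives as a boolean instead of a string is reported, not ignored.
print(validate_rules({"rules": [{"tool_name": "send_email", "action": False}]}))
```

Refusing to start the engine when this list is non-empty keeps a malformed rule from silently changing what gets enforced.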
Drop the --privileged flag.
The YAML point is a red herring. The real risk isn't syntax, it's your implicit trust boundary shifting from the runtime's tool whitelist to your homemade parser's interpretation of that YAML. You've now got to secure your rule loader, your pattern matcher, and the data flow between them.
If you're already in the codebase modifying the tool layer, you should be rejecting calls based on the internal tool schema before they ever hit your "policy engine" service. That cuts the attack surface of your own policy engine in half. Validate at the source, enforce at the service.
Show me the threat model.