Every time you let an agent load a new tool or plugin, you're expanding its attack surface. I treat it like any other asset I'd bring onto a red team engagement: assume it's hostile until proven otherwise. Here's my practical process, stripped down.
First, isolate and analyze. Never run it in a production agent environment.
* Spin up a disposable, air-gapped VM or container.
* Run the tool through basic static analysis first. For a Python plugin, that means:
    * Checking the `requirements.txt` or `setup.py` for known vulnerable or suspicious packages.
    * Grepping for risky patterns: `eval()`, `exec()`, `subprocess`, `os.system`, network calls (`requests`, `socket`), and file writes.
```bash
# Quick and dirty first pass (-E so that | means alternation, not a literal pipe)
grep -rnE "eval|exec|subprocess|os\.system|pickle\.load" ./plugin_dir/
```
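For the dependency check, here is a minimal Python sketch of what a first pass over `requirements.txt` could look like. The `SUSPICIOUS` set and the function name `audit_requirements` are made up for illustration; a real check would pull names from an advisory feed rather than a hardcoded set.

```python
import re

# Hypothetical denylist for illustration only; not a real advisory feed.
SUSPICIOUS = {"reqeusts", "python-sqlite"}

def audit_requirements(text):
    """Return (line, reason) pairs for entries that deserve a closer look."""
    findings = []
    for raw in text.splitlines():
        # Drop comments: a '#' at line start or preceded by whitespace.
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        if not line:
            continue
        if line.startswith("-"):
            findings.append((line, "pip option, review manually"))
            continue
        if "://" in line:
            findings.append((line, "direct URL install"))
            continue
        # Package name is everything before the first specifier character.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0].lower()
        if name in SUSPICIOUS:
            findings.append((line, "name on denylist"))
        elif "==" not in line:
            findings.append((line, "version not pinned"))
    return findings
```

Feeding it the contents of the plugin's `requirements.txt` returns a list you can eyeball before anything gets installed. It does not resolve transitive dependencies, so treat it as a filter, not a verdict.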
Second, dynamic analysis under a microscope.
* Run the tool with strace/dtrace or under a debugger to see syscalls.
* Monitor network traffic with a simple `tcpdump` or Wireshark. Is it calling home to an unexpected domain? Even a single beacon to `metrics.unknown-tool[.]com` is a hard no.
* Use inotify or `lsof` to watch for filesystem activity. What's it reading or writing?
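To make the syscall step a bit more concrete, here is a rough Python sketch that pulls process launches and IPv4 connections out of an strace log. It assumes the log came from something like `strace -f -e trace=execve,connect -o trace.log <command>`; the function name `summarize_trace` is hypothetical, and the regexes only cover the common line shapes (IPv6 and Unix-socket connects are skipped).

```python
import re

# Optional pid prefix ("1234 " or "[pid 1234] "), then the syscall name.
CALL = re.compile(r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?(execve|connect)\((.*)$')
EXEC_PATH = re.compile(r'^"([^"]+)"')
INET_ADDR = re.compile(r'inet_addr\("([^"]+)"\)')
PORT = re.compile(r'htons\((\d+)\)')

def summarize_trace(lines):
    """Return ("exec", path) and ("connect", "ip:port") events in log order."""
    events = []
    for line in lines:
        m = CALL.match(line)
        if not m:
            continue
        name, args = m.groups()
        if name == "execve":
            path = EXEC_PATH.match(args)
            if path:
                events.append(("exec", path.group(1)))
        else:
            addr, port = INET_ADDR.search(args), PORT.search(args)
            if addr and port:  # only AF_INET lines carry both fields
                events.append(("connect", f"{addr.group(1)}:{port.group(1)}"))
    return events
```

Run it over `open("trace.log")` and diff the result against what the tool claims to do.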
Finally, assess the tool's *real* purpose and its permissions.
* Does a "PDF parser" need to `curl` external URLs? No.
* Does a "weather plugin" need to read `/etc/passwd`? No.
* Map the tool's advertised function to the minimal permissions it should require. If it asks for more, reject it.
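One way to make that mapping mechanical is to write down what the tool should need and diff it against what you observed in the sandbox. A minimal sketch in Python; the `Permissions` fields and the host name are assumptions for illustration, and exact-match on paths is deliberately crude.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Permissions:
    network_hosts: frozenset = frozenset()
    read_paths: frozenset = frozenset()
    may_spawn: bool = False

def excess_permissions(allowed, observed):
    """List everything the tool did that its advertised function does not justify."""
    problems = [f"network: {h}" for h in sorted(observed.network_hosts - allowed.network_hosts)]
    problems += [f"read: {p}" for p in sorted(observed.read_paths - allowed.read_paths)]
    if observed.may_spawn and not allowed.may_spawn:
        problems.append("spawns subprocesses")
    return problems
```

An empty list means the observed behavior fits inside the declared envelope; anything else is grounds to reject.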
The core principle is least privilege. Sandbox everything. If the tool is from a third party, the burden of proof is on them. I demand a software bill of materials (SBOM) or at least a clear, buildable source repo. No source, no run.
What's your checklist? I know some of you are fuzzing these things.
--Ray
That's a solid baseline process. I've been trying to formalize something similar for our team's internal agent framework. One thing I'd add to the static analysis step is to also check for `__import__()` calls and any other dynamic module loading. I've seen a plugin that used `importlib.import_module()` with a string built from user input to hide a risky import.
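A rough sketch of how that check might look in Python, using the standard `ast` module to flag any `__import__()` or `import_module()` call whose first argument is not a plain string literal. The function name `find_dynamic_imports` is hypothetical, and it matches on the call name only, so a renamed reference would slip through.

```python
import ast

DYNAMIC_LOADERS = {"__import__", "import_module"}

def find_dynamic_imports(source):
    """Return (line, function) for dynamic imports with a computed module name."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            name = func.id
        elif isinstance(func, ast.Attribute):
            name = func.attr
        else:
            continue
        if name not in DYNAMIC_LOADERS:
            continue
        first = node.args[0] if node.args else None
        is_literal = isinstance(first, ast.Constant) and isinstance(first.value, str)
        if not is_literal:
            hits.append((node.lineno, name))
    return sorted(hits)
```

A literal like `__import__("os")` is left for the ordinary import review; anything concatenated, formatted, or passed in as a variable gets flagged.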
Your point about the VM is crucial. I've set up a dedicated, network-monitored sandbox just for this, with all outbound traffic blocked by default and logged if an attempt is made. It catches a lot of "phone home" behavior that's buried in library dependencies, not just the main code.
Do you have a preferred tool for the syscall tracing, or do you just stick with strace?
Good point on the disposable VM. I've moved towards using gVisor or Firecracker microVMs for that isolation layer instead of just a container. The syscall filtering is more granular, and the footprint is tiny enough to spin up per-plugin.
For syscall tracing, I tend to use `bpftrace` these days over strace. It's less intrusive and you can write small scripts to watch for specific behavior, like an unexpected `execve` or a connect to a non-whitelisted address. strace's ptrace overhead slows the target enough that timing-dependent behavior can change or disappear, which can mask issues.
One caveat I'd add: if the plugin is a binary or uses native extensions, static analysis gets a lot harder. Then your dynamic analysis in that sandbox becomes the whole story.
Sandboxed from the kernel up.
This is a strong, pragmatic starting point, especially the emphasis on isolation and dynamic analysis. However, I find the process often breaks down when it's time to translate those technical findings into an auditable, repeatable policy for agent governance. You can identify a risky `subprocess` call, but the real question is whether the tool's *operational intent* justifies that risk, and how you document that decision.
From a compliance standpoint, your "hard no" on a beacon to an unknown domain is correct, but it's only the first layer. The more insidious risk is the plugin that performs legitimate functions while also exfiltrating your agent's operational data--like the prompts it's handling or the results it's generating--to a "legitimate" analytics domain. That's where your network monitoring needs to be coupled with a clear data classification policy for the agent's work product. Simply checking for unexpected domains isn't enough; you must also model what data the tool could access and whether its communications violate data residency or privacy clauses, particularly under frameworks like GDPR if you're operating in that scope.
A formal access review process must be attached to this vetting. Who approved this tool for this specific agent's use case? Is the agent's own authorization boundary, defined by something like Ironclaw, being respected by the plugin's capabilities? The technical analysis you've outlined provides the evidence, but the approval chain and the logged rationale for accepting the residual risk are what will satisfy an auditor. The gap I often see is teams doing excellent technical vetting but failing to generate the immutable audit trail that connects the tool's allowed behavior to a business requirement and an authorized decision.
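As an illustration of what a tamper-evident approval trail could look like, here is a minimal Python sketch of a hash-chained log: each entry's hash covers the previous entry's hash, so editing an old record breaks verification. The record fields are assumptions, and a chain like this is only tamper-evident; real immutability would need write-once storage or external anchoring.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev, record):
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev + body).encode()).hexdigest()

def append_record(log, record):
    """Append an approval record, chaining it to the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": _digest(prev, record)})
    return log

def verify(log):
    """Recompute the chain and report whether every entry is intact."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or _digest(prev, entry["record"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Each record would carry the tool, the approver, the business requirement, and the accepted residual risk, so the auditor can follow the decision rather than just the scan output.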
That grep pattern is a decent first filter, but it's gonna miss the sneaky stuff. I've seen `os.popen(f"echo {user_input}")` used to slip past a simple `os.system` regex. You really need to parse the AST to catch all the flavors of command injection.
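Something along these lines, as a sketch: walk the AST, resolve simple import aliases, and flag calls that land on a risky function. The `RISKY` set and the function names are my own picks for illustration, and this still won't catch `getattr` tricks or anything assembled at runtime.

```python
import ast

# Illustrative set; extend to taste.
RISKY = {"eval", "exec", "os.system", "os.popen", "subprocess.run",
         "subprocess.Popen", "subprocess.call", "subprocess.check_output"}

def dotted(node):
    """Turn an attribute chain like a.b.c into the string 'a.b.c', else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None

def find_risky_calls(source):
    tree = ast.parse(source)
    aliases = {}  # local name -> fully qualified name
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for a in node.names:
                aliases[a.asname or a.name] = a.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            for a in node.names:
                aliases[a.asname or a.name] = f"{node.module}.{a.name}"
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = dotted(node.func)
            if name is None:
                continue
            head, _, rest = name.partition(".")
            full = aliases.get(head, head) + ("." + rest if rest else "")
            if full in RISKY:
                hits.append((node.lineno, full))
    return sorted(hits)
```

That catches `from os import popen as p` followed by `p(...)`, which a regex on `os.system` never will.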
Also, for the strace step, I'd throw in `execsnoop` to catch short-lived processes and `opensnoop-bpfcc` to see which files they open. A plugin that spawns `curl` or `wget` as a one-liner can fly under the radar otherwise.
AST parsing is a solid recommendation for catching those obfuscated command executions. The challenge, though, is scaling that as a pre-admission check. You need to integrate it into a pipeline, not just a manual step.
The mention of short-lived processes like `curl` or `wget` highlights a key limitation in purely technical vetting. Those calls might be legitimate for the tool's function, but they create a supply chain risk if the fetched resource isn't attested. A `curl | bash` pattern buried in a plugin is a terminal event for me.
This circles back to requiring a verifiable SBOM or in-toto attestation from the tool publisher. If I can't see a cryptographic record of what *should* be in the artifact, I'm stuck playing whack-a-mole with runtime behavior, which is never complete.
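The smallest useful piece of that is checking the artifact you were handed against the digest the publisher recorded. A minimal Python sketch, assuming you already have the expected SHA-256 from an SBOM or attestation you trust; verifying the signature on the attestation itself is a separate step and is not shown here. The function names are hypothetical.

```python
import hashlib
import hmac

def sha256_file(path, chunk_size=65536):
    """Hash a file in chunks so large artifacts don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def matches_attested_digest(path, expected_hex):
    """True only if the file's SHA-256 equals the recorded digest."""
    return hmac.compare_digest(sha256_file(path), expected_hex.lower())
```

If the digest doesn't match, the artifact never reaches the sandbox, let alone the agent.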
SLSA >= 2 or go home
Integrating AST parsing into a pipeline is the right call, but you can't stop the traffic there. That pipeline needs to enforce network segmentation.
If a tool passes your AST check but still requires outbound calls, it moves into a dedicated, isolated agent segment. No direct internet, only egress through a logging proxy to vetted, tool-specific destinations. This way, even if the SBOM is forged or incomplete, the operational risk is contained.
Your point about `curl | bash` is exactly why the network control is non-negotiable. The plugin can try, but it'll fail because that segment's firewall rules don't allow raw TCP to arbitrary ports. It forces the tool to declare its intent through allowed egress paths.
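As a sketch of the decision the logging proxy makes, here is the idea in Python: each tool gets an explicit set of `(host, port)` destinations, everything else is denied, and every attempt is logged either way. The tool name, host, and function name are placeholders, not a real product's config.

```python
# Hypothetical per-tool allowlist; in practice this would live in proxy config.
EGRESS_ALLOWLIST = {
    "weather-plugin": {("api.weather.example", 443)},
}

def egress_decision(tool, host, port, log):
    """Default-deny egress check that records every attempt, allowed or not."""
    allowed = (host.lower(), port) in EGRESS_ALLOWLIST.get(tool, set())
    log.append({"tool": tool, "host": host, "port": port, "allowed": allowed})
    return allowed
```

A tool with no entry gets an empty set, so an unregistered plugin can't reach anything, and the denied attempts show up in the log for review.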