Been seeing a lot of chatter about AI agents and their potential for "autonomous" action. One of my immediate concerns: what if the thing you built to book a flight decides it needs to phone home, or worse, exfiltrate data on a new, unexpected channel? The runtime's network permissions are often an afterthought.
So, I built a simple monitor that sits on the host and logs/raises a flag for any outbound network connection the agent's process makes that wasn't pre-authorized. It's not a silver bullet, but it's a crucial canary.
The core idea:
1. At agent startup, you feed the monitor a list of expected destination IPs/domains and ports (e.g., your internal API endpoints, a specific external weather service).
2. The monitor attaches to the agent's PID and watches its outbound connections (using a lightweight eBPF probe or, for a simpler PoC, polling `lsof`/`netstat`).
3. Any TCP/UDP connection to a destination not on the allow-list triggers an immediate alert and a full connection log.
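A rough sketch of the polling variant of steps 2 and 3, for anyone who wants to try the PoC route first. It assumes `lsof` is installed and that its field output (`-F`) lists connected sockets as `nLOCAL->REMOTE` with the protocol on a `P` line. The names `Connection`, `parse_lsof_fields`, `poll_connections`, and `unexpected` are made up for illustration, not taken from the actual prototype.

```python
import subprocess
from typing import NamedTuple


class Connection(NamedTuple):
    protocol: str
    host: str
    port: int


def parse_lsof_fields(text):
    """Extract remote endpoints from `lsof -FPn` style field output."""
    conns = []
    current = {}

    def flush():
        # Only connected sockets have a "local->remote" name.
        name = current.get("n", "")
        if "->" in name:
            remote = name.split("->", 1)[1]
            host, _, port = remote.rpartition(":")
            if port.isdigit():
                conns.append(
                    Connection(current.get("P", "?"), host.strip("[]"), int(port))
                )
        current.clear()

    for line in text.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag in ("p", "f"):  # a new process or file descriptor starts
            flush()
        else:
            current[tag] = value
    flush()
    return conns


def poll_connections(pid):
    """Snapshot the outbound sockets of one PID (assumes lsof is on PATH)."""
    result = subprocess.run(
        ["lsof", "-a", "-p", str(pid), "-i", "-n", "-P", "-FPn"],
        capture_output=True, text=True, check=False,
    )
    return parse_lsof_fields(result.stdout)


def unexpected(conns, allowed):
    """Return connections whose (host, port) is not in the allow-list set."""
    return [c for c in conns if (c.host, c.port) not in allowed]
```

You would call `poll_connections(pid)` in a loop and pass the result through `unexpected(...)`. Keep in mind that polling only sees sockets that are open at the moment of the snapshot, so short-lived connections can slip through between polls. That is the main argument for the eBPF probe.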
Here's the basic policy-as-code structure (YAML) for defining expected behavior:
```yaml
agent_name: "travel_agent_v1"
expected_outbound:
  - destination: "api.company-internal.com"
    port: 443
    protocol: "TCP"
    purpose: "Internal flights API"
  - destination: "weather.service.com"
    port: 443
    protocol: "TCP"
    purpose: "Fetch destination weather"
allowed_dynamic_resolution:
  - "*.company-internal.com"  # Allows for some DNS-based flexibility, but logged.
```
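To make the matching semantics concrete, here is a minimal sketch of how a connection could be evaluated against that policy. The policy is written as a plain Python dict mirroring the YAML (loading the file itself would need a YAML parser, which is left out here). The `evaluate` function and its verdict strings are hypothetical, and it assumes that a wildcard in `allowed_dynamic_resolution` applies to any port and that such matches are allowed but logged.

```python
from fnmatch import fnmatchcase

# Same content as the YAML policy above, as a Python dict.
POLICY = {
    "agent_name": "travel_agent_v1",
    "expected_outbound": [
        {"destination": "api.company-internal.com", "port": 443,
         "protocol": "TCP", "purpose": "Internal flights API"},
        {"destination": "weather.service.com", "port": 443,
         "protocol": "TCP", "purpose": "Fetch destination weather"},
    ],
    "allowed_dynamic_resolution": ["*.company-internal.com"],
}


def evaluate(policy, hostname, port, protocol):
    """Return (verdict, matched_rule) for one observed connection."""
    hostname = hostname.lower().rstrip(".")
    # 1. Exact rules: destination, port, and protocol must all match.
    for rule in policy["expected_outbound"]:
        if (rule["destination"].lower() == hostname
                and rule["port"] == port
                and rule["protocol"].upper() == protocol.upper()):
            return "ALLOWED", rule["purpose"]
    # 2. Wildcard domains: assumed to be allowed on any port, but logged.
    for pattern in policy.get("allowed_dynamic_resolution", []):
        if fnmatchcase(hostname, pattern.lower()):
            return "ALLOWED_DYNAMIC", pattern
    # 3. Nothing matched: this is what raises the alert.
    return "VIOLATION", None


print(evaluate(POLICY, "api.company-internal.com", 443, "TCP"))
print(evaluate(POLICY, "sketchy-mirror.example.com", 443, "TCP"))
```

Because the pattern keeps its leading dot, a lookalike such as `evil-company-internal.com` does not match `*.company-internal.com`. Whether wildcard matches should also be restricted by port is a policy decision this sketch does not settle.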
The monitor's output on a violation looks like this (CLI alert):
```
[!] UNEXPECTED OUTBOUND CONNECTION
Timestamp: 2023-10-26T14:32:07Z
PID: 7843
Command: /usr/bin/python /opt/agent/main.py
Destination: 104.28.14.6:443 (resolved: sketchy-mirror.example.com)
Action: LOGGED (Block policy not enabled)
Rule Matched: NONE - Connection not in allow-list.
```
**What I learned the hard way:**
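* You need to account for DNS resolution. The monitor sees connections as IP addresses, so it must resolve the allow-listed domains to IPs (and keep those results fresh) and check each connection against both the IP and domain lists.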
* Some libraries/agents spawn subprocesses. You must track the entire process tree, not just the initial PID.
* This is a detection tool first. Automatic blocking is possible, but you risk breaking legitimate, unexpected (but necessary) fallback logic.
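The first two points can be sketched roughly as follows. This assumes Linux (it reads `/proc/<pid>/stat` to find parent PIDs) and uses forward DNS lookups via `socket.getaddrinfo`. The function names `read_ppids`, `process_tree`, and `resolve_allowed_ips` are illustrative, not the prototype's actual code.

```python
import os
import socket


def read_ppids(proc_root="/proc"):
    """Map every visible PID to its parent PID (Linux only)."""
    ppid_of = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as fh:
                data = fh.read()
        except OSError:
            continue  # process exited or is not readable
        # Format: "pid (comm) state ppid ..."; comm may contain spaces or
        # parentheses, so split on the last ")" before reading fields.
        ppid_of[int(entry)] = int(data.rsplit(")", 1)[1].split()[1])
    return ppid_of


def process_tree(root_pid, ppid_of):
    """Return root_pid plus all of its descendants."""
    children = {}
    for pid, ppid in ppid_of.items():
        children.setdefault(ppid, []).append(pid)
    tree, stack = set(), [root_pid]
    while stack:
        pid = stack.pop()
        if pid not in tree:
            tree.add(pid)
            stack.extend(children.get(pid, []))
    return tree


def resolve_allowed_ips(domains, resolver=socket.getaddrinfo):
    """Forward-resolve allow-listed domains into an {ip: domain} map."""
    allowed = {}
    for domain in domains:
        try:
            infos = resolver(domain, None)
        except socket.gaierror:
            continue  # unresolvable right now; retry on the next refresh
        for info in infos:
            allowed[info[4][0]] = domain  # sockaddr[0] is the IP string
    return allowed
```

On each polling cycle you would rebuild `process_tree(agent_pid, read_ppids())`, watch every PID in it, and refresh the IP map periodically, since resolved addresses go stale. Forward resolution of the allow-list is used here because reverse lookups on an observed IP are controlled by whoever owns that IP and are not a reliable signal.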
Writing the allow-list forces you to think through the agent's threat model concretely. If you haven't defined its expected network behavior, you have a gap, and a monitor like this makes that gap visible. The code's still rough, but the prototype works. If anyone's interested in the eBPF approach or has done similar work, post below.
--Priya