Hey folks,
Been tinkering with the local AI agents I've got running in the lab, mostly on my home automation server. It's great until you realize these things can sometimes make outbound calls via their tools that you didn't explicitly trigger. I wanted a simple, lightweight way to get alerted if something tries to phone home unexpectedly, outside of approved patterns.
I built a small Python script that runs on my pfSense box (via a cron job) and parses the firewall logs. It looks for outbound connections originating from my AI agent host's IP, but excludes whitelisted destinations I've predefined, like the specific API endpoints for Claude, OpenAI, or my local Vault instance. Anything else triggers a notification.
Here's the core logic:
```python
#!/usr/bin/env python3
import re
import subprocess
from datetime import datetime

AGENT_IP = "192.168.1.50"
WHITELIST_DOMAINS = ["api.openai.com", "api.anthropic.com", "localhost:8200"]


def tail_log():
    """Return the last 100 lines of the firewall log as one string."""
    cmd = ["tail", "-n", "100", "/var/log/filter.log"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout


def parse_log(log_lines):
    """Collect log lines for agent traffic that match no whitelist entry."""
    alerts = []
    for line in log_lines.split('\n'):
        if f"SRC={AGENT_IP}" in line and "DPT" in line:
            dest_ip_match = re.search(r'DST=([\d.]+)', line)
            if dest_ip_match:
                dest_ip = dest_ip_match.group(1)
                # Reverse lookup or check against whitelist (simplified here)
                if not any(domain in line for domain in WHITELIST_DOMAINS):
                    alerts.append(f"Unexpected outbound: {line}")
    return alerts


if __name__ == "__main__":
    logs = tail_log()
    found = parse_log(logs)
    if found:
        # Send to my monitoring dashboard (Wazuh) or a simple webhook
        with open("/var/log/agent_monitor.log", "a") as f:
            f.write(f"{datetime.now()} - Alerts:\n" + "\n".join(found) + "\n")
```
It's basic but effective. I have it running every minute, and any alerts get dumped to a log that Wazuh picks up. I also get a Telegram message if more than two unusual connections happen in a five-minute window.
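For anyone curious about the "more than two in five minutes" rule, here is a minimal sketch of a sliding-window counter. The `AlertWindow` class and its names are made up for illustration, and it assumes the timestamps are kept in memory; since the script runs from cron as a fresh process each minute, a real version would have to persist them somewhere between runs.

```python
from collections import deque
from datetime import datetime, timedelta


class AlertWindow:
    """Hypothetical helper: counts alerts inside a sliding time window."""

    def __init__(self, window=timedelta(minutes=5), threshold=2):
        self.window = window
        self.threshold = threshold
        self.events = deque()

    def record(self, when):
        """Record one alert; return True once more than `threshold` are in the window."""
        self.events.append(when)
        cutoff = when - self.window
        # Drop alerts that have aged out of the window.
        while self.events and self.events[0] < cutoff:
            self.events.popleft()
        return len(self.events) > self.threshold
```

Calling `record()` once per unexpected connection and sending the Telegram message only when it returns `True` gives the behaviour described above.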
Next steps? I'm thinking of integrating this directly with my Vault to check if the outbound call uses a valid token. Has anyone else set up something similar for monitoring agent behavior? Curious about other approaches.
Kenji
Nice approach! I've been down a similar road with my Dockerized agents. One thing I'd watch out for is making sure your script is catching all the potential sources of outbound calls. Agents can sometimes spawn subprocesses or use temporary containers that might have different source IPs.
Might be worth pairing this with something like `tcpdump` on the host itself for a second layer, just in case. Also, consider logging the agent's own internal tool usage alongside the network calls - sometimes the weird behavior starts there before it goes external.
Great to see others thinking about this. The whitelist method is solid, but can be a bit of a cat-and-mouse game when you add new tools. 😅
That's a good idea for monitoring. I've got a beginner question though. How do you actually know what to whitelist? Like, my agent uses langchain tools and sometimes it pulls new packages. If it hits pypi.org or github to download something, that's probably okay, right? But what if it tries to download from some random repo? Would that just look like a normal https connection to you?
> How do you actually know what to whitelist?
That's the central problem. You don't, initially. I start with a strict deny-all policy during a controlled observation period. Run your agent through its full, intended workload in a sandbox, and log every single outbound connection. That log becomes your initial, evidence-based whitelist.
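A rough sketch of turning that observation log into a starting whitelist: count every destination the agent host reached during the run. It assumes the same `SRC=`/`DST=`/`DPT=` line layout the original script matches on, which may not be what your firewall actually writes, and `observed_destinations` is a hypothetical name.

```python
import re
from collections import Counter

# Assumed line layout, borrowed from the script earlier in the thread.
DEST_RE = re.compile(r"DST=([\d.]+).*?DPT=(\d+)")


def observed_destinations(log_text, agent_ip):
    """Count (dest_ip, dest_port) pairs seen from agent_ip during the observation run."""
    counts = Counter()
    for line in log_text.splitlines():
        # Trailing space avoids matching a longer address with the same prefix.
        if f"SRC={agent_ip} " not in line:
            continue
        match = DEST_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Review the counts by hand before promoting anything to the whitelist; a destination that shows up once is worth a closer look than one that shows up on every request.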
For langchain pulling packages, you're right - it's just TLS to pypi.org. The risk isn't the domain itself, but the *artifact* it fetches. Your script won't see the malicious package, just the connection. This is why network monitoring is only one layer. You need to pair it with integrity checks on the downloaded files, or better, pre-cache all dependencies in an internal artifact repository and block all external pypi/github traffic after setup.
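As a minimal sketch of the integrity-check idea, this compares a downloaded file against a pinned SHA-256 digest. The `pinned_hashes` mapping and both function names are assumptions for illustration; where the trusted digests come from is up to you.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large artifacts don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, pinned_hashes):
    """Return True only if the file name is pinned and its digest matches."""
    expected = pinned_hashes.get(os.path.basename(path))
    return expected is not None and sha256_of(path) == expected
```

For Python dependencies specifically, pip's `--require-hashes` mode does this kind of check at install time, so a hand-rolled verifier is mainly useful for artifacts that don't go through pip.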
Every API endpoint is a threat surface.
Your whitelist is way too narrow.
> like the specific API endpoints for Claude, OpenAI, or my local Vault instance
This misses all the supporting infrastructure. CDNs, authentication redirects, package repos, DNS lookups. Your script will scream nonstop.
What about logging? If your agent uses a third-party tool, it might need to connect to a logging service. Or a metrics collector. Or a static asset host.
You're also trusting that your agent's tools only ever call the base API domain, which is naive. A tool using an external library could easily connect anywhere.
That's a clever way to start monitoring, using the existing pfSense logs. I've been thinking about similar issues but from the concurrency angle. Your script is polling the log file; have you considered the race condition between when an outbound call happens and when your cron job runs? An agent could initiate and complete a call in the interval between script executions, so it might not appear in the tail of the log.
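One way to close that gap, sketched under the assumption that the log is a plain append-only text file: remember the byte offset where the last run stopped and read only what was added since, instead of taking the last 100 lines. `read_new_lines` and the state-file idea are hypothetical, not part of the original script.

```python
import os


def read_new_lines(log_path, state_path):
    """Return complete lines appended to log_path since the previous call."""
    try:
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        offset = 0

    # If the file shrank, assume it was rotated or truncated and start over.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only up to the last newline; a half-written entry is retried next run.
    end = data.rfind(b"\n") + 1
    with open(state_path, "w") as f:
        f.write(str(offset + end))
    return data[:end].decode("utf-8", errors="replace").splitlines()
```

Stopping at the last newline also means a line that is still being written when the script runs is picked up whole on the next pass rather than parsed in two pieces.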
Also, if multiple agents or threads are making calls simultaneously, the log entries could interleave. Your parsing loop assumes each line is a complete entry, but depending on your log format and volume, that might not hold. It could lead to false negatives if a line gets split or merged incorrectly.
Would you be open to sharing the regex pattern you're using inside your `parse_log` function? I'm curious how you're matching the destination against the whitelist, especially for IP addresses that might resolve from those domains.
That's a really good point about subprocesses and temporary containers. I'm running my agents in Docker too, and I didn't even think about the dynamic IPs.
Do you have a method for tracking those ephemeral containers? I guess you could tag them or filter by the Docker network subnet. But pairing it with tcpdump on the host as a second layer sounds like the safest bet for catching everything.
The internal tool usage log alongside network calls is a great idea. It would help trace where a weird external call actually originated from.
~Anna
You're right that dynamic IPs are the real snag here. I've tackled this by having my monitoring script fetch the current list of container IPs from the Docker daemon at runtime, before parsing logs. It's still a snapshot, but it catches anything running at that moment.
> filter by the Docker network subnet
That's a decent fallback, but it gets messy fast. If you share networks between containers, or use bridge networks, you'll catch traffic from non-agent containers too. I've found tagging the specific agent containers and filtering that way is more precise, though it adds some setup overhead.
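A sketch of what the label-based lookup could look like, shelling out to the Docker CLI and reading the `docker inspect` JSON. The `role=agent` label is a placeholder, and this only helps where the container addresses are actually visible in whatever log you're parsing.

```python
import json
import subprocess


def container_ips(inspect_json):
    """Map container name to its IPs, given the JSON text from `docker inspect`."""
    result = {}
    for container in json.loads(inspect_json):
        networks = container.get("NetworkSettings", {}).get("Networks") or {}
        ips = [n["IPAddress"] for n in networks.values() if n.get("IPAddress")]
        result[container.get("Name", "").lstrip("/")] = ips
    return result


def running_agent_ips(label="role=agent"):
    """Snapshot the IPs of running containers carrying the (hypothetical) agent label."""
    ids = subprocess.run(
        ["docker", "ps", "-q", "--filter", f"label={label}"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    if not ids:
        return {}
    out = subprocess.run(
        ["docker", "inspect", *ids],
        capture_output=True, text=True, check=True,
    ).stdout
    return container_ips(out)
```

As noted, it's still a snapshot: a container that starts and exits between two runs won't show up.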
The real power move, though, is correlating internal tool logs with network calls. If you see a call to an unexpected IP, you can trace it back to which tool execution chain spawned it. That's how you catch not just the call, but the *intent* - was it a poisoned tool that suddenly decided to exfiltrate data? You need that lineage.
ak
The whitelist approach is a decent start, but it assumes a static world. What about API updates, where a provider silently shifts endpoints to a new CDN? Your script screams about a call to "assets-new.openai.com" because it wasn't in your original list, but it's just a stylesheet.
The bigger issue is that you're treating this as a pure network problem. The real question is whether the agent's *intent* was sanctioned. A call to an unexpected IP that's just a redirected OAuth flow for a tool you *did* authorize is a false alarm. A call to an approved domain that's exfiltrating your vault data is a miss.
You're chasing symptoms. Log the tool invocation and its purpose first, then decide if the network call that follows is legitimate.
Security theater is still theater.
> you're chasing symptoms
That really hits home. I've been so focused on the network part I didn't even think about the intent part. So, basically, I should be logging what my agent is *trying to do* first, like "the email tool is trying to send to this address" or "the browser tool wants to fetch this URL", and *then* see if the network call that follows matches that? That makes way more sense.
But how do you log the intent? Do you have to hook into the agent's internal decision loop somehow? Sorry if that's a super basic question 😅
Learning by doing (and breaking).
> But how do you log the intent? Do you have to hook into the agent's internal decision loop somehow?
Exactly. You need to instrument the agent runtime itself, which is a layer above network monitoring. If you're using something like LangChain, you can implement a custom callback handler that logs every tool invocation attempt - the tool name and the arguments - before execution. This gives you the declared intent.
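To make the idea concrete without tying it to one framework's callback API, here's a framework-agnostic sketch: a decorator that records the tool name and arguments before the tool runs. `log_intent` and the list-based `sink` are hypothetical; in LangChain you'd put the same logging inside a callback handler, and in practice the sink would be an append-only file rather than a list.

```python
import functools
import time


def log_intent(tool_name, sink):
    """Wrap a tool function so every call is recorded before it executes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Record the declared intent first, so it survives a crash in the tool.
            sink.append({
                "ts": time.time(),
                "tool": tool_name,
                "args": repr(args),
                "kwargs": repr(kwargs),
            })
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

Usage would be something like decorating each tool with `@log_intent("fetch_url", intent_log)` where the tools are registered.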
The caveat is that this only logs what the agent *says* it wants to do. A truly malicious agent, or a compromised tool, could still lie in that log, or the tool's own code could make a network call that doesn't align with its described purpose. That's why you need both: the intent log from the runtime, and the network log from the host, then you correlate them. A mismatch between "tool X called with argument Y" and a subsequent call to an unrelated domain is your high-fidelity signal.
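A minimal sketch of the correlation step: flag any connection that no recent tool call explains. The record shapes, the `expected_dests` field, and the five-second window are all assumptions, and it presumes both logs identify destinations the same way (both IPs or both host names).

```python
def uncorrelated_connections(intents, connections, window=5.0):
    """
    intents:     list of {"ts": float, "tool": str, "expected_dests": set of str}
    connections: list of {"ts": float, "dest": str}
    Returns the connections with no matching intent in the preceding `window` seconds.
    """
    flagged = []
    for conn in connections:
        explained = any(
            0 <= conn["ts"] - intent["ts"] <= window
            and conn["dest"] in intent["expected_dests"]
            for intent in intents
        )
        if not explained:
            flagged.append(conn)
    return flagged
```

Anything this returns is either a destination no tool declared or a call that happened with no tool invocation shortly before it.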
I've been doing this for fuzzing agent runtimes; you can see my instrumentation patch for a common framework in my last disclosure write-up. It logs the tool call chain, then I replay the session while sniffing traffic to build a causality map.
ol
> import subprocess
You've truncated your code snippet prematurely, but the approach is fundamentally sound for a first-pass, infrastructure-level filter. However, I'm concerned about your reliance on a static `AGENT_IP`. This presumes your agent host is a static entity on your network, which is rarely the case if you're using any form of containerization or dynamic provisioning. Even on a home server, a DHCP renewal could shift that address.
A more durable method would be to have your script resolve the hostname of the agent server at runtime, or, better yet, tag the traffic at the firewall level with a unique identifier. Many setups, especially in labs, use VLANs or specific firewall rulesets for experimental systems. You could parse the log for traffic matching that rule number or tag instead of a raw IP.
The deeper issue, which others have started to outline, is the architectural assumption implicit in your whitelist. You're authorizing domains like `api.openai.com`, but you're not validating the payload or the intent behind the call. That domain could be serving a compromised API endpoint, or your agent could be exfiltrating data through what looks like a legitimate completion request. Your script sees a permitted domain and remains silent.
Consider extending your logic to also sample or checksum the payload being sent, perhaps by integrating with a transparent proxy on the host. Without that, you're only monitoring the envelope, not the letter inside.
Data leaves traces.
Missing the point. You're whitelisting specific domains, but parsing a raw firewall log full of IP addresses. Did you even write the DNS lookup part? Or are you just hoping the log magically resolved every connection to a domain for you?
That's a fair criticism, and you're right to call it out. In my original mental sketch, I was naively assuming the firewall log would contain a hostname resolved by some reverse DNS lookup, but you're correct that pfSense logs typically just have the destination IP for blocked packets. I didn't write the DNS lookup part; I was just thinking about the parsing logic for the fields that *were* there.
Actually, performing a reverse DNS lookup for every foreign IP in the log during the script run could introduce a significant delay and potentially fail if the PTR records aren't set. It might be more reliable to do a forward DNS lookup for the whitelisted domains at script start to get their current IPs, then compare against those. But that falls apart with CDNs and cloud services where the IP list is huge and dynamic. Maybe the flaw is trying to do domain-based filtering at the network log level at all.
Yeah, that's the core problem with any domain-based firewall logic, isn't it? The DNS layer and the IP layer are constantly desynchronized. Even your forward-lookup idea gets wrecked by geo-based CDN routing and short TTLs.
Your last sentence nails it. I think domain-based policy is for the application layer. At the network layer, you have to work with IPs and ports, and accept that your whitelist will be a set of CIDR blocks you grudgingly maintain. That's why I've shifted to just monitoring for calls *outside* of my known cloud provider IP ranges (AWS, GCP, etc.) for my agents. It's coarser, but it doesn't break when OpenAI spins up a new endpoint. The real intent filtering has to happen in the runtime logs.
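The CIDR check itself is short with the standard `ipaddress` module. A sketch, with documentation-only ranges standing in for the real provider blocks; `ALLOWED_RANGES` and `outside_allowed` are hypothetical names, and you'd load the actual ranges from whatever list you maintain.

```python
import ipaddress

# Placeholder ranges (reserved for documentation); substitute your maintained list.
ALLOWED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/24")
]


def outside_allowed(dest_ip, ranges=ALLOWED_RANGES):
    """Return True if dest_ip falls in none of the allowed CIDR blocks."""
    addr = ipaddress.ip_address(dest_ip)
    return not any(addr in net for net in ranges)
```

This is the coarse filter only; it says nothing about what was sent to an address inside the allowed ranges.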
Budget and monitor.