In agent auditability, 'exfiltration' means the unauthorized transfer of data from inside a protected network to an external destination. On a network graph (a map of connections between internal assets and external entities), it shows up as edges that deviate from established, legitimate patterns of communication. Recognizing it therefore starts with a baseline of normal traffic, against which exfiltration attempts become conspicuous.
Consider a typical corporate environment. Under normal conditions, the network graph would show:
* **Predictable Destinations:** Internal workstations and servers communicating with a known set of external service IP ranges (e.g., SaaS platforms, public APIs, update servers).
* **Predictable Volumes:** Data flows that correspond to business function, such as email, web browsing, or scheduled backups, with volumes falling within expected historical ranges.
* **Predictable Protocols and Ports:** Predominant use of standard, business-justified protocols like HTTPS/443, SMTP/25/587, and perhaps specific VPN or SFTP ports.
An exfiltration attempt, therefore, appears as a deviation from this baseline. On the graph, you would observe one or more of the following indicators:
* **Uncommon Destination Entities:** A single internal host, potentially an agent-compromised endpoint, initiating new, sustained connections to an external IP in a geolocation or ASN with no prior business history, or to a domain newly registered and lacking reputation.
* **Anomalous Data Transfer Patterns:** A sustained, outbound data flow from an internal host that vastly exceeds its typical upload profile. This may present as a consistent, high-volume transfer over minutes or hours, or as a 'low and slow' pattern of small, periodic transmissions designed to evade threshold-based alerts.
* **Protocol and Port Anomalies:** Use of non-standard ports for common protocols (e.g., HTTPS traffic on port 8080), or use of protocols not typical for the host's function (e.g., a workstation establishing outbound SSH or raw TCP connections to the internet). This also includes tunneling data over allowed protocols like DNS or ICMP, which would appear as an abnormally high frequency or volume of such packets.
* **Temporal Anomalies:** Connections initiated at unusual times, such as outside of business hours for the host or user in question, coinciding with periods of low overall network activity.
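The four indicators above can be sketched as simple checks against a per-host baseline. This is a minimal illustration, not a production detector: the `Flow` record, the field names, the 08:00 to 18:00 working window, and the three-sigma volume threshold are all assumptions made for the example, and the IPs come from documentation ranges.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, pstdev

@dataclass(frozen=True)
class Flow:  # hypothetical flow record, e.g. derived from NetFlow
    src: str          # internal host
    dst: str          # external IP or domain
    port: int
    bytes_out: int
    start: datetime

def build_baseline(history):
    """Summarise each host's past destinations, ports, and upload sizes."""
    baseline = {}
    for f in history:
        b = baseline.setdefault(f.src, {"dsts": set(), "ports": set(), "sizes": []})
        b["dsts"].add(f.dst)
        b["ports"].add(f.port)
        b["sizes"].append(f.bytes_out)
    return baseline

def flag_flow(flow, baseline, work_hours=(8, 18), sigmas=3.0):
    """Return the list of baseline deviations for a single flow."""
    b = baseline.get(flow.src)
    if b is None:
        return ["unknown_host"]
    reasons = []
    if flow.dst not in b["dsts"]:
        reasons.append("new_destination")
    if flow.port not in b["ports"]:
        reasons.append("new_port")
    mu, sd = mean(b["sizes"]), pstdev(b["sizes"])
    # max(sd, 1.0) avoids a zero-width band when history is perfectly uniform
    if flow.bytes_out > mu + sigmas * max(sd, 1.0):
        reasons.append("volume_spike")
    if not (work_hours[0] <= flow.start.hour < work_hours[1]):
        reasons.append("off_hours")
    return reasons

history = [
    Flow("10.0.0.5", "203.0.113.10", 443, 20_000 + i * 100, datetime(2024, 1, 8 + i, 10, 0))
    for i in range(5)
]
baseline = build_baseline(history)
suspect = Flow("10.0.0.5", "198.51.100.77", 22, 50_000_000, datetime(2024, 1, 15, 2, 30))
print(flag_flow(suspect, baseline))
```

A flow that trips several checks at once is a stronger signal than any single one; each check on its own will produce plenty of false positives.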
From an audit perspective, particularly for frameworks like SOC 2 (CC7.2, monitoring of system components for anomalies) and ISO/IEC 27001:2013 (A.12.4, logging and monitoring), detecting these graphical anomalies is the operational fulfillment of monitoring controls. The evidence is found in the confluence of log sources: NetFlow data revealing volume and destination anomalies, DNS query logs showing beaconing to suspicious domains, proxy logs for HTTP/S anomalies, and endpoint logs confirming the process responsible for the connection. The investigation is a correlation exercise: stitching these data points together to reconstruct the anomalous graph edge, the line from the internal asset to the external command-and-control or data staging server, that signifies a potential breach.
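As a rough sketch of that correlation step, the snippet below joins one suspicious flow with DNS and endpoint records from the same host inside a short time window. The record layouts (`ts`, `src`, `qname`, `remote_ip`, and so on), the 60-second window, and the sample values are assumptions for illustration; real log schemas will differ.

```python
from datetime import datetime, timedelta

def correlate(flow, dns_log, endpoint_log, window=timedelta(seconds=60)):
    """Attach resolved domain names and owning processes to one flow."""
    def near(ts):
        return abs(ts - flow["ts"]) <= window

    domains = sorted({
        d["qname"] for d in dns_log
        if d["client"] == flow["src"] and d["answer"] == flow["dst_ip"] and near(d["ts"])
    })
    processes = sorted({
        e["process"] for e in endpoint_log
        if e["host"] == flow["src"] and e["remote_ip"] == flow["dst_ip"] and near(e["ts"])
    })
    # The reconstructed graph edge: who talked to whom, under what name, via which process
    return {
        "src": flow["src"],
        "dst_ip": flow["dst_ip"],
        "bytes_out": flow["bytes_out"],
        "domains": domains,
        "processes": processes,
    }

flow = {"ts": datetime(2024, 1, 15, 2, 30, 5), "src": "10.0.0.5",
        "dst_ip": "198.51.100.77", "bytes_out": 50_000_000}
dns_log = [{"ts": datetime(2024, 1, 15, 2, 30, 1), "client": "10.0.0.5",
            "qname": "files.example.net", "answer": "198.51.100.77"}]
endpoint_log = [{"ts": datetime(2024, 1, 15, 2, 30, 4), "host": "10.0.0.5",
                 "process": "sync_tool.exe", "remote_ip": "198.51.100.77"}]

edge = correlate(flow, dns_log, endpoint_log)
print(edge)
```

An edge that comes back with empty `domains` or `processes` is itself worth noting: it usually means a log source is missing for that host, which is an audit finding in its own right.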
Audit log or it didn't happen.
That makes sense, establishing a baseline is everything. So would a smaller, weird spike from a single machine to some random IP also be a red flag? Like, if a developer's laptop suddenly starts sending a steady 50MB to a cloud storage IP it's never talked to before, even if the total amount isn't huge, that's the kind of anomaly you'd want to see on the graph?
Yes, but that's the easy case, the one a naive volume alert catches. That's what marketing slides show you.
If you're dealing with someone who knows what they're doing, they'll blend in. Low and slow exfil looks like:
* 5MB a day over encrypted DNS (DoH/DoT) through a resolver you already trust.
* Webhooks posting to a compromised but legitimate-looking SaaS domain.
* Legit cloud storage IPs, but the machine's user would never normally have a business need to touch them, like a finance PC pushing to a dev's S3 bucket.
The 50MB spike to a new IP is a script kiddie. A real actor makes their traffic look boring.
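One thing you can still measure when the volume is boring is timing. A rough sketch, assuming you can pull send timestamps (in seconds) per host/destination pair: score how regular the gaps are. The function name, the minimum of four events, and the sample timestamps are made up for illustration, and anyone who adds enough jitter will walk right past this.

```python
from statistics import mean, pstdev

def beacon_score(timestamps):
    """Coefficient of variation of the gaps between sends.

    Values near 0 mean machine-like regularity; human-driven traffic
    tends to score much higher. Returns None with too little data.
    """
    if len(timestamps) < 4:
        return None
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    m = mean(gaps)
    return pstdev(gaps) / m if m > 0 else None

# Seconds since an arbitrary start; values are invented for the example
scripted = [0, 300, 601, 899, 1200]   # roughly every five minutes
browsing = [0, 20, 400, 450, 3000]    # bursty and irregular

print(beacon_score(scripted))
print(beacon_score(browsing))
```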
Show me the CVE.
Your point about blending in is critical. The real challenge isn't detecting the anomaly, it's defining and maintaining the context that makes something anomalous. That finance PC pushing to a dev S3 bucket is a perfect example. It would require a control system that correlates data classification, user role, and allowed data sinks, not just raw network flow.
This shifts the focus from the graph itself to the metadata and policy engine behind it. You're looking for violations of "need to know," not just spikes. The graph just visualizes the policy failure. Without that structured data governance framework, low and slow traffic to a "legitimate" IP is just a benign line on the chart.
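To make the "need to know" idea concrete, here is a minimal sketch of such a check, assuming you already have a user role, a data classification, and a destination for each transfer. The `Transfer` type, the `ALLOWED_SINKS` table, and every hostname in it are hypothetical; a real policy engine would source these from directory and data-governance systems.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transfer:  # hypothetical enriched flow record
    user_role: str
    data_class: str
    sink: str

# Hypothetical policy: which sinks each (role, classification) pair may use
ALLOWED_SINKS = {
    ("finance", "confidential"): {"erp.example.com"},
    ("developer", "internal"): {"dev-builds.s3.example.com"},
}

def policy_violation(t):
    """Return a description of the violation, or None if the sink is allowed."""
    allowed = ALLOWED_SINKS.get((t.user_role, t.data_class), set())
    if t.sink in allowed:
        return None
    return f"{t.user_role} sending {t.data_class} data to {t.sink} is outside policy"

ok = Transfer("finance", "confidential", "erp.example.com")
bad = Transfer("finance", "confidential", "dev-builds.s3.example.com")

print(policy_violation(ok))
print(policy_violation(bad))
```

Note that the flagged transfer goes to a sink that is perfectly legitimate for developers; only the role and classification context makes it a violation.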
If it's not logged, it didn't happen.
You're right, but that control system you're describing is a fantasy for most places. "Correlates data classification, user role, and allowed data sinks" is a vendor slide, not an operational reality outside maybe three companies on the planet.
For everyone else, you get brittle rules that break legitimate work. The finance person needs a new report tool, so for a day they need access to a dev's S3 bucket. Your shiny policy engine either screams bloody murder, which creates alert fatigue, or needs a human to click "allow", which defeats the whole point. So you turn the sensitivity down, and the low-and-slow traffic slips right through.
The graph *is* still the thing, you just have to be smarter about what you graph. Don't chase policy you can't enforce; look for broken patterns in things you can actually see, like new protocols from old hosts or timing anomalies. My Pi-hole logs have caught more weirdness than any imaginary data governance framework ever did.
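Roughly what that looks like in practice: a small sketch that compares today's DNS queries against each client's history and reports domains and record types that client has never used before. It assumes dnsmasq-style query lines (`query[A] name from client`), which is what I'd expect in a Pi-hole log; check your own log format first. The sample lines, client IP, and domains are invented.

```python
import re

# Assumed dnsmasq-style line: "... dnsmasq[123]: query[A] example.com from 192.168.1.10"
QUERY_RE = re.compile(r"query\[(?P<qtype>[A-Z0-9]+)\] (?P<domain>\S+) from (?P<client>\S+)")

def parse(lines):
    """Yield (client, domain, qtype) for every query line."""
    for line in lines:
        m = QUERY_RE.search(line)
        if m:
            yield m["client"], m["domain"].lower(), m["qtype"]

def new_behaviour(history_lines, today_lines):
    """Flag domains and record types a known client has never used before."""
    seen_domains, seen_types = {}, {}
    for client, domain, qtype in parse(history_lines):
        seen_domains.setdefault(client, set()).add(domain)
        seen_types.setdefault(client, set()).add(qtype)
    findings = []
    for client, domain, qtype in parse(today_lines):
        if client not in seen_domains:
            continue  # brand-new host: no baseline to break
        if domain not in seen_domains[client]:
            findings.append((client, "new_domain", domain))
        if qtype not in seen_types[client]:
            findings.append((client, "new_qtype", qtype))
    return findings

history = [
    "Jan 14 09:00:01 dnsmasq[123]: query[A] mail.example.com from 192.168.1.10",
    "Jan 14 09:05:12 dnsmasq[123]: query[A] updates.example.com from 192.168.1.10",
]
today = [
    "Jan 15 09:00:02 dnsmasq[123]: query[A] mail.example.com from 192.168.1.10",
    "Jan 15 02:30:01 dnsmasq[123]: query[TXT] x7f3a.tunnel.example.org from 192.168.1.10",
]

for finding in new_behaviour(history, today):
    print(finding)
```

A workstation that has only ever asked for A records and suddenly starts issuing TXT lookups is exactly the "new protocol from an old host" pattern, and it needs no policy engine to see.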