Showcase: my Grafana dashboard for agent network activity.


Detecting Agent Exfiltration Attempts


Ray Moussa

(@ray_crypto)

Eminent Member

Joined: 1 week ago

Posts: 18

Topic starter


June 28, 2026 3:01 am [#1075]

Following recent discussions on agent exfiltration, I've implemented a network monitoring dashboard focused on cryptographic context and key lifecycle. The primary hypothesis is that unauthorized data exfiltration will manifest as anomalies in TLS connection patterns and signature volumes, even before full packet inspection.

The dashboard is built on Grafana, pulling from:
* A Zeek (formerly Bro) instance logging TLS handshake details (cipher suites, server names, certificate validity periods).
* Host audit logs capturing calls to our HSM's signing API (signature count per agent key per hour).
* NetFlow data for baseline behavioral modeling.

Key panels include:

**TLS Fingerprint Anomalies**
- Baseline cipher suite list per agent role (e.g., `TLS_AES_256_GCM_SHA384` for control channel).
- Alert on an unexpected SNI, or on a peer certificate whose expiry date deviates from the pinned value.

**Signing Operation Rate**
- Monitors the count of signing operations performed by each agent's identity key.
- A spike concurrent with new outbound connections is a high-fidelity signal.
- Threshold: `signatures_per_hour > μ + 3σ` against the given agent's 30-day baseline.
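
For anyone who wants to sanity-check the threshold offline, here is a minimal sketch of the `μ + 3σ` test. The function name and the sample counts are hypothetical, and it uses the sample standard deviation, which may differ from what a given alerting backend computes.

```python
from statistics import mean, stdev

def exceeds_baseline(history, current, k=3.0):
    """Return True if `current` is strictly above mean + k * stdev of `history`.

    `history` is a list of hourly signature counts for one agent
    (hypothetical input; in practice this would be the 30-day baseline).
    """
    if len(history) < 2:
        return False  # not enough samples to estimate the spread
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    return current > mu + k * sigma

# Made-up hourly counts: mean is 40, sample stdev is 2, so the threshold is 46.
baseline = [40, 42, 38, 41, 39, 43, 40, 37]
print(exceeds_baseline(baseline, 44))  # False
print(exceeds_baseline(baseline, 90))  # True
```

With a baseline this tight, even a modest burst trips the alert, so the choice of `k` is worth tuning per agent role.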

**Unexpected Destination Ports**
- Cross-references outbound connections against a managed allow-list of ports (e.g., 443, 853).
- Highlights non-compliant flows, especially to non-standard ports using TLS.
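
The allow-list check itself is simple. A rough sketch, assuming each flow has already been flattened into a dict (the keys `agent_id`, `dst_port`, and `is_tls` are illustrative names, not Zeek or NetFlow field names):

```python
ALLOWED_PORTS = {443, 853}  # the example ports from the panel description

def non_compliant_flows(flows, allowed_ports=ALLOWED_PORTS):
    """Return flows whose destination port is not on the allow-list.

    TLS flows on non-standard ports are sorted to the front, since
    those are the ones the panel highlights.
    """
    flagged = [f for f in flows if f["dst_port"] not in allowed_ports]
    # sorted() is stable, so the original order is kept within each group.
    return sorted(flagged, key=lambda f: not f["is_tls"])

flows = [
    {"agent_id": "a1", "dst_port": 443, "is_tls": True},
    {"agent_id": "a2", "dst_port": 8080, "is_tls": False},
    {"agent_id": "a3", "dst_port": 8443, "is_tls": True},
]
for flow in non_compliant_flows(flows):
    print(flow["agent_id"], flow["dst_port"], flow["is_tls"])
```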

The accompanying alert rules are implemented in Prometheus. The critical rule for signature exfiltration detection:

```promql
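# Alert if an agent key's signature count spikes alongside external connection activity.
# increase() gives a count over the window; rate() would give a per-second average,
# which does not match a signatures-per-hour threshold.
(
  increase(hsm_sign_operations_total{operation="sign"}[1h])
  * on(instance) group_left(agent_id)
  agent_info
) > 10
# Match both sides on agent_id only; a bare `and` requires identical label sets.
# Assumes zeek_conn_external_total carries an agent_id label.
and on(agent_id)
(
  sum by (agent_id) (rate(zeek_conn_external_total[5m])) > 0
)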
```

The visualization links these three data streams on a per-`agent_id` dimension. This allows correlating, for example, a new outbound flow to a previously unseen domain on port 443 with a simultaneous 200% increase in ECDSA-P256 signing operations for that agent's attestation key.
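
To make the correlation concrete, here is a small sketch of the per-agent join. All names (`correlated_event`, `seen_domains`, the 200% default) are placeholders for illustration and do not come from the dashboard itself.

```python
def percent_increase(previous, current):
    """Percentage change from `previous` to `current` (a 200% increase means 3x)."""
    if previous <= 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous * 100.0

def correlated_event(agent_id, domain, seen_domains, prev_signs, curr_signs,
                     min_increase_pct=200.0):
    """True when an agent contacts a previously unseen domain while its
    signing count has risen by at least `min_increase_pct` percent."""
    is_new_domain = domain not in seen_domains.get(agent_id, set())
    is_spike = percent_increase(prev_signs, curr_signs) >= min_increase_pct
    return is_new_domain and is_spike

seen = {"agent-7": {"updates.example.org"}}
print(correlated_event("agent-7", "unknown.example.net", seen, 50, 150))  # True
print(correlated_event("agent-7", "updates.example.org", seen, 50, 150))  # False
```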

I am particularly interested in the community's thoughts on key rotation as a detection mechanism. If an agent's key is used from a new network endpoint shortly after rotation, should this be considered a higher severity event? How are others integrating TPM-based attestation logs into their network anomaly views?

Don't roll your own crypto. Unless you have a spec.


Li Audit

(@runtime_audit_li)

Active Member

Joined: 1 week ago

Posts: 15


June 28, 2026 3:01 pm

This is a solid foundation, particularly the correlation between signature spikes and new connections. However, I'm concerned about the fidelity of your baseline for anomaly detection. A 30-day rolling baseline for `signatures_per_hour > μ + 3σ` is susceptible to poisoning if an agent has already been compromised in a low-and-slow campaign during that window. The mean and standard deviation would simply adjust to include the malicious activity.

You should consider implementing a dual-baseline system: one long-term rolling baseline (30-day) and one seasonal baseline (e.g., the same hour on the same day of the week over the last 8 weeks). A divergence between these baselines can itself be an alert. Also, are your host audit logs capturing failed signing attempts or just successful calls? A failed-attempt counter, especially from a process not whitelisted to the HSM module, is often a more immediate indicator of key misuse than a spike in successful operations.
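
A rough sketch of the divergence check, in case it helps. The function names, the 50% tolerance, and the sample counts are all assumptions for illustration; a real deployment would need to pick the comparison statistic and tolerance from its own data.

```python
from statistics import mean

def baseline_divergence(rolling_counts, weekly_slot_counts):
    """Relative gap between the 30-day rolling mean and the mean of the
    same hour/weekday slot over recent weeks."""
    rolling_mu = mean(rolling_counts)
    slot_mu = mean(weekly_slot_counts)
    if slot_mu == 0:
        return float("inf") if rolling_mu > 0 else 0.0
    return abs(rolling_mu - slot_mu) / slot_mu

def baselines_diverge(rolling_counts, weekly_slot_counts, tolerance=0.5):
    """True when the two baselines disagree by more than `tolerance` (50% here)."""
    return baseline_divergence(rolling_counts, weekly_slot_counts) > tolerance

# Made-up data: the rolling window has drifted upward, the weekly slot has not.
rolling = [80, 82, 78, 80]
weekly_slot = [40, 41, 39, 40, 40, 42, 38, 40]
print(baselines_diverge(rolling, weekly_slot))  # True
```

If the attacker's activity was present in both windows, the two baselines will agree and this check stays quiet, so it narrows the poisoning problem rather than removing it.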

Regarding the TLS fingerprint panel, do your Zeek logs capture the TLS extensions in the ClientHello? Anomalies in the ordered list of extensions or their internal values are frequently a more reliable fingerprint than cipher suite alone, which can be updated by normal software patches.
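
As a simple illustration of fingerprinting on extension order, the sketch below hashes the ordered list of extension type IDs. It assumes the IDs have already been extracted from the ClientHello as integers; how they are exported from Zeek is left out, and the function names are hypothetical. This is a simplified stand-in for fuller schemes such as JA3, which also fold in the TLS version, cipher suites, and curves.

```python
import hashlib

def extension_fingerprint(extension_ids):
    """Hash the ordered list of ClientHello extension type IDs.

    Order is preserved on purpose: the same extensions in a different
    order produce a different fingerprint.
    """
    canonical = "-".join(str(e) for e in extension_ids)
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()[:16]

def is_known_fingerprint(extension_ids, baseline):
    """Check a ClientHello against a set of fingerprints recorded per agent role."""
    return extension_fingerprint(extension_ids) in baseline

# IANA extension types: 0 server_name, 10 supported_groups,
# 13 signature_algorithms, 43 supported_versions, 51 key_share.
expected = [0, 10, 13, 43, 51]
baseline = {extension_fingerprint(expected)}
print(is_known_fingerprint([0, 10, 13, 43, 51], baseline))  # True
print(is_known_fingerprint([10, 0, 13, 43, 51], baseline))  # False
```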

Log everything, trust nothing


Raj Host

(@selfhost_raj)

Eminent Member

Joined: 1 week ago

Posts: 21


June 28, 2026 7:34 pm

Nice setup! Correlating TLS data with HSM signatures is a clever angle. I'm doing something similar, but I had to add a separate panel for our *internal* service mesh traffic (mTLS). Found that agents under heavy load from a legitimate orchestration task can trigger false positives on your "spike with new connection" rule because they're signing health checks like crazy.

Have you pinned your Zeek instance's own certs? Early on, my alerts got flooded because Zeek was logging its own outbound update checks. 🙃

Selfhosted since 2004

