Showcase: My detection model for 'agent drift' - when behavior changes unexpectedly.

(@tinfoil_tom)
Eminent Member
Joined: 1 week ago
Posts: 29
Topic starter

Everyone’s obsessed with shipping logs to their bloated SIEM. Congrats, you can now graph “agent connected.” Want to actually detect something?

Most of you aren’t watching for *agent drift*: a normally stable agent that suddenly starts doing new things, such as new outbound connections, weird child processes, or abnormal module loads. That’s the signal.

My model uses three feeds from the agent:
* Process lineage (sudden bash from a python agent?)
* Network destinations (first time talking to a new AWS IP range?)
* Module load events (anything outside the approved hash list)

Normalize them, then baseline per agent ID over 7 days. Alert on deviations exceeding baseline + tolerance.
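A minimal sketch of the normalize-then-baseline step, assuming each raw agent record is a dict with hypothetical `type`, `agent_id`, `date`, and per-feed value fields. The "tolerance" is expressed here in standard deviations, which is an assumption; tom doesn't say how his tolerance is defined.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean, pstdev

# Hypothetical mapping from feed type to the raw field that identifies "what happened".
VALUE_FIELD = {"process": "child_path", "network": "dest", "module": "sha256"}

@dataclass(frozen=True)
class Event:
    """Common shape for all three feeds after normalization."""
    agent_id: str
    day: date
    kind: str   # "process", "network", or "module"
    value: str  # child process path, destination, or module hash

def normalize(raw: dict) -> Event:
    """Map one raw agent record (hypothetical field names) onto Event."""
    kind = raw["type"]
    return Event(raw["agent_id"], date.fromisoformat(raw["date"]), kind, raw[VALUE_FIELD[kind]])

def daily_counts(events, agent_id: str, kind: str, start: date, days: int = 7) -> list[int]:
    """Distinct values per day for one agent and one feed, starting at `start`."""
    per_day = defaultdict(set)
    for e in events:
        if e.agent_id == agent_id and e.kind == kind:
            per_day[e.day].add(e.value)
    return [len(per_day[start + timedelta(days=d)]) for d in range(days)]

def exceeds_baseline(history: list[int], today: int,
                     tolerance_sd: float = 3.0, min_spread: float = 1.0) -> bool:
    """True if today's count is above mean(history) + tolerance.

    Tolerance is measured in standard deviations; min_spread keeps a perfectly
    flat history from turning every +1 into an alert.
    """
    spread = max(pstdev(history), min_spread)
    return today > mean(history) + tolerance_sd * spread
```

Usage would be one `daily_counts(...)` call per agent and feed for the trailing 7 days, then `exceeds_baseline(history, today_count)`. Keying everything on `agent_id` is what keeps one chatty agent from setting the threshold for all the quiet ones.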

Example rule logic:
- Alert if new outbound destination count > (historical avg * 3)
- Alert if any process spawn outside known whitelist
- Correlate: new network flow + new module load = high severity
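The three rules above translate fairly directly into code. This is a sketch, not tom's implementation: the `AgentDay` structure and the "medium"/"high" severity labels are assumptions, and the caller is assumed to have already computed which destinations and module hashes are new relative to the baseline.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class AgentDay:
    """One agent's activity for the day being evaluated (hypothetical structure)."""
    agent_id: str
    new_destinations: set[str]   # outbound destinations absent from the baseline
    spawned_processes: set[str]  # child process paths seen today
    new_module_hashes: set[str]  # loaded module hashes outside the approved list

def evaluate(day: AgentDay, new_dest_history: list[int],
             process_allowlist: set[str]) -> list[tuple[str, str]]:
    """Apply the three example rules and return (severity, reason) pairs."""
    alerts = []
    # Rule 1: new outbound destination count > historical average * 3.
    if new_dest_history and len(day.new_destinations) > mean(new_dest_history) * 3:
        alerts.append(("medium", "new outbound destinations above 3x baseline"))
    # Rule 2: any process spawn outside the known whitelist.
    unknown = day.spawned_processes - process_allowlist
    if unknown:
        alerts.append(("medium", "unexpected processes: " + ", ".join(sorted(unknown))))
    # Rule 3: correlate a new network flow with a new module load.
    if day.new_destinations and day.new_module_hashes:
        alerts.append(("high", "new network flow plus new module load"))
    return alerts
```

Note that if the historical average is zero, rule 1 fires on the very first new destination, which may or may not be what you want for a very quiet agent.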

Saves me from the “benign update” false positives. Your vendor’s “anomaly detection” is just watching for agent disconnects. Useless.

—tom, the tin-foil



   
(@wendy_homelab)
Active Member
Joined: 1 week ago
Posts: 17

This is fascinating, and it makes total sense. The per-agent 7-day baseline is the key bit I think I'd have missed; I'd probably have just set a global threshold and gotten swamped.

I'm trying to think how I'd start testing this in my little home lab without building a whole model. Maybe just scripting a daily diff of `lsof` output and `netstat` connections per container? The process lineage part seems harder to track.
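For a home-lab version of the network half, one option is to skip parsing `netstat` and read `/proc/<pid>/net/tcp` directly, which shows the sockets in that process's network namespace (so a container's main PID gives you that container's view). A sketch, assuming Linux and IPv4 only (`/proc/net/tcp6` is not handled); the function names are illustrative.

```python
import socket
import struct

TCP_ESTABLISHED = "01"  # state code as printed in /proc/net/tcp

def remote_endpoints(proc_net_tcp: str) -> set[tuple[str, int]]:
    """Parse /proc/net/tcp text into {(remote_ip, remote_port)} for ESTABLISHED IPv4 sockets."""
    endpoints = set()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_ESTABLISHED:
            continue
        ip_hex, port_hex = fields[2].split(":")
        # The kernel prints the raw address word with %08X, so repacking it in
        # native byte order recovers the original network-order bytes.
        ip = socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16)))
        endpoints.add((ip, int(port_hex, 16)))
    return endpoints

def snapshot(pid: int) -> set[tuple[str, int]]:
    """Remote endpoints visible in the network namespace of `pid`."""
    with open(f"/proc/{pid}/net/tcp", encoding="ascii") as f:
        return remote_endpoints(f.read())

def new_since(baseline: set[tuple[str, int]], today: set[tuple[str, int]]) -> set[tuple[str, int]]:
    """Destinations present today that never appeared in the baseline snapshots."""
    return today - baseline
```

A container's PID can be found with `docker inspect -f '{{.State.Pid}}' <container>`. Point-in-time snapshots will miss short-lived connections, so a cron-based diff like this undercounts compared to an agent that sees every connect.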

A quick question on the whitelist: how do you handle legitimate updates? Like, if a dev pushes a new version of their service that loads a new module, does it trigger and then you just accept it into the baseline after a review?



   
(@vendor_truth_agent)
Eminent Member
Joined: 1 week ago
Posts: 19

Your whitelist question is the right one. In practice, that's where these models usually fall apart.

If you're just diffing lsof and netstat, you'll miss the context that makes drift meaningful. A new outbound connection could be a package update fetching metadata, not drift. You'll be back to alert fatigue.

The 7-day baseline doesn't solve the update problem either. It just makes the alert fire for a week after every deployment. So you either get noisy or you build an exemption pipeline, which becomes a full-time job.

Start by logging, not alerting. See if you can even define "normal" for one service before you try to detect shifts.
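One way to act on "define normal first" is to log per-day value sets for a single service and look at two numbers before writing any rule: which values show up nearly every day, and how much genuinely new stuff appears each day. A sketch, assuming you already collect one set of values (say, destinations) per day; the 0.8 cut-off is an arbitrary illustration.

```python
from collections import Counter

def describe_normal(daily_values: list[set[str]], core_fraction: float = 0.8):
    """Split logged values into a stable core and occasional extras.

    daily_values holds one set per logged day for a single service, e.g. the
    destinations it contacted. Returns (core, occasional).
    """
    seen = Counter(v for day in daily_values for v in day)
    n_days = len(daily_values)
    core = {v for v, c in seen.items() if c / n_days >= core_fraction}
    return core, set(seen) - core

def daily_novelty(daily_values: list[set[str]]) -> list[int]:
    """Number of never-before-seen values on each day.

    If this never settles near zero, "new X" is not yet a usable alert signal.
    """
    seen, novelty = set(), []
    for day in daily_values:
        novelty.append(len(day - seen))
        seen |= day
    return novelty
```

For example, five days of `{a, b}`, `{a, b}`, `{a, c}`, `{a, b}`, `{a}` give a core of just `a` at the default cut-off and a novelty series of `[2, 0, 1, 0, 0]`.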


hm


   
(@ml_sec_ops_jay)
Active Member
Joined: 1 week ago
Posts: 8

You're right about the update problem. But that's why you decouple detection from response.

If a deployment changes behavior, the model should flag it. That's correct. The real work is auto-enriching that alert with deployment data from your CI/CD system before it reaches a human. If the new process lineage matches a just-deployed git hash, you suppress or tag it automatically.
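A sketch of that enrichment step. It assumes (hypothetically) that each CI/CD record lists the hashes of the artifacts it shipped, since matching "process lineage" to a git hash needs some mapping like that; the 24-hour window and the tag strings are also illustrative. It tags rather than drops, so the match stays visible.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Deployment:
    """A record pulled from CI/CD (hypothetical schema)."""
    service: str
    git_sha: str
    artifact_hashes: frozenset[str]  # hashes of binaries/modules this deploy shipped
    deployed_at: datetime

@dataclass
class DriftAlert:
    service: str
    observed_hash: str  # hash of the new process image or module
    seen_at: datetime
    tags: list[str]

def enrich(alert: DriftAlert, deployments: list[Deployment],
           window: timedelta = timedelta(hours=24)) -> str:
    """Tag the alert if a recent deploy of the same service shipped the observed
    hash; otherwise mark it for a human. Returns "tag" or "escalate"."""
    for d in deployments:
        age = alert.seen_at - d.deployed_at
        recent = timedelta(0) <= age <= window
        if d.service == alert.service and recent and alert.observed_hash in d.artifact_hashes:
            alert.tags.append(f"expected-deploy:{d.git_sha}")
            return "tag"
    alert.tags.append("no-deploy-match")
    return "escalate"
```

Requiring the service name, the time window, and the artifact hash to all line up is deliberate: matching on timing alone would let anything that happens shortly after a deploy ride along as "expected".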

Logging without detection is just data hoarding. You need both.


--Jay


   
(@compliance_ninja)
Active Member
Joined: 1 week ago
Posts: 16

The decoupling point is crucial, but your enrichment example hinges on a perfect CI/CD audit trail, which is often the weakest link. Tagging an alert with a git hash assumes your deployment system logs are immutable, tamper-evident, and correctly correlated in time. If those logs aren't treated as a controlled record, you're creating a suppression mechanism based on unverified data.
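One common way to make deployment records tamper-evident before trusting them for suppression is a hash chain, where each entry commits to the previous one. This is a generic sketch of that technique, not anything the posters describe; the record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64

def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the record."""
    payload = prev_hash + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(chain: list[dict], record: dict) -> None:
    """Add a deployment record, linking it to the current head of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    record = dict(record)  # copy so later edits to the caller's dict can't alter the log
    chain.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})

def verify(chain: list[dict]) -> bool:
    """Recompute every link. Any edit, reorder, or removal breaks the chain,
    except silently truncating the newest entries."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Because tail truncation is undetectable from the chain alone, the latest head hash needs to be stored somewhere the CI/CD system itself cannot write.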

You also need a process for when the enrichment *doesn't* find a match. Is that an immediate escalation, or does it go to a queue for manual review? The workflow after decoupling defines the entire model's efficacy.

What's your retention period for those deployment logs? If you're baselining over 7 days but only keeping 48 hours of CI/CD context, you'll have a blind spot.
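The blind spot in that example is plain arithmetic: a 7-day baseline against 48 hours of CI/CD retention leaves 5 days of baseline history that can never be checked against a deployment. A small sketch of that check, with illustrative function names.

```python
from datetime import datetime, timedelta

def enrichment_blind_spot(baseline_window: timedelta, cicd_retention: timedelta) -> timedelta:
    """How much of the baseline window has no CI/CD records left to enrich against."""
    return max(baseline_window - cicd_retention, timedelta(0))

def has_deploy_context(event_time: datetime, now: datetime, cicd_retention: timedelta) -> bool:
    """Whether deployment records from around event_time should still exist."""
    return now - event_time <= cicd_retention
```

With `enrichment_blind_spot(timedelta(days=7), timedelta(hours=48))` the result is five days; any retention at or above the baseline window brings it to zero.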


If it's not logged, it didn't happen.


   