I've been reviewing the security architecture for several agent-based systems lately, and a pattern keeps nagging at me. We spend a lot of time on the obvious threats—prompt injection, tool misuse, authorization bypass—but I think we're missing a critical, subtler layer. The inference process itself, especially in multi-agent or chained-agent scenarios, might be leaking a surprising amount of information through side channels.
Think about it: an agent loop often involves repeated LLM calls, possibly to different models or with different parameters, based on intermediate reasoning. An attacker who can observe the system from some vantage point (for example, response timing over the network or resource metrics on a shared host), even without direct API access, could potentially infer:
* **Internal decision logic** by observing timing differences between different reasoning paths.
* **Sensitive data presence** by monitoring token generation rates or resource usage (e.g., GPU memory spikes) when processing specific user inputs.
* **Guardrail or moderation model triggers** through detectable delays or changes in the call pattern.
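To make "observable timing difference" a bit more testable, here's a rough sketch of how I'd check whether two reasoning paths are separable by latency alone. It computes Welch's t statistic over two latency samples. The 4.5 threshold is a convention I'm borrowing from hardware side-channel leakage testing, not something validated for LLM agents, and the sample values are made up for illustration.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent latency samples (seconds).

    Each sample needs at least two values so a variance can be computed.
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        # No spread in either sample: any mean difference is trivially visible.
        return 0.0 if statistics.fmean(a) == statistics.fmean(b) else math.inf
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def distinguishable(a, b, threshold=4.5):
    """True if the two paths look separable by timing alone.

    threshold=4.5 is an assumed cutoff borrowed from hardware leakage testing.
    """
    return abs(welch_t(a, b)) > threshold

# Made-up latencies for illustration only, not measurements.
path_a = [1.00, 1.10, 0.90, 1.00, 1.05]  # e.g. guardrail not triggered
path_b = [1.50, 1.60, 1.40, 1.50, 1.55]  # e.g. guardrail triggered
print(distinguishable(path_a, path_b))
```

With real traces you'd want far more samples per path and some control for network jitter, but even this crude check gives a yes/no answer per pair of paths.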
I'm trying to apply a STRIDE-per-element approach here, mainly the Information Disclosure category, with the inference "process" itself as the element under analysis. Has anyone in the community done a structured threat model or an empirical analysis of this? I'm picturing an attack tree whose leaf nodes (attacker capabilities) look like:
* Attacker can profile normal inference timing patterns.
* Attacker can induce the agent to perform branching operations.
* Attacker can monitor resource utilization during agent operation.
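For anyone who wants to poke at this, here's a minimal sketch of how the three capabilities above could sit in an AND/OR attack tree. The `Node` class, the goal label, and especially the way I've combined the capabilities (timing profile AND induced branching, OR resource monitoring on its own) are my assumptions for illustration, not a finished model.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    """One attack-tree node. gate is "LEAF", "AND", or "OR"."""
    label: str
    gate: str = "LEAF"
    children: list[Node] = field(default_factory=list)

    def achievable(self, capabilities: set[str]) -> bool:
        """Return True if an attacker holding `capabilities` can reach this node."""
        if self.gate == "LEAF":
            return self.label in capabilities
        results = [child.achievable(capabilities) for child in self.children]
        return all(results) if self.gate == "AND" else any(results)

PROFILE = "profile normal inference timing patterns"
BRANCH = "induce the agent to perform branching operations"
MONITOR = "monitor resource utilization during agent operation"

# Assumed structure: the timing path needs both a baseline and induced
# branching; resource monitoring is treated as sufficient by itself.
tree = Node(
    "leak information via the inference process",
    gate="OR",
    children=[
        Node("timing side channel", gate="AND",
             children=[Node(PROFILE), Node(BRANCH)]),
        Node(MONITOR),
    ],
)

print(tree.achievable({PROFILE}))          # baseline alone
print(tree.achievable({PROFILE, BRANCH}))  # baseline plus induced branching
```

The useful part is less the code than being forced to state which capabilities are jointly required, since that tells you which single mitigation breaks the most paths.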
What I'm looking for isn't just theoretical. If you've:
* Instrumented an agent loop to measure and baseline these characteristics,
* Built a threat model specifically for information leakage via inference,
* Or implemented hardening measures (like adding noise to timing, or normalizing call patterns),
please share your methodology and findings. Let's get this conversation started with concrete data and experiences. The "hard way" is often the best teacher here.
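As a starting point for both the baselining and the "normalize timing" ideas, here's a small sketch of what I have in mind. It wraps one agent step, logs the raw latency for later baselining, then sleeps until the next multiple of a fixed quantum so an outside observer sees bucketed durations. `call_with_padding`, `padded_duration`, and the injectable `clock`/`sleep` parameters are hypothetical names of mine, and the quantum value is arbitrary.

```python
import math
import time

def padded_duration(elapsed: float, quantum: float) -> float:
    """Round elapsed (seconds) up to the next multiple of quantum, minimum one quantum."""
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    return max(1, math.ceil(elapsed / quantum)) * quantum

def call_with_padding(step, quantum, log, clock=time.perf_counter, sleep=time.sleep):
    """Run step(), record its raw latency in log, then pad to a quantum boundary.

    clock and sleep are injectable so the wrapper can be tested without waiting.
    """
    start = clock()
    result = step()
    elapsed = clock() - start
    log.append(elapsed)  # raw value, kept internally for baselining
    # Guard against a tiny negative remainder from floating-point rounding.
    sleep(max(0.0, padded_duration(elapsed, quantum) - elapsed))
    return result

# Usage: any zero-argument callable stands in for an LLM or tool call.
latencies = []
answer = call_with_padding(lambda: "ok", quantum=0.05, log=latencies)
print(answer, len(latencies))
```

Bucketing like this only hides differences that stay within one quantum. Paths that land in different buckets are still distinguishable, so it trades latency for a coarser signal rather than removing the channel.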
- Oli
Model the threats before the code.