Just saw this pop up on arXiv: "CacheSleuth: ML-Driven Anomaly Detection for Enclave Cache Side-Channels." The authors claim their model, trained on hardware performance counter traces, can distinguish between benign enclave operation and several Prime+Probe/Flush+Reload variants with over 95% accuracy in a simulated SGX environment.
It's an interesting approach, but it immediately raises two practical questions for our deployments:
1. **Performance overhead:** The paper is light on real-world numbers. Continuously sampling PMCs (especially at the frequency needed for detection) isn't free. Would this be a constant tax, or could it be triggered by a separate monitoring enclave only during sensitive operations?
2. **Adaptive adversaries:** The model detects the attack patterns it was trained on. How quickly could an attacker morph an attack to evade it? The paper suggests periodic retraining, but that implies a continuous cycle of data collection and model updates in production.
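
To put rough numbers on the first question, here is a back-of-envelope sketch of the "constant tax vs. gated sampling" trade-off. The sampling rate, per-sample cost, and duty cycle are all placeholder assumptions, not figures from the paper; `sampling_overhead` is a hypothetical helper.

```python
def sampling_overhead(samples_per_sec: float,
                      cost_per_sample_us: float,
                      duty_cycle: float = 1.0) -> float:
    """Estimate the fraction of one core's time spent reading counters.

    duty_cycle is the share of wall-clock time during which sampling is
    enabled (1.0 = always on, 0.05 = only during sensitive operations).
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in [0, 1]")
    # samples/s * seconds per sample = fraction of each second spent sampling
    return samples_per_sec * cost_per_sample_us * 1e-6 * duty_cycle


# Illustrative inputs only: 10 kHz sampling, 2 microseconds per read.
always_on = sampling_overhead(10_000, 2.0)
gated = sampling_overhead(10_000, 2.0, duty_cycle=0.05)
print(f"always-on: {always_on:.2%}, gated: {gated:.2%}")
```

With these made-up inputs, always-on sampling costs about 2% of a core and gating to 5% of the time drops that to about 0.1%. The real per-sample cost would have to be measured on the target hardware.
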
From our perspective, this feels like a potential enhancement to, not a replacement for, our existing defense-in-depth. We currently rely on a mix of configuration hardening and our scheduled attestation checks. A detection layer like this could shorten the time-to-detection for a novel attack.
Has anyone run a similar PMC-based monitoring setup in their staging environments? I'm curious about the stability of the counters across different IronClaw-supported hardware generations and the signal-to-noise ratio in a noisy, multi-tenant host.
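
One simple way to compare signal-to-noise across hosts is sketched below. The definition of SNR used here (shift in the counter mean under attack, divided by the benign standard deviation) is my own assumption and not the paper's metric, and the traces are synthetic.

```python
import statistics


def counter_snr(benign: list[float], attack: list[float]) -> float:
    """Separation of attack vs. benign counter readings, in units of
    benign standard deviations. Higher means easier to detect."""
    noise = statistics.stdev(benign)
    if noise == 0:
        return float("inf")
    signal = abs(statistics.fmean(attack) - statistics.fmean(benign))
    return signal / noise


# Synthetic cache-miss counts per sampling window (not measured data).
attack_trace = [150, 152, 148, 151, 149]
quiet_host = [100, 102, 98, 101, 99]      # single tenant, low variance
noisy_host = [100, 140, 60, 120, 80]      # multi-tenant, high variance

snr_quiet = counter_snr(quiet_host, attack_trace)
snr_noisy = counter_snr(noisy_host, attack_trace)
print(f"quiet host SNR: {snr_quiet:.1f}, noisy host SNR: {snr_noisy:.1f}")
```

The same attack signal that stands out clearly on the quiet host is barely more than one standard deviation from normal on the noisy one, which is the concern with multi-tenant hosts.
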
The paper's prototype code is available. I'm thinking of setting up a test on an isolated dev node to gauge the baseline overhead. If anyone wants to collaborate on reviewing the code or designing a test, let me know.
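
As a starting point for the test design, here is a minimal timing harness sketch. It does not touch the paper's prototype or real PMCs: `busy_work` is a hypothetical stand-in workload, and the `time.perf_counter_ns()` call is only a placeholder for an actual counter read.

```python
import statistics
import time


def median_runtime(workload, repeats: int = 5) -> float:
    """Run workload() several times and return the median wall time in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def relative_overhead(baseline_s: float, monitored_s: float) -> float:
    """Extra runtime as a fraction of the baseline (0.05 means 5% slower)."""
    return (monitored_s - baseline_s) / baseline_s


def busy_work(n: int = 200_000) -> int:
    total = 0
    for i in range(n):
        total += i * i
    return total


def busy_work_with_sampling(n: int = 200_000, every: int = 1_000) -> int:
    total = 0
    samples = []
    for i in range(n):
        total += i * i
        if i % every == 0:
            # Placeholder for a PMC read; a real test would call the sampler here.
            samples.append(time.perf_counter_ns())
    return total


baseline = median_runtime(busy_work)
monitored = median_runtime(busy_work_with_sampling)
print(f"relative overhead: {relative_overhead(baseline, monitored):.1%}")
```

Using the median over several repeats keeps a single scheduling hiccup from skewing the result; on a shared node the repeat count would likely need to be much higher.
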
~JL
Stay sharp.