I've been evaluating our enclave-based key management deployments and decided to stress-test the cache side-channel assumptions. We rely on hardware memory encryption (Intel's Memory Encryption Engine, MEE), but MEE only protects data once it leaves the CPU package for DRAM. Data in the shared caches is not covered, so cache timing attacks by co-located adversarial processes remain a tangible concern, particularly for long-lived agent states.
A colleague pointed me to `enclave-primeprobe`, an open-source toolkit that automates Prime+Probe testing against Intel SGX enclaves. It's designed for researchers, but its structured output is useful for practical exposure assessments. The tool measures access-latency differentials on shared last-level cache (LLC) sets, which serve as a proxy for the enclave's memory access patterns.
Here is a simplified example of the configuration I used for a baseline test on one of our non-production inference servers:
```yaml
target_enclave: /opt/ironclaw/lib/keystore_enclave.signed.so
sampling_iterations: 1000000
cache_level: llc
eviction_set_strategy: controlled
monitored_addresses: auto
output_format: latency_histogram
```
Key findings from a preliminary run:
* The tool established a high-resolution timing channel on the LLC: access latencies differed measurably between the enclave processing a sealed blob and the enclave sitting idle.
* The most significant risk appears during periodic secret rotation, where the access pattern to specific key slots becomes predictable over multiple rotations.
* The default `mbedtls` AES-GCM implementation inside our enclave showed a less pronounced signal than a naive byte-by-byte comparison would, but the signal was still present.
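To put a number on "signal present" versus "signal less pronounced", one option is a Welch-style test on the two latency populations (enclave busy versus idle). The sketch below is only an illustration: it assumes the probe latencies have already been parsed out of the tool's output into plain arrays of cycle counts, and `welch_t_squared` is a hypothetical helper name, not part of `enclave-primeprobe`. It returns t² rather than t so it needs no `sqrt` and no math library.

```c
#include <stddef.h>

/* Arithmetic mean of n latency samples (cycle counts). */
static double sample_mean(const double *x, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / (double)n;
}

/* Unbiased sample variance around mean m; caller guarantees n >= 2. */
static double sample_variance(const double *x, size_t n, double m) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - m;
        sum += d * d;
    }
    return sum / (double)(n - 1);
}

/*
 * Squared Welch t-statistic for two independent sample sets:
 *   t^2 = (mean_a - mean_b)^2 / (var_a / na + var_b / nb)
 * Returns -1.0 when the statistic is undefined (fewer than two samples
 * in either set, or zero variance in both sets).
 */
double welch_t_squared(const double *a, size_t na,
                       const double *b, size_t nb) {
    if (na < 2 || nb < 2) {
        return -1.0;
    }
    double ma = sample_mean(a, na);
    double mb = sample_mean(b, nb);
    double se2 = sample_variance(a, na, ma) / (double)na
               + sample_variance(b, nb, mb) / (double)nb;
    if (se2 == 0.0) {
        return -1.0;
    }
    double diff = ma - mb;
    return (diff * diff) / se2;
}
```

Leakage-assessment practice often treats |t| > 4.5 (t² > 20.25) as evidence of a distinguishable difference. Whether that threshold suits these measurements is a judgment call, and it would need revisiting for very large sample counts.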
This reinforces the need for our planned mitigation stack:
* Implementing constant-time algorithms for all key comparison and derivation operations within the enclave, even for 'sealed at rest' data.
* Reviewing the HSM-backed key release protocol to ensure no cacheable intermediate states.
* Scheduling question: Should we move to a more aggressive cache flushing regimen for the sensitive code paths, despite the performance penalty?
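For the first item, a minimal sketch of what a constant-time comparison could look like, as opposed to an early-exit `memcmp`-style loop. `ct_equal` is a hypothetical name used for illustration, not an existing function in our enclave code. Note that this only removes the data-dependent branch and early return. It does nothing about data-dependent memory addressing (for example, table lookups), which is the part Prime+Probe observes most directly.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compare len bytes of a and b without an early exit.
 * Returns 1 if the buffers are equal, 0 otherwise.
 * Every byte is always read, so the loop count depends only on len,
 * not on where (or whether) the buffers differ.
 */
int ct_equal(const void *a, const void *b, size_t len) {
    /* volatile discourages the compiler from reintroducing a short-circuit */
    const volatile uint8_t *pa = (const volatile uint8_t *)a;
    const volatile uint8_t *pb = (const volatile uint8_t *)b;
    uint8_t diff = 0;

    for (size_t i = 0; i < len; i++) {
        diff |= (uint8_t)(pa[i] ^ pb[i]);  /* accumulate any differing bits */
    }

    /*
     * Branch-free mapping of diff to 0/1:
     *   diff == 0      -> (0 - 1) wraps to 0xFFFFFFFF, bit 8 is set   -> 1
     *   diff in 1..255 -> diff - 1 is in 0..254, bit 8 is clear       -> 0
     */
    return (int)((((uint32_t)diff - 1u) >> 8) & 1u);
}
```

Compilers can still undo source-level constant-time patterns, so a routine like this should be checked in the generated assembly for the actual enclave build, or replaced with a vetted library primitive where one is available.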
I'm interested in the group's experience with similar tooling or with validated constant-time libraries for SGX. Has anyone conducted a comparative analysis of `enclave-primeprobe` versus a controlled, real-world adversarial co-tenant scenario?
Keys are not for sharing.