Why does my constant-time implementation still show timing variance under load?

(@ghost_wrangler)
  [#1261]

I’ve been auditing a constant-time comparison function for a cryptographic operation inside one of our Rust-based enclave prototypes. Under controlled, single-threaded conditions, the timing is flat. However, when deployed in a realistic multi-threaded enclave under load (simulating a production workload), I'm observing measurable timing variance in the comparison operation.

The code follows standard constant-time practices:

```rust
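pub fn constant_time_compare(a: &[u8], b: &[u8]) -> bool {
    // Lengths are assumed to be public: this early return depends only on
    // the slice lengths, never on their contents.
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair so the loop always runs to the end.
    let mut result = 0u8;
    for (&x, &y) in a.iter().zip(b.iter()) {
        result |= x ^ y;
    }
    // Single check on the accumulated difference (zero means all bytes matched)
    result == 0
}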
```
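
For comparison, here is a minimal sketch of a variant that routes the accumulator through `std::hint::black_box` (stable since Rust 1.66) to discourage the optimizer from rewriting the loop into something data-dependent. The name `constant_time_compare_opaque` is hypothetical, and `black_box` is documented as best-effort only, so this is a hedge against compiler transformations rather than a guarantee. Inspecting the generated assembly, or using a vetted crate such as `subtle`, would still be needed to confirm the property.

```rust
use std::hint::black_box;

/// Hypothetical variant of `constant_time_compare`.
/// Assumption: lengths are public, so the length check may branch.
pub fn constant_time_compare_opaque(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (&x, &y) in a.iter().zip(b.iter()) {
        // black_box is a best-effort hint that asks the compiler to treat
        // the accumulator as opaque on every iteration. It is NOT a
        // guarantee of constant-time code generation.
        diff = black_box(diff | (x ^ y));
    }
    // Compare the accumulated difference once, after the full pass.
    black_box(diff) == 0
}
```

This does nothing about microarchitectural noise from other threads; it only narrows down whether the compiler is part of the problem.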

I've ruled out obvious issues:
* The function is compiled with optimization (`--release`).
* The only early return is the length check before the loop, which depends on the slice lengths and not on their contents.
* The buffers being compared have the same page alignment across runs.

My hypothesis is that microarchitectural state under load is causing the variance. Specifically:
* Cache bank conflicts or DRAM bus contention from other threads.
* Interference from the OS scheduler or hypervisor on the core's frequency/p-state.
* Potential Spectre-V1 mitigation fences (LFENCE) having variable cost under thermal throttling.

I’m looking for practical exposure assessments from others working on IronClad or similar TEEs. Have you validated constant-time properties under full system load, not just in isolation? What instrumentation or hardware performance counters proved most useful?

Our current attestation pipeline doesn’t capture microarchitectural state. If the timing variance is statistically significant under load, does this constitute a side-channel risk we must mitigate at the system level, perhaps via core-pinning and cache partitioning?
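
To make "statistically significant" concrete, here is a rough sketch of the kind of check that could be run under load. It assumes the relevant question is whether timing differs between input classes (matching vs. differing candidate), since noise that hits both classes equally is variance but not necessarily a leak. The helpers `collect_timings` and `welch_t` are hypothetical names, `Instant` is a coarse stand-in for a cycle counter, and the commonly cited leakage-assessment threshold of |t| > 4.5 is a convention, not a proof of exploitability.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Sample mean and unbiased sample variance (needs at least two samples).
fn mean_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t statistic for two independent samples with unequal variances.
pub fn welch_t(a: &[f64], b: &[f64]) -> f64 {
    let (ma, va) = mean_var(a);
    let (mb, vb) = mean_var(b);
    (ma - mb) / (va / a.len() as f64 + vb / b.len() as f64).sqrt()
}

/// Times `cmp(secret, same)` and `cmp(secret, differ)` for `rounds` rounds,
/// alternating which class goes first so that slow drift (frequency changes,
/// throttling) affects both classes roughly equally.
/// Returns (timings for `same`, timings for `differ`) in nanoseconds.
pub fn collect_timings<F>(
    cmp: F,
    secret: &[u8],
    same: &[u8],
    differ: &[u8],
    rounds: usize,
) -> (Vec<f64>, Vec<f64>)
where
    F: Fn(&[u8], &[u8]) -> bool,
{
    let mut t_same = Vec::with_capacity(rounds);
    let mut t_diff = Vec::with_capacity(rounds);
    for i in 0..rounds {
        let order = if i % 2 == 0 { [false, true] } else { [true, false] };
        for use_differ in order {
            let candidate = if use_differ { differ } else { same };
            let start = Instant::now();
            // black_box keeps the call from being optimized away (best-effort).
            black_box(cmp(black_box(secret), black_box(candidate)));
            let ns = start.elapsed().as_nanos() as f64;
            if use_differ {
                t_diff.push(ns);
            } else {
                t_same.push(ns);
            }
        }
    }
    (t_same, t_diff)
}
```

Usage would be along the lines of collecting both sample sets with `constant_time_compare` as `cmp` while the load generator is running, then checking whether `welch_t(&t_same, &t_diff).abs()` stays small as the sample count grows. A large |t| that persists with core-pinning would point at the code path; one that disappears would point at the shared microarchitectural state.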



   