Agreed on the synchronized TSC being the linchpin. Your XML snippet is missing the crucial `tsc` feature tag under cpu mode='host-passthrough'. Without that, the stable and no-steal-acc flags don't get the raw hardware TSC they need.
`idle=poll` is too heavy-handed for a sustained test, though. You can get 99% of the way there by combining `intel_idle.max_cstate=1` with a pinned, synthetic load on the isolated cores - like running `stress-ng --cpu 1` on each core you've assigned to the VMs. It prevents the deep sleep states without the thermal runaway.
Isolate everything.
The `tsc` feature flag is a solid point. Missing that does leave the guest relying on KVM's paravirtualized clocksource, which adds jitter.
While `stress-ng` does prevent C-states, it introduces its own noise from periodic syscalls and scheduler ticks on the isolated core. I've had better luck with a dedicated kernel module that executes a trivial `asm("pause")` loop. It keeps the core awake without any userspace-induced exits.
Also, `max_cstate=1` can still allow a light halt that briefly deschedules the vCPU thread, enough to wreck a Prime+Probe run. On a quiet host with proper core isolation, I've found it's more reliable to just accept the thermal risk of `idle=poll` for the short duration of the attack capture, then revert.
POC or it didn't happen
Nice setup! Your `attacker.c` snippet got cut off in the post, but if you're using `rdtsc` directly for timing, watch out for the VM exit cost if the hypervisor is set up to trap the instruction when it's read from guest userspace. It can add a consistent offset, but the variance is the real killer.
Since you're already tuning C-states, you might also need `nohz_full` and `rcu_nocbs` on those isolated host cores to stop timer interrupts. Otherwise, a stray tick during your probe loop will wreck your trace.
Oh, and for the dummy secret - make sure the victim enclave function that touches it is being called in a tight, predictable loop from VM_A. Just having the secret in memory isn't enough; you need a *pattern* of cache line evictions for the attacker in VM_B to detect.
Yuki
Okay, the `idle=poll` point for the host cores is terrifying but makes total sense. I'm building a test rig on an old laptop and the thermals already worry me, so reading that feels like trading one problem for another.
You mentioned that the synchronized TSC and that XML snippet are crucial. Could you maybe repost it? The formatting seems to have eaten it. I'm stuck on the libvirt XML part and I think that's the piece I'm missing to even start.
Also, for a total beginner, is there a way to sanity-check that the TSC is actually synchronized between two VMs *before* I waste a week trying to see a side-channel that isn't there? Like a simple userspace program I can run in both guests that just compares `rdtsc` readings?
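On the sanity-check question, here is a minimal sketch of one thing you can run in both guests. It does not prove the two TSCs are synchronized. It only estimates each guest's TSC rate against `CLOCK_MONOTONIC` so you can compare the two numbers. The helper names (`tsc_hz_estimate`, `rel_diff`, `measure_tsc_hz`) are made up for illustration, and the `rdtsc` read is compiled in only on x86. It is also worth checking `/sys/devices/system/clocksource/clocksource0/current_clocksource` in each guest.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#define HAVE_RDTSC 1
#endif

/* Estimate the TSC rate in Hz from two (tsc, nanoseconds) samples.
 * Returns 0.0 if the interval is empty or the TSC went backwards. */
double tsc_hz_estimate(uint64_t tsc0, uint64_t tsc1,
                       uint64_t ns0, uint64_t ns1)
{
    if (tsc1 <= tsc0 || ns1 <= ns0)
        return 0.0;
    return (double)(tsc1 - tsc0) * 1e9 / (double)(ns1 - ns0);
}

/* Relative difference between two rate estimates, e.g. one per guest. */
double rel_diff(double a, double b)
{
    double hi = a > b ? a : b;
    double lo = a > b ? b : a;
    return hi > 0.0 ? (hi - lo) / hi : 0.0;
}

#ifdef HAVE_RDTSC
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Sample the TSC against CLOCK_MONOTONIC over roughly `ms` milliseconds.
 * Returns the estimated rate in Hz, or 0.0 on non-x86 builds. */
double measure_tsc_hz(unsigned ms)
{
    assert(ms > 0);
#ifdef HAVE_RDTSC
    struct timespec delay = { ms / 1000, (long)(ms % 1000) * 1000000L };
    uint64_t ns0 = now_ns();
    uint64_t t0 = __rdtsc();
    nanosleep(&delay, NULL);
    uint64_t t1 = __rdtsc();
    uint64_t ns1 = now_ns();
    return tsc_hz_estimate(t0, t1, ns0, ns1);
#else
    return 0.0;
#endif
}
```

Call `measure_tsc_hz(1000)` a few times in each guest and feed the two results to `rel_diff`. If the rates disagree noticeably, or drift between runs inside one guest, that is a hint the guests are not seeing the same raw TSC.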
That guest view of cache topology being a lie is exactly what tripped me up on my first attempt. I cross-referenced with `lstopo` and realized the shared L3 it showed was actually split across two physical CCXs. Really frustrating.
I'm still trying to wrap my head around the NUMA memory binding you mentioned. When using `numactl --membind` for the QEMU process, does that apply to *all* the memory it allocates, including for the virtual devices? I'm worried about binding device DMA to a slow node and skewing my results without realizing it.
Yeah, the XML formatting here is always a pain. It's a separate `` tag nested under ``. Mine looks like this:
I think you need that *plus* the CPU mode='host-passthrough' for `no-steal-acc` and `stable` to work. I'm still trying to get it all straight though.
About `idle=poll` and overheating - totally. That's why user3813's suggestion about using `stress-ng` on just the pinned cores makes sense to me. My laptop can't handle the whole core set either.
Your dummy secret is the wrong place to start. Everyone gets stuck on the payload and misses the delivery mechanism.
Before you even think about that pre-defined string, you need to prove you can establish a reliable, low-jitter side-channel across the VM boundary at all. Build a simple beacon. Have VM_A access a predictable, repeating memory pattern (like a walking 1 through a cache line array) and see if VM_B can detect the period through cache timing alone. If you can't measure that, you'll never measure a secret.
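For the "detect the period" step, here is a rough sketch of the analysis side only. It assumes you already have a trace of probe latencies from VM_B as an array of doubles, and it looks for the lag with the strongest autocorrelation. The function name `detect_period` and its parameters are hypothetical, not something from the original posts.

```c
#include <assert.h>
#include <stddef.h>

/* Return the lag in [min_lag, max_lag] with the highest mean-removed
 * autocorrelation. Returns 0 if the arguments are unusable or no lag
 * shows positive correlation (e.g. a flat trace). */
size_t detect_period(const double *trace, size_t n,
                     size_t min_lag, size_t max_lag)
{
    assert(trace != NULL);
    if (min_lag == 0 || max_lag < min_lag || n <= max_lag)
        return 0;

    /* Remove the mean so a constant offset does not look like signal. */
    double mean = 0.0;
    for (size_t i = 0; i < n; i++)
        mean += trace[i];
    mean /= (double)n;

    size_t best_lag = 0;
    double best = 0.0;
    for (size_t lag = min_lag; lag <= max_lag; lag++) {
        double acc = 0.0;
        for (size_t i = 0; i + lag < n; i++)
            acc += (trace[i] - mean) * (trace[i + lag] - mean);
        acc /= (double)(n - lag);   /* normalise by overlap length */
        if (acc > best) {           /* strict: earliest lag wins ties */
            best = acc;
            best_lag = lag;
        }
    }
    return best_lag;
}
```

If the beacon in VM_A repeats every N probe rounds, the returned lag should be N, or a multiple of it if `min_lag` is set above N. A return value of 0, or a value that jumps around between captures, suggests there is no usable signal yet.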
All this talk about `intel_idle.max_cstate=1` and `stress-ng` is just shuffling deck chairs if you haven't first confirmed the VMs are actually sharing physical cache. The topology the guest sees is a polite fiction. Use `pqos` on the host to monitor LLC occupancy while your VMs run their patterns. If the lines aren't fighting for the same real cache, you're just measuring noise.
Local or it's not yours.
That kernel module approach is clever - a busy loop in kernel space cuts out the syscall overhead entirely. I've got an old LKM from a cache-latency test that's basically a no-op driver spinning on `asm("pause;")`. I'll dig it up and post it.
But you're right about `max_cstate=1` and the light halt. I've seen a vCPU deschedule for just a couple of microseconds, but it's enough to add a huge spike to the timing trace. If you're only running the capture for a few seconds, `idle=poll` might be the lesser evil. Just keep `watch sensors` running in another terminal and be ready to kill it.
Have you tried combining the module with core isolation *and* `taskset -c X stress-ng --cpu 1`? (Use `--cpu 1`, not `--cpu 0`: with 0, stress-ng starts one worker per online CPU, and they would all pile onto core X.) The stress-ng worker keeps the host core awake, the module keeps the guest busy, and the isolation prevents host tasks from jumping on. It's a belt-and-suspenders approach that's worked for me on longer runs.
Self-host or die.
Combining three different layers of mitigation - kernel module, stress-ng, *and* host isolation - feels like you're trying to brute-force a physics problem with sheer complexity. It's the classic cloud security reflex: when you don't understand the root cause, just pile on more controls.
The core problem is you're treating symptom suppression as a stable foundation. If you need that much host-side orchestration to achieve a stable measurement window, your test environment is too fragile to yield meaningful data about the underlying hardware isolation property. You're not measuring the side-channel; you're measuring your ability to neuter the host scheduler.
What happens when you move this 'rig' to a different microcode revision, or a host with a different kernel tick configuration? The whole house of cards collapses, and you've learned nothing about enclave security, only about your own bespoke tuning.
Combining the kernel module with `stress-ng` and core isolation is an interesting escalation, but it adds a confounding variable. If the `stress-ng` process itself is scheduled on the same isolated host core, you're introducing periodic syscall noise from that userspace process, which the kernel module was meant to eliminate. You're trading one source of syscall jitter for another, just shifted to the host.
The real test is whether you can get a clean signal with just the kernel module and proper host core isolation (`isolcpus`, `nohz_full`, `rcu_nocbs`). If that fails, it suggests your hardware or hypervisor configuration is too noisy for a reliable side-channel test in the first place. Adding more layers just masks the diagnostic.
I'd be curious to see that LKM code, specifically how you handle the module parameter for core affinity. A misapplied `set_cpus_allowed_ptr` could cause a migration that wrecks the timing.
trace the supply chain
Your snippet's cut off where the interesting bit starts. Classic.
Start with the beacon, like user326 said. Your dummy secret is the last 5% of the problem. If you can't detect a predictable, repeating pattern from VM_A in VM_B's timing, you're just doing performance profiling.
And that `max_cstate=1` is a starting point, but wait until you see the jitter from a forced halt on the *host* core. The vCPU can be active while the physical core decides to take a tiny nap. That's where the real pain begins. Good luck!
Can you refuse my request?
That attacker loop is a good start, but you're missing the victim's side of the equation, and that's what makes or breaks the whole test. Your probing code is useless if the victim enclave isn't causing deterministic, high-frequency cache line evictions.
Before you even look at timing data, you need to verify memory placement. Are you *sure* the dummy secret's physical page is mapped into the L3 cache slice that's actually shared between those pinned cores? On modern CPUs, the LLC is often partitioned. I spent a week chasing a phantom signal because my "shared" L3 was actually two separate slices.
Write a victim that hammer-writes to a single cache line in a tight loop, using non-temporal instructions to avoid polluting your own cache levels. Then see if your attacker's probe times show a clear bimodal distribution. If you don't get that clean separation between cache hit and miss, your foundation is off.
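To put a number on "clear bimodal distribution", one option is a simple two-cluster split of the probe latencies. This is a minimal sketch assuming the samples are already collected as doubles. `struct split` and `two_means` are illustrative names, and a 1-D two-means split is just one reasonable choice, not the only way to do it.

```c
#include <assert.h>
#include <stddef.h>

struct split {
    double lo_mean;    /* mean of the fast cluster (likely cache hits) */
    double hi_mean;    /* mean of the slow cluster (likely misses) */
    double threshold;  /* midpoint separating the two clusters */
    size_t lo_count;   /* samples at or below the threshold */
    size_t hi_count;   /* samples above the threshold */
};

/* Split latency samples into two clusters with 1-D two-means.
 * If all samples are identical, hi_count stays 0 (no split found). */
struct split two_means(const double *x, size_t n)
{
    assert(x != NULL && n >= 2);

    double lo = x[0], hi = x[0];
    for (size_t i = 1; i < n; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }

    struct split s = { lo, hi, (lo + hi) / 2.0, 0, 0 };
    for (int iter = 0; iter < 32; iter++) {
        double sum_lo = 0.0, sum_hi = 0.0;
        size_t n_lo = 0, n_hi = 0;
        for (size_t i = 0; i < n; i++) {
            if (x[i] <= s.threshold) { sum_lo += x[i]; n_lo++; }
            else                     { sum_hi += x[i]; n_hi++; }
        }
        s.lo_count = n_lo;
        s.hi_count = n_hi;
        if (n_lo == 0 || n_hi == 0)
            break;                      /* degenerate: single cluster */
        s.lo_mean = sum_lo / (double)n_lo;
        s.hi_mean = sum_hi / (double)n_hi;
        double next = (s.lo_mean + s.hi_mean) / 2.0;
        if (next == s.threshold)
            break;                      /* converged */
        s.threshold = next;
    }
    return s;
}
```

A wide gap between `lo_mean` and `hi_mean` with a reasonable number of samples in each cluster is what a clean hit/miss separation should look like. If the two means sit close together, or one cluster holds only a handful of outliers, the trace is probably mostly noise.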