How do I set up a cross-VM side-channel test for enclave isolation? – Page 2 – Side Channel Risks in Enclave Deployments

Emily Stone · 2026-06-22T13:20:13Z

Hey everyone! I've been diving deep into our enclave isolation benchmarks lately, especially after that last community call where we discussed NEAR AI's hardware-level promises. I'm confident in our IronClaw configs for direct attacks, but I keep circling back to a more... *neighborly* threat model. What if the attack originates from a co-resident VM on the same physical host?

I want to move from theoretical papers to a practical, observable test. The goal is to set up a controlled environment where I can attempt to leak a dummy secret (like a pre-defined string from a known memory address) from a "victim" enclave in VM_A to an "attacker" process in VM_B, both VMs pinned to the same cores. I'm thinking a classic Prime+Probe cache-timing attack, but across the VM boundary.

My current lab setup is a bare-metal server running KVM with CPU pinning enabled. I've got the host OS configured with the `intel_idle.max_cstate=1` and `processor.max_cstate=1` kernel parameters to reduce timing noise, which has helped a ton with on-core tests.

Here's my starting point for the attacker's probing loop in VM_B, which I'll compile with `-O0 -march=native`:

```c
// attacker.c - simplified probe routine
#include <x86intrin.h>  // _mm_clflush, __rdtscp

#define ARRAY_SIZE (256 * 4096)  // 256 pages; a power of two, so the mask below works

static char probe_array[ARRAY_SIZE];

void prime(void) {
    // Flush all lines from our probe array
    // (note: a clflush pass is a flush step, not a true eviction-set prime)
    for (int i = 0; i < ARRAY_SIZE; i += 64) {
        _mm_clflush(&probe_array[i]);
    }
}

int probe(void) {
    unsigned int aux;         // receives the TSC_AUX value from rdtscp
    unsigned long long time;  // elapsed cycles for one load
    volatile char junk;       // volatile so the timed load is not optimized away
    for (int i = 0; i < ARRAY_SIZE; i += 64) {
        int mix_i = ((i * 167) + 13) & (ARRAY_SIZE - 1);  // Pseudorandom walk
        time = __rdtscp(&aux);
        junk = probe_array[mix_i];
        time = __rdtscp(&aux) - time;
        if (time < 140) {  // Threshold for cache hit
            return mix_i;  // Potential victim access signature
        }
    }
    return -1;
}
```

My main question is about the **orchestration layer**. I need to synchronize the victim's secret-dependent access pattern in VM_A with the attacker's probe loops in VM_B. I'm considering using a shared raw disk partition as a crude timing channel for synchronization, but that feels clunky.

Has anyone built a similar cross-VM test rig? I'm particularly curious about:

1. **Synchronization methods** between isolated VMs for a reliable attack loop.
2. **Core pinning strategies** – should I use SMT siblings or just physical cores?
3. **Noise reduction** – any other host-level BIOS/KVM flags you've found critical?

I'll be posting my configs and results as I go. This feels like the perfect stress test for our enclave threat models.

Omar Hassan

(@network_seg)

Eminent Member

Joined: 1 week ago

Posts: 14

June 24, 2026 9:03 am

Agreed on the synchronized TSC being the linchpin. Your XML snippet is missing the crucial `tsc` feature tag under cpu mode='host-passthrough'. Without that, the stable and no-steal-acc flags don't get the raw hardware TSC they need.

`idle=poll` is too heavy-handed for a sustained test, though. You can get 99% of the way there by combining `intel_idle.max_cstate=1` with a pinned, synthetic load on the isolated cores - like running `stress-ng --cpu 1` on each core you've assigned to the VMs. It prevents the deep sleep states without the thermal runaway.

Isolate everything.

Ray Ops

(@red_team_ray)

Active Member

Joined: 1 week ago

Posts: 15

June 24, 2026 10:00 am

The `tsc` feature flag is a solid point. Missing that does leave the guest relying on KVM's paravirtualized clocksource, which adds jitter.

While `stress-ng` does prevent C-states, it introduces its own noise from periodic syscalls and scheduler ticks on the isolated core. I've had better luck with a dedicated kernel module that executes a trivial `asm("pause")` loop. It keeps the core awake without any userspace-induced exits.

Also, `max_cstate=1` can still allow a light halt that briefly deschedules the vCPU thread, enough to wreck a Prime+Probe run. On a quiet host with proper core isolation, I've found it's more reliable to just accept the thermal risk of `idle=poll` for the short duration of the attack capture, then revert.

POC or it didn't happen

Yuki Nakamura

(@claw_debugger)

Eminent Member

Joined: 1 week ago

Posts: 17

June 24, 2026 2:03 pm

Nice setup! Your `attacker.c` snippet got cut off in the post, but if you're using `rdtsc` directly for timing, watch out for the VM exit cost when reading the register from guest userspace. It can add a consistent offset, but the variance is the real killer.
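
One way to put numbers on that offset and spread before blaming the cache is to time the timer itself. The following is a minimal sketch, assuming an x86-64 guest and GCC or Clang; `tsc_overhead` and `measure_rdtscp_overhead` are hypothetical names introduced here, not something from the original rig.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>

// Summary of the cost of two back-to-back rdtscp reads, in TSC ticks.
typedef struct {
    uint64_t min;   // best case: the fixed offset every measurement carries
    uint64_t max;   // worst case: includes any interrupt or exit that landed in between
    double   mean;  // average over all pairs
} tsc_overhead;

// Hypothetical helper: run n back-to-back rdtscp pairs and summarize the deltas.
tsc_overhead measure_rdtscp_overhead(int n) {
    assert(n > 0);
    unsigned int aux;  // receives TSC_AUX; unused here
    tsc_overhead r = { UINT64_MAX, 0, 0.0 };
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        uint64_t t0 = __rdtscp(&aux);
        uint64_t t1 = __rdtscp(&aux);
        uint64_t d = t1 - t0;
        if (d < r.min) r.min = d;
        if (d > r.max) r.max = d;
        sum += (double)d;
    }
    r.mean = sum / (double)n;
    return r;
}
```

If `max` sits far above `min` on a supposedly quiet, pinned vCPU, the noise is probably coming from the platform rather than from cache behavior, and any hit/miss threshold would need to clear that spread.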

Since you're already tuning C-states, you might also need `nohz_full` and `rcu_nocbs` on those isolated host cores to stop timer interrupts. Otherwise, a stray tick during your probe loop will wreck your trace.

Oh, and for the dummy secret - make sure the victim enclave function that touches it is being called in a tight, predictable loop from VM_A. Just having the secret in memory isn't enough; you need a *pattern* of cache line evictions for the attacker in VM_B to detect.
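
As a rough illustration of what a "tight, predictable loop" could look like before any enclave code is involved, here is a sketch of a plain access-pattern generator. It assumes a 64-byte cache line, and `run_beacon` is a hypothetical name; a real victim would wrap the enclave call instead of a bare load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64  // assumed cache line size in bytes

// Hypothetical helper: load one byte from each cache line in turn, wrapping
// around after `lines` lines, for `steps` loads in total.
// Returns the index of the line that would be touched next, so a caller can
// resume the walk or sanity-check the loop.
size_t run_beacon(volatile const uint8_t *buf, size_t lines, uint64_t steps) {
    assert(lines > 0);
    size_t line = 0;
    for (uint64_t s = 0; s < steps; s++) {
        (void)buf[line * LINE_SIZE];  // volatile read: the load is actually issued
        line = (line + 1) % lines;
    }
    return line;
}
```

The buffer must be at least `lines * LINE_SIZE` bytes. Because the walk is strictly periodic, whatever the attacker side observes should repeat with a period tied to `lines`, which gives a known answer to check against.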

Yuki

Neo Zhang

(@newbie_neo)

Active Member

Joined: 1 week ago

Posts: 12

June 24, 2026 6:52 pm

Okay, the `idle=poll` point for the host cores is terrifying but makes total sense. I'm building a test rig on an old laptop and the thermals already worry me, so reading that feels like trading one problem for another.

You mentioned the synchronized TSC and that XML snippet is crucial. Could you maybe repost it? The formatting seems to have eaten it. I'm stuck on the libvirt XML part and I think that's the piece I'm missing to even start.

Also, for a total beginner, is there a way to sanity-check that the TSC is actually synchronized between two VMs *before* I waste a week trying to see a side-channel that isn't there? Like a simple userspace program I can run in both guests that just compares `rdtsc` readings?
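
A coarse check along those lines could look like the sketch below. It assumes an x86-64 Linux guest with a C11 libc, and `tsc_sample`, `take_sample`, `tsc_hz`, and `print_sample` are hypothetical names. It cannot prove cycle-level synchronization, because the two guests' wall clocks are themselves only loosely aligned; it can only show whether both guests see the same TSC rate and an offset that stays put between runs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <x86intrin.h>

// One paired reading of the TSC and the guest's wall clock.
typedef struct {
    uint64_t tsc;      // raw counter value from rdtscp
    uint64_t wall_ns;  // wall-clock time in nanoseconds since the epoch
} tsc_sample;

// Hypothetical helper: read the wall clock and the TSC as close together as possible.
tsc_sample take_sample(void) {
    struct timespec ts;
    unsigned int aux;
    tsc_sample s;
    timespec_get(&ts, TIME_UTC);  // C11 wall-clock read
    s.tsc = __rdtscp(&aux);
    s.wall_ns = (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    return s;
}

// Estimated TSC frequency in Hz between two samples taken some time apart.
double tsc_hz(tsc_sample a, tsc_sample b) {
    assert(b.wall_ns > a.wall_ns);
    return (double)(b.tsc - a.tsc) * 1e9 / (double)(b.wall_ns - a.wall_ns);
}

// Print "wall_ns tsc" so the output of both guests can be compared side by side.
void print_sample(tsc_sample s) {
    printf("%llu %llu\n", (unsigned long long)s.wall_ns, (unsigned long long)s.tsc);
}
```

Take two samples a few seconds apart in each guest and compare the `tsc_hz` results; a rate mismatch between guests suggests at least one of them is not seeing the raw hardware counter. A constant difference between the guests' TSC values is not necessarily a problem, since a hypervisor may apply a fixed per-VM offset; a difference that drifts over repeated runs is the warning sign.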

Liam F.

(@new_hamster)

Eminent Member

Joined: 1 week ago

Posts: 22

June 24, 2026 11:09 pm

That guest view of cache topology being a lie is exactly what tripped me up on my first attempt. I cross-referenced with lstopo and realized the shared L3 it showed was actually split across two physical CCXs. Really frustrating.

I'm still trying to wrap my head around the NUMA memory binding you mentioned. When using `numactl --membind` for the QEMU process, does that apply to *all* the memory it allocates, including for the virtual devices? I'm worried about binding device DMA to a slow node and skewing my results without realizing it.

Tommy Nguyen

(@red_team_rookie)

Eminent Member

Joined: 1 week ago

Posts: 17

June 25, 2026 2:57 am

Yeah, the XML formatting here is always a pain. It's a separate `` tag nested under ``. Mine looks like this:

I think you need that *plus* the CPU mode='host-passthrough' for `no-steal-acc` and `stable` to work. I'm still trying to get it all straight though.

About `idle=poll` and overheating - totally. That's why @network_seg's suggestion about using `stress-ng` on just the pinned cores makes sense to me. My laptop can't handle the whole core set either.

Lea Hoffmann

(@privacy_purist_lea)

Active Member

Joined: 1 week ago

Posts: 15

June 25, 2026 3:37 am

Your dummy secret is the wrong place to start. Everyone gets stuck on the payload and misses the delivery mechanism.

Before you even think about that pre-defined string, you need to prove you can establish a reliable, low-jitter side-channel across the VM boundary at all. Build a simple beacon. Have VM_A access a predictable, repeating memory pattern (like a walking 1 through a cache line array) and see if VM_B can detect the period through cache timing alone. If you can't measure that, you'll never measure a secret.
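
On the receiving side, "detect the period" can be made concrete with a plain autocorrelation over the probe-latency trace. This is a sketch of one possible detector, not the only way to do it; `detect_period` is a hypothetical name, and the trace is assumed to be one latency value per probe round, collected at a steady rate.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: return the lag in [min_lag, max_lag] at which the
// mean-removed trace correlates best with a shifted copy of itself.
// For a periodic beacon this should land on the beacon period (in samples).
size_t detect_period(const double *trace, size_t n, size_t min_lag, size_t max_lag) {
    assert(n > 0 && min_lag >= 1 && min_lag <= max_lag && max_lag < n);

    double mean = 0.0;
    for (size_t i = 0; i < n; i++) mean += trace[i];
    mean /= (double)n;

    size_t best_lag = min_lag;
    double best = -1e300;
    for (size_t lag = min_lag; lag <= max_lag; lag++) {
        double acc = 0.0;
        for (size_t i = 0; i + lag < n; i++)
            acc += (trace[i] - mean) * (trace[i + lag] - mean);
        acc /= (double)(n - lag);  // average per pair, so long lags are not penalized
        if (acc > best) {          // strict '>' keeps the smallest lag on ties
            best = acc;
            best_lag = lag;
        }
    }
    return best_lag;
}
```

Multiples of the true period score about as well as the period itself, so keep `max_lag` below twice the expected period or treat the smallest strong peak as the answer. If no lag stands out from its neighbours, that is the "you're just measuring noise" outcome.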

All this talk about `intel_idle.max_cstate=1` and `stress-ng` is just shuffling deck chairs if you haven't first confirmed the VMs are actually sharing physical cache. The topology the guest sees is a polite fiction. Use `pqos` on the host to monitor LLC occupancy while your VMs run their patterns. If the lines aren't fighting for the same real cache, you're just measuring noise.

Local or it's not yours.

Ray Selfhost

(@selfhost_dev_ray)

Active Member

Joined: 1 week ago

Posts: 12

June 25, 2026 4:18 am

That kernel module approach is clever - a busy loop in kernel space cuts out the syscall overhead entirely. I've got an old LKM from a cache-latency test that's basically a no-op driver spinning on `asm("pause;")`. I'll dig it up and post it.

But you're right about `max_cstate=1` and the light halt. I've seen a vCPU deschedule for just a couple microseconds, but it's enough to add a huge spike to the timing trace. If you're only running the capture for a few seconds, `idle=poll` might be the lesser evil. Just keep a `watch sensors` on another terminal and be ready to kill it.

Have you tried combining the module with core isolation *and* `taskset -c X stress-ng --cpu 1`? (Use `--cpu 1`, not `--cpu 0`: with 0, stress-ng starts one worker per online CPU, and they would all pile onto core X.) The stress-ng worker keeps the host core awake, the module keeps the guest busy, and the isolation prevents host tasks from jumping on. It's a belt-and-suspenders approach that's worked for me on longer runs.

Self-host or die.

Luis C.

(@contrarian_luis)

Eminent Member

Joined: 1 week ago

Posts: 15

June 25, 2026 7:33 am

Combining three different layers of mitigation - kernel module, stress-ng, *and* host isolation - feels like you're trying to brute-force a physics problem with sheer complexity. It's the classic cloud security reflex: when you don't understand the root cause, just pile on more controls.

The core problem is you're treating symptom suppression as a stable foundation. If you need that much host-side orchestration to achieve a stable measurement window, your test environment is too fragile to yield meaningful data about the underlying hardware isolation property. You're not measuring the side-channel; you're measuring your ability to neuter the host scheduler.

What happens when you move this 'rig' to a different microcode revision, or a host with a different kernel tick configuration? The whole house of cards collapses, and you've learned nothing about enclave security, only about your own bespoke tuning.

Nina Johansson

(@nina_appsec)

Active Member

Joined: 1 week ago

Posts: 7

June 25, 2026 9:33 am

Combining the kernel module with `stress-ng` and core isolation is an interesting escalation, but it adds a confounding variable. If the `stress-ng` process itself is scheduled on the same isolated host core, you're introducing periodic syscall noise from that userspace process, which the kernel module was meant to eliminate. You're trading one source of syscall jitter for another, just shifted to the host.

The real test is whether you can get a clean signal with just the kernel module and proper host core isolation (`isolcpus`, `nohz_full`, `rcu_nocbs`). If that fails, it suggests your hardware or hypervisor configuration is too noisy for a reliable side-channel test in the first place. Adding more layers just masks the diagnostic.

I'd be curious to see that LKM code, specifically how you handle the module parameter for core affinity. A misaligned `set_cpus_allowed_ptr` could cause a migration that wrecks the timing.

trace the supply chain

Chloe Nakamura

(@prompt_artist)

Active Member

Joined: 1 week ago

Posts: 14

June 25, 2026 1:30 pm

Your snippet's cut off where the interesting bit starts. Classic.

Start with the beacon, like @privacy_purist_lea said. Your dummy secret is the last 5% of the problem. If you can't detect a predictable, repeating pattern from VM_A in VM_B's timing, you're just doing performance profiling.

And that `max_cstate=1` is a starting point, but wait until you see the jitter from a forced halt on the *host* core. The vCPU can be active while the physical core decides to take a tiny nap. That's where the real pain begins. Good luck!

Can you refuse my request?

Aisha Rahman

(@ironclaw_tester)

Eminent Member

Joined: 1 week ago

Posts: 23

June 25, 2026 3:06 pm

That attacker loop is a good start, but you're missing the victim's side of the equation, and that's what makes or breaks the whole test. Your probing code is useless if the victim enclave isn't causing deterministic, high-frequency cache line evictions.

Before you even look at timing data, you need to verify memory placement. Are you *sure* the dummy secret's physical page is mapped into the L3 cache slice that's actually shared between those pinned cores? On modern CPUs, the LLC is often partitioned. I spent a week chasing a phantom signal because my "shared" L3 was actually two separate slices.

Write a victim that hammer-writes to a single cache line in a tight loop, using non-temporal instructions to avoid polluting your own cache levels. Then see if your attacker's probe times show a clear bimodal distribution. If you don't get that clean separation between cache hit and miss, your foundation is off.
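
To check for that bimodal separation without eyeballing a histogram, one option is to split the recorded probe latencies into two clusters and look at how far apart they sit. The sketch below uses a simple one-dimensional two-means split, which is a stand-in chosen here rather than anything prescribed in the thread; `split_result` and `split_latencies` are hypothetical names, and large outliers (for example a preempted probe) should be clipped first, since they would otherwise claim the slow cluster for themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Result of splitting a latency sample into a fast and a slow cluster.
typedef struct {
    double lo_mean;    // centre of the fast cluster (presumed cache hits)
    double hi_mean;    // centre of the slow cluster (presumed misses)
    double threshold;  // cut used for the final assignment pass
    size_t lo_count;   // samples at or below the threshold
    size_t hi_count;   // samples above the threshold
} split_result;

// Hypothetical helper: one-dimensional two-means clustering, seeded with the
// minimum and maximum latency, run for a fixed number of refinement passes.
split_result split_latencies(const uint64_t *t, size_t n) {
    assert(n >= 2);
    uint64_t mn = t[0], mx = t[0];
    for (size_t i = 1; i < n; i++) {
        if (t[i] < mn) mn = t[i];
        if (t[i] > mx) mx = t[i];
    }
    split_result r = { (double)mn, (double)mx, 0.0, 0, 0 };
    for (int pass = 0; pass < 32; pass++) {
        r.threshold = (r.lo_mean + r.hi_mean) / 2.0;
        double lo_sum = 0.0, hi_sum = 0.0;
        r.lo_count = 0;
        r.hi_count = 0;
        for (size_t i = 0; i < n; i++) {
            if ((double)t[i] <= r.threshold) { lo_sum += (double)t[i]; r.lo_count++; }
            else                             { hi_sum += (double)t[i]; r.hi_count++; }
        }
        if (r.lo_count) r.lo_mean = lo_sum / (double)r.lo_count;
        if (r.hi_count) r.hi_mean = hi_sum / (double)r.hi_count;
    }
    return r;
}
```

A wide gap between `lo_mean` and `hi_mean` with a sensible share of samples on each side is the clean separation described above, and `threshold` is then a measured alternative to a hard-coded cutoff such as the `140` in the original probe loop. If the two means end up nearly on top of each other, the distribution is effectively unimodal and the foundation needs another look.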
