Did you see the BlackHat talk about side-channel leaks in shared cache volumes?

(@hardening_syscall)
Active Member
Joined: 2 weeks ago
Posts: 13
Topic starter
  [#1333]

I've been reviewing the slides from the Black Hat USA 2024 presentation, "Cache as a Side Channel: Covert Data Exfiltration in Shared Container Environments." While the core finding—that concurrent access to memory-backed volumes (e.g., `emptyDir` with `medium: Memory`) on a Kubernetes node can enable cache-based side-channel attacks—is not novel from a microarchitectural perspective, its practical demonstration within a modern, container-first orchestration framework like NanoClaw is highly relevant to this subforum.

The attack hinges on a fundamental gap in our isolation model: shared kernel resources beneath the namespace boundary. The presenter demonstrated a proof-of-concept in which a malicious, less-privileged container (Agent B) could infer activity patterns, and eventually key material, from a co-located victim container (Agent A) by priming and then timing accesses to the Last-Level Cache (LLC) through a shared `tmpfs` mount. This works because:
* The memory pages backing the shared volume are physically allocated from the host kernel's page cache.
* Access to these pages by either container loads them into the CPU's shared cache hierarchy.
* The attacker uses a technique akin to Prime+Probe, but implemented via filesystem operations (`read`, `write`) on the shared memory region, rather than traditional memory accesses.
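To make the probe step concrete, here is a minimal sketch of the measurement primitive only: timing a one-byte `pread` on a file in the shared volume with `clock_gettime(CLOCK_MONOTONIC)`. It is not the presenter's code and performs no priming; `time_pread_ns` is a hypothetical name. The syscall round-trip adds considerable noise, so a real measurement would need many repetitions and some statistics on top of this.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>
#include <unistd.h>

// Hypothetical helper: latency in nanoseconds of a one-byte pread() at
// offset `off`, or -1 on error or short read. Assumption: `fd` refers to a
// file on the shared (tmpfs-backed) volume.
static int64_t time_pread_ns(int fd, off_t off) {
    unsigned char byte;
    struct timespec t0, t1;
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1;
    ssize_t n = pread(fd, &byte, 1, off);
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1;
    if (n != 1)
        return -1;
    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000 + (t1.tv_nsec - t0.tv_nsec);
}
```

Nothing here is unusual from a syscall-filtering point of view: `pread` is allowed by typical seccomp profiles, and `clock_gettime` is usually served by the vDSO without a syscall.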

This bypasses several layers of NanoClaw's intended isolation:
1. **User Namespace Isolation**: UIDs are remapped, but the physical pages are shared.
2. **Mount Namespace Isolation**: The volume is explicitly shared, which is a correct but dangerous configuration.
3. **Seccomp-bpf Filtering**: The syscalls used (`open`, `read`, `write`, `fstat`) are typically allowed for basic functionality.

The critical oversight in many deployments is the assumption that sharing a "memory" volume is functionally equivalent to sharing a pipe—a private communication channel. In reality, it shares a direct, cacheable mapping of physical memory.
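The difference from a pipe is easy to see in isolation. A minimal sketch, assuming both agents map the same file from the shared volume with `MAP_SHARED` (simulated here in one process with two mappings; `write_via_a_read_via_b` is a hypothetical name): a plain store through one mapping is visible through the other with no `write()`, `msync()` or copy, because both virtual addresses resolve to the same physical page.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: maps the same file twice with MAP_SHARED (standing in
// for two containers mounting one shared volume), stores `value` through the
// first mapping and returns the byte loaded through the second, or -1 on error.
static int write_via_a_read_via_b(int fd, unsigned char value) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || ftruncate(fd, page) != 0)
        return -1;
    size_t len = (size_t)page;
    void *pa = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (pa == MAP_FAILED)
        return -1;
    void *pb = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (pb == MAP_FAILED) {
        munmap(pa, len);
        return -1;
    }
    // volatile keeps the compiler from reordering or caching the two accesses
    volatile unsigned char *a = pa, *b = pb;
    a[0] = value;     // plain store: no write(), no msync(), no copy
    int seen = b[0];  // plain load through a different virtual address
    munmap(pb, len);
    munmap(pa, len);
    return seen;
}
```

On a `tmpfs` mount such as an `emptyDir` with `medium: Memory`, that shared page is exactly the physical memory both containers pull into the cache hierarchy.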

A naive mitigation would be to disallow `emptyDir: Memory` entirely, but that ignores legitimate use-cases. A more robust approach requires a defense-in-depth strategy:

* **Orchestrator-Level**: Implement stronger affinity/anti-affinity rules to prevent scheduling untrusted agents on the same node, especially if one handles sensitive data. This is a policy gap.
* **Kernel-Level (Agent)**: Employ `mlock` or similar to pin sensitive data in RAM, but this is often impractical and does not by itself keep that data out of shared caches. More feasibly, we can call `madvise(..., MADV_DONTNEED)` or `madvise(..., MADV_COLD)` aggressively on the shared buffer after use, though these act on page mappings and LRU lists rather than CPU cache lines, so they do not guarantee eviction from the LLC.
* **Kernel-Level (Host)**: The ultimate fix requires hardware partitioning through the `resctrl` filesystem: Cache Allocation Technology (CAT) to partition LLC capacity, optionally combined with Memory Bandwidth Allocation (MBA) to cap memory bandwidth. This is where our model truly breaks down: these controls are not container-aware by default and require manual configuration, as referenced in the kernel documentation (`Documentation/x86/resctrl_ui.rst`).
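A minimal sketch of the agent-side scrub, assuming the shared buffer is page-aligned (as an `mmap` of the volume file would be) and a 64-byte cache line; `ASSUMED_CACHE_LINE` and `scrub_shared_buffer` are hypothetical names. Because `madvise` acts on page tables and LRU lists rather than CPU cache lines, the sketch also issues x86 `clflush` (via `_mm_clflush`) on each line when SSE2 is available; the `MADV_COLD` path needs Linux 5.4 or later and falls back to `MADV_DONTNEED`. Even so, scrubbing after use only shortens the exposure window; it does not hide the access pattern while the buffer is in use.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

#define ASSUMED_CACHE_LINE 64  // assumption: 64-byte lines, typical on x86-64

// Hypothetical helper: scrubs a page-aligned shared buffer after use.
// Returns 0 on success, -1 if madvise() fails.
static int scrub_shared_buffer(void *buf, size_t len) {
    explicit_bzero(buf, len);  // wipe contents; cannot be optimized away
#ifdef __SSE2__
    // Evict each line from every cache level; this, not madvise(), is what
    // actually touches the CPU caches.
    for (size_t off = 0; off < len; off += ASSUMED_CACHE_LINE)
        _mm_clflush((const char *)buf + off);
    _mm_mfence();  // order the flushes before any later memory accesses
#endif
#ifdef MADV_COLD
    // Linux >= 5.4: move the pages to the inactive LRU list
    if (madvise(buf, len, MADV_COLD) == 0)
        return 0;
    if (errno != EINVAL)
        return -1;  // EINVAL usually means an older kernel; fall back below
#endif
    // For a MAP_SHARED tmpfs mapping this only drops the page-table entries;
    // the file data itself stays in memory.
    return madvise(buf, len, MADV_DONTNEED);
}
```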

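Consider the following seccomp rule addition, which denies `clock_gettime(CLOCK_MONOTONIC, ...)` in an attempt to starve the probe phase of high-resolution timing (though it breaks many legitimate applications). Be aware that on x86-64 Linux this call is normally answered by the vDSO without entering the kernel, and `rdtsc` needs no syscall at all, so the rule alone does not remove the timer: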
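```c
// In your seccomp policy generator (libseccomp); ctx comes from seccomp_init()
#include <errno.h>
#include <seccomp.h>
#include <time.h>

static int deny_monotonic_clock(scmp_filter_ctx ctx) {
    // Match clock_gettime() when the low 32 bits of arg 0 (clk_id) equal CLOCK_MONOTONIC
    struct scmp_arg_cmp arg_cmp = SCMP_A0(SCMP_CMP_MASKED_EQ, 0xFFFFFFFF, CLOCK_MONOTONIC);
    // Returns 0 on success or a negative errno value; the caller must handle errors
    return seccomp_rule_add_array(ctx, SCMP_ACT_ERRNO(EPERM),
                                  SCMP_SYS(clock_gettime), 1, &arg_cmp);
}
```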

The question for this forum is: Given NanoClaw's design philosophy of minimal, agent-focused containers, how should we formally model and mitigate this class of shared-kernel-resource side channels? Is it sufficient to document the risk of shared `tmpfs` volumes, or do we need to advocate for mandatory `resctrl` profiles at the orchestrator level, even at a performance cost?
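For a sense of what a per-agent `resctrl` profile involves, here is a minimal sketch, assuming resctrl is already mounted at `/sys/fs/resctrl`, the host supports L3 CAT, and the group directory has been created with `mkdir`; `format_l3_schemata`, `apply_schemata` and the group name `agent_a` are hypothetical. It builds a schemata line of the form `L3:<cache_id>=<cbm>;...` and writes it to the group's `schemata` file; the agent's PIDs would then be written to the group's `tasks` file.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper: builds an L3 CAT schemata line such as "L3:0=f;1=f\n",
// giving capacity bitmask `cbm` to every cache domain id in [0, n_domains).
// Returns the number of characters written, or -1 if `out` is too small.
static int format_l3_schemata(char *out, size_t cap, unsigned n_domains,
                              unsigned long cbm) {
    int n = snprintf(out, cap, "L3:");
    if (n < 0 || (size_t)n >= cap)
        return -1;
    size_t used = (size_t)n;
    for (unsigned id = 0; id < n_domains; id++) {
        n = snprintf(out + used, cap - used, "%s%u=%lx", id ? ";" : "", id, cbm);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    n = snprintf(out + used, cap - used, "\n");
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    return (int)(used + (size_t)n);
}

// Hypothetical helper: writes `line` to <group_dir>/schemata, e.g. for a group
// created beforehand with mkdir /sys/fs/resctrl/agent_a. Needs root and
// CAT-capable hardware; the kernel rejects invalid masks at write time.
// Returns 0 on success, -1 on failure.
static int apply_schemata(const char *group_dir, const char *line) {
    char path[256];
    int n = snprintf(path, sizeof path, "%s/schemata", group_dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(line, f) >= 0;
    // The write is buffered, so kernel-side validation errors surface at fclose()
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

Giving two co-located agents disjoint masks (say `f` and `f0` on a host with at least 8 ways) is what actually partitions the LLC. The valid mask width and any contiguity requirements are hardware-specific, so check `info/L3/cbm_mask` under the resctrl mount before choosing masks.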

-- vp


strace -f -e trace=all


   