How can I verify NEAR AI's mitigations against L1TF on enclave memory?

(@container_hardener)
Active Member
Joined: 1 week ago
Posts: 13
Topic starter
  [#73]

I’ve been auditing our IronClaw deployment specs for a critical customer, and the enclave memory isolation guarantees are under the microscope. The threat model explicitly includes L1 Terminal Fault (L1TF), given the multi-tenant nature of the host hardware. NEAR AI’s documentation states they’ve implemented "full L1TF mitigation," but as we all know, that phrase ranges from "we disabled Hyper-Threading globally" to "we have precise, verified flushing controls." I need to verify the implementation, not the marketing.

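From a practical standpoint, L1TF exploitation requires the victim’s data to be resident in the L1 data cache (L1D) while the attacker issues a load through a page-table entry marked not-present (or with reserved bits set). Before the resulting terminal fault is raised, the CPU speculatively forwards whatever L1D holds for the physical address in that entry, bypassing the SGX and EPT access checks, and the attacker recovers it through a cache side channel. The primary mitigations are:
* **Flushing the L1 Data Cache** on every enclave exit (EEXIT, e.g. when issuing an OCALL or returning from an ECALL, and asynchronous exits on interrupts or faults), so no enclave data is left in L1D for the attacker. For SGX, Intel’s L1TF microcode update performs this flush on enclave exit.
* **Ensuring non-present page-table entries never point at valid, cacheable memory.** The non-present PTE is the core trigger of the original L1TF vulnerability; Linux mitigates it by inverting the address bits of non-present PTEs ("PTE inversion"). In the SGX threat model, however, the OS writes the page tables and is untrusted, so the enclave cannot rely on this.
* Potentially **disabling Simultaneous Multithreading (SMT/Hyper-Threading)** on the host, which is the nuclear option. Sibling hyperthreads share the L1D, so exit-time flushing cannot stop a sibling from reading enclave data while the enclave is running; this is why Intel’s SGX attestation flags SMT-enabled platforms (TCB status `ConfigurationNeeded`).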
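To check the host side of these mitigations on a Linux test machine, here is a minimal sketch that reads the kernel’s own L1TF report. It assumes the standard `/sys/devices/system/cpu/vulnerabilities/l1tf` and `/sys/devices/system/cpu/smt/control` files; the `assess_l1tf` helper and its finding strings are illustrative, and the status-string matching follows the kernel’s documented format but should be checked against the kernel actually deployed.

```python
"""Sketch: summarise a Linux host's L1TF posture from sysfs (illustrative)."""
from pathlib import Path

L1TF_PATH = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")
SMT_PATH = Path("/sys/devices/system/cpu/smt/control")
# smt/control values under which no sibling thread can share the L1D.
SMT_SAFE = {"off", "forceoff", "notsupported"}


def assess_l1tf(status: str, smt_control: str) -> list[str]:
    """Return findings for an l1tf status line; an empty list means none."""
    status, smt = status.strip(), smt_control.strip()
    if status == "Not affected":
        return []
    findings = []
    if "PTE Inversion" not in status:
        findings.append("PTE inversion not reported")
    # The "VMX:" part describes L1D flushing on VM entry (KVM hosts only).
    if "VMX: conditional cache flushes" in status:
        findings.append("L1D flush on VM entry is conditional, not always")
    elif "VMX: vulnerable" in status:
        findings.append("no L1D flush on VM entry")
    # Sibling hyperthreads share the L1D, so exit-time flushes cannot cover them.
    if smt not in SMT_SAFE:
        findings.append(f"SMT control is '{smt}': siblings share the L1D")
    return findings


def main() -> None:
    if not L1TF_PATH.exists():
        print("kernel does not expose an l1tf status file")
        return
    smt = SMT_PATH.read_text() if SMT_PATH.exists() else "unknown"
    for finding in assess_l1tf(L1TF_PATH.read_text(), smt) or ["no findings"]:
        print(finding)


if __name__ == "__main__":
    main()
```

On an affected KVM host with default settings the file reads something like `Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable`, which this sketch would flag twice (conditional flush, SMT on).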

My question is about verification. I can’t just take their word for it. I need to see evidence in the enclave runtime or the attestation reports. Here’s my current assessment plan; I’d appreciate critique or additional methods:

1. **Inspect the Enclave’s Security Descriptor or Manifest.** Some SDKs may expose mitigation control flags. I’m looking for explicit `FLUSH_L1D` or `MITIGATION_L1TF` parameters, something like this hypothetical manifest:
   ```json
   {
     "security_version": 2,
     "mitigations": {
       "spectre_v2": "RETPOLINE",
       "l1tf": "FLUSH_L1D_ON_ENTRY_EXIT"
     }
   }
   ```
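2. **Analyze the TCB (Trusted Computing Base) attestation.** The attestation quote should reflect the platform’s mitigation state: the quote carries the CPU’s security version (CPUSVN), and the verifier maps it, via Intel’s TCB Info, to a TCB status plus a list of advisory IDs that are still outstanding. Is the L1TF advisory (INTEL-SA-00161) in that list? On the host, `CPUID` leaf `0x7` (subleaf 0), `EDX` bit 28 (`L1D_FLUSH`) must be set, which means the microcode provides the `IA32_FLUSH_CMD` MSR. `CPUID` faults inside an SGX enclave, so this bit has to be checked on the host or implied by the attested TCB level. Does NEAR AI’s attestation verifier check for this and fail if it’s missing?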
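3. **Runtime inspection via debug registers or performance counters.** On a test system, can we measure L1D flush effects around enclave transitions? This is low-level and microarchitecture-specific. A flush shows up as extra L1D refills after each exit, so an event such as `L1D.REPLACEMENT` is a more direct signal than a load-latency event like `MEM_TRANS_RETIRED.LOAD_LATENCY`. Production (non-debug) enclaves suppress debugging and performance monitoring of enclave code, so either measure the host side of the transition or use a debug-mode build.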
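4. **Review the enclave entry/exit asm sequences.** If they publish the trusted runtime source (or even the binary for RE), we can check what actually runs at the transitions, but it matters which instruction does what. `VERW` (with `MD_CLEAR` microcode) clears the store, fill, and load buffers; that is the MDS mitigation, not an L1D flush. The L1D flush is a `WRMSR` setting bit 0 of `IA32_FLUSH_CMD` (MSR `0x10B`), which is privileged and cannot appear in ring-3 enclave code: on SGX exits the microcode performs it, and on VM entries the host kernel or hypervisor does (e.g. KVM, governed by its `vmentry_l1d_flush` parameter). The smoking gun is therefore in the host kernel, the hypervisor, and the microcode revision rather than in the enclave’s own asm.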
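Items 1 and 2 can be folded into one fail-closed policy check. This sketch assumes the verifier exposes Intel’s TCB status string and the list of outstanding advisory IDs (as Intel’s DCAP quote verification reports them), that the raw `CPUID` leaf 7 `EDX` value is collected on the host, and that the manifest uses the hypothetical `mitigations.l1tf` key from the example in item 1. `evaluate` and its helpers are illustrative names, not NEAR AI’s verifier API, and accepting only `UpToDate` is an assumed strict policy.

```python
"""Sketch: fail-closed L1TF policy for items 1 and 2 (illustrative inputs)."""
import json
from typing import Optional

L1D_FLUSH_BIT = 28            # CPUID.(EAX=7,ECX=0):EDX[28] enumerates IA32_FLUSH_CMD
L1TF_ADVISORY = "INTEL-SA-00161"
ACCEPTED_TCB = {"UpToDate"}   # assumed strict policy; widen only via risk acceptance


def manifest_l1tf_mode(manifest_text: str) -> Optional[str]:
    """Return the hypothetical `mitigations.l1tf` value, or None if absent."""
    doc = json.loads(manifest_text)
    return doc.get("mitigations", {}).get("l1tf")


def host_has_l1d_flush(cpuid7_edx: int) -> bool:
    """True if the raw leaf-7 EDX value (read on the host) has L1D_FLUSH set."""
    return bool((cpuid7_edx >> L1D_FLUSH_BIT) & 1)


def evaluate(manifest_text: str, tcb_status: str,
             advisory_ids: list[str], cpuid7_edx: int) -> list[str]:
    """Return the reasons to reject; an empty list means every check passed."""
    failures = []
    if manifest_l1tf_mode(manifest_text) is None:
        failures.append("manifest declares no L1TF mitigation")
    if tcb_status not in ACCEPTED_TCB:
        failures.append(f"TCB status is {tcb_status}")
    if L1TF_ADVISORY in advisory_ids:
        failures.append(f"{L1TF_ADVISORY} (L1TF) is still outstanding")
    if not host_has_l1d_flush(cpuid7_edx):
        failures.append("host CPUID does not enumerate L1D_FLUSH")
    return failures


if __name__ == "__main__":
    example = ('{"security_version": 2, "mitigations": '
               '{"spectre_v2": "RETPOLINE", "l1tf": "FLUSH_L1D_ON_ENTRY_EXIT"}}')
    print(evaluate(example, "ConfigurationNeeded", [L1TF_ADVISORY], 1 << 28))
```

On Linux the same `CPUID` bit also appears as the `flush_l1d` flag in `/proc/cpuinfo`, which may be an easier source than raw `EDX` when collecting host evidence.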

Has anyone performed this level of verification on NEAR AI’s current production enclaves? The supply chain risk here is substantial: if their mitigation is only partial (e.g. they rely solely on microcode but don’t enforce the flush in software), we have a residual risk that must be documented and accepted. I’m particularly concerned about older hardware where the microcode is present but the L1D flush is expensive and therefore skipped on some code paths. KVM’s default `cond` mode for `vmentry_l1d_flush`, for instance, flushes only on VM entries that follow code paths considered able to leak host data.
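For the runtime-evidence items (3 and 4), this sketch automates two mechanical parts of the job: it lists `enclu` (enclave entry/exit), `verw`, and `wrmsr` sites in `objdump -d` output of the runtime, host kernel module, or hypervisor, and it compares `perf stat -x,` counts from a run with ECALLs against a baseline. The helper names are hypothetical, the parser assumes objdump’s default output with raw instruction bytes, and the event name (e.g. `l1d.replacement`) is Intel-core-specific.

```python
"""Sketch: mechanical helpers for items 3 and 4 (illustrative)."""

# Mnemonics of interest, lowercase as objdump prints them.
FLUSH_RELATED = {"enclu", "verw", "wrmsr"}


def scan_objdump(text: str, wanted: set[str] = FLUSH_RELATED) -> list[tuple[str, str]]:
    """Return (address, mnemonic) pairs for matching lines of `objdump -d` output."""
    hits = []
    for line in text.splitlines():
        # Instruction lines look like "  addr:<TAB>raw bytes<TAB>mnemonic operands".
        parts = line.split("\t")
        if len(parts) < 3 or not parts[0].strip().endswith(":"):
            continue
        fields = parts[2].split()
        if fields and fields[0] in wanted:
            hits.append((parts[0].strip().rstrip(":"), fields[0]))
    return hits


def parse_perf_csv(text: str) -> dict[str, int]:
    """Map event name to count from `perf stat -x,` output (value,unit,event,...)."""
    counts = {}
    for line in text.splitlines():
        fields = line.split(",")
        # Skip headers and "<not counted>"/"<not supported>" entries.
        if len(fields) >= 3 and fields[0].isdigit():
            counts[fields[2]] = int(fields[0])
    return counts


def refills_per_transition(with_ecalls: dict[str, int], baseline: dict[str, int],
                           event: str, transitions: int) -> float:
    """Extra events per enclave transition relative to a run without ECALLs."""
    return (with_ecalls[event] - baseline[event]) / transitions
```

A typical run would be `perf stat -x, -e l1d.replacement -- ./ecall_bench` (a hypothetical benchmark performing N ECALLs) and the same command with ECALLs disabled; `perf stat` writes its counts to stderr. A per-transition excess well above zero is consistent with a flush on exit and a value near zero suggests none, though other eviction sources should be ruled out before drawing conclusions.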

Hardened.


Run as non-root or don't run.


   