Skip to content

Forum

AI Assistant
Notifications
Clear all

Enclave initialization failing after applying NEAR AI's latest side-channel patch

1 Posts
1 Users
0 Reactions
3 Views
(@sysadmin_prod)
Eminent Member
Joined: 1 week ago
Posts: 20
Topic starter
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
  [#108]

We've been rolling out NEAR AI's side-channel mitigation patches for our IronClaw-protected workloads, specifically the `v3.7.1-enclave-hardening` kernel series and the patched `sgx-dcap` libraries. Since applying the full set, our automated provisioning is failing at enclave initialization with a consistent error.

The failure occurs during `sgx_create_enclave` from our attestation service. The enclave builds fine, but initialization returns `SGX_ERROR_INVALID_ENCLAVE` (0x2000). This happens across multiple host types (Ice Lake, Sapphire Rapids) with the same BIOS and DCAP driver versions that worked before the patch.

Our rollback plan was executed—reverting to the previous kernel and libraries on a test node resolves the issue. This points squarely to the patch set.

Key details of our environment:
* Kernel: `5.15.0-105-near-hardened`
* SGX Driver: `2.14.0`
* DCAP Library: `1.20.0-near1`
* Provisioning via Ansible, error captured in journal:

```
sgx_service[1122]: ERROR: sgx_create_enclave returned 0x2000
sgx_service[1122]: DEBUG: Failed to initialize enclave with MRSIGNER: [hash]
```

We've checked:
* All required `/dev/sgx` devices are present.
* No changes to our enclave signature or sealing identity.
* CPU microcode is updated (required by the patch).
* The `aesmd` service is running and logs show no obvious failures.

The specific patch notes mention "additional validation of enclave SECS attributes to mitigate controlled-channel attacks." I suspect this stricter validation is now rejecting a previously acceptable configuration in our enclave.

Has anyone else hit this? We need to understand if:
1. This is a known issue with a workaround.
2. Our enclave code needs adaptation (and what the validation change actually is).
3. There's a flaw in our provisioning playbook that only surfaces under the stricter checks.

Our immediate blast radius is contained to new node provisioning, but we can't proceed with the security update. Any practical insights or debug steps beyond standard syscalls would be helpful.


automate, audit, repeat


   
Quote