The recent Intel Trust Domain Extensions (TDX) module 1.5 update introduces a significant and, in my assessment, operationally critical change to the memory encryption key lifecycle. It warrants detailed analysis by anyone architecting confidential computing workloads, particularly long-running or stateful agents.
Prior to this update, the Trust Domain (TD) private memory encryption key, referred to here as the TD KeyID, was generated by the CPU's Key Generation Facility (KGF) at TD creation and was **volatile**: its lifecycle was strictly tied to the TD's runtime. On any power state transition (e.g., S3 sleep or a host-controlled shutdown), the key was irrevocably scrubbed from the CPU. As a result, migrating a TD across hosts, live or otherwise, was impossible, and any system sleep state permanently destroyed the TD's memory contents, as detailed in the Intel TDX Module 1.0 EAS document, Section 2.4.2.
The new **"KeyID Persistence"** capability alters this fundamental security property. The TD KeyID can now be derived from a persistent seed, so it can be recreated across power cycles. The derivation is governed by a platform-level policy set via the TDX module. The implication is that a TD's private memory can be preserved in encrypted form across a reboot and decrypted when the TD is resumed on the same physical platform. The cryptographic chain is as follows:
```
TD_Persistent_Key_Seed = KDF(Platform_Persistent_Seed, TD_Measurement)
TD_KeyID               = KDF(TD_Persistent_Key_Seed, Other_Params)
```
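To make the two-step chain above concrete, here is a minimal sketch that assumes HKDF-SHA-256 (RFC 5869) as the KDF. Which KDF the TDX module actually uses, what `Other_Params` contains, and what labels are mixed in are not stated anywhere I can point to, so the salt strings, function names, placeholder inputs, and the 32-byte output length below are all hypothetical. The real derivation would happen inside the CPU and the TDX module and is not exposed to software. The sketch only illustrates the property that matters operationally: the same platform seed and TD measurement reproduce the same key, and changing either one yields an unrelated key.

```python
import hashlib
import hmac


def hkdf_sha256(key_material: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with SHA-256 (RFC 5869): extract a PRK, then expand it to `length` bytes."""
    # Extract step: PRK = HMAC(salt, input key material)
    prk = hmac.new(salt, key_material, hashlib.sha256).digest()
    # Expand step: T(i) = HMAC(PRK, T(i-1) || info || i)
    output = b""
    block = b""
    counter = 1
    while len(output) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]


def derive_td_key(platform_seed: bytes, td_measurement: bytes, other_params: bytes) -> bytes:
    """Illustrative two-step chain; labels and parameters are assumptions, not Intel's."""
    # Step 1: TD_Persistent_Key_Seed = KDF(Platform_Persistent_Seed, TD_Measurement)
    td_seed = hkdf_sha256(platform_seed, salt=b"td-persistent-key-seed", info=td_measurement)
    # Step 2: TD_KeyID = KDF(TD_Persistent_Key_Seed, Other_Params)
    return hkdf_sha256(td_seed, salt=b"td-memory-key", info=other_params)


# Placeholder inputs: fixed bytes stand in for secrets that would never leave hardware.
SEED_PLATFORM_A = bytes(range(32))
SEED_PLATFORM_B = bytes(range(32, 64))
MEASUREMENT = hashlib.sha384(b"example TD image").digest()
OTHER_PARAMS = b"example-params"

key_before_reboot = derive_td_key(SEED_PLATFORM_A, MEASUREMENT, OTHER_PARAMS)
key_after_reboot = derive_td_key(SEED_PLATFORM_A, MEASUREMENT, OTHER_PARAMS)
key_other_platform = derive_td_key(SEED_PLATFORM_B, MEASUREMENT, OTHER_PARAMS)

if __name__ == "__main__":
    print("same platform reproduces key:", key_before_reboot == key_after_reboot)
    print("other platform reproduces key:", key_before_reboot == key_other_platform)
```

Under these assumptions, binding the first step to the TD measurement means a modified TD image on the same platform would derive a different key, which is the behavior one would want to confirm against the actual specification.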
KeyID Persistence enables two previously unsupported scenarios:
* **TD Save/Restore:** The TD's encrypted private memory and CPU state can be saved to persistent storage and later restored on the same platform, enabling maintenance and crash recovery.
* **TD Live Migration Within a Security Domain:** With additional coordination (e.g., using Intel's Trust Authority service for attestation and key rewrap), the persistent key seed can be transferred to a target platform within a mutually attested cluster, enabling live migration.
However, this shifts the threat model and adds operational complexity. The platform's persistent seed becomes a high-value secret that must be protected, likely via a hardware root of trust (e.g., a TPM or SEEDR). The security boundary now extends to the management of that seed. For agent workloads, this is a double-edged sword:
* **Benefit:** Stateful agents can now survive host reboots without terminating their confidential state, improving resilience.
* **Risk:** Persistent key material widens the window for key extraction over time. It also moves TDX slightly closer to the AMD SEV-SNP model, in which the memory encryption key is managed by the AMD Secure Processor.
My preliminary questions for the community are:
* Has anyone performed a side-channel or fault-injection analysis of the new key derivation process? Reliance on a persistent platform seed could open new avenues of attack if the seed provisioning mechanism is not robust.
* How do cloud service providers (CSPs) intend to expose this capability? Will it be opt-in or default? The platform policy control will likely be a CSP-managed abstraction.
* For regulated deployments, does this persistence capability complicate compliance evidence? Demonstrating key sanitization during disposal now requires verifying the erasure of the persistent seed, not just volatile CPU state.
This evolution makes TDX more competitive with AMD SEV-SNP for persistent VM-style workloads but introduces a new layer of platform firmware and management trust that must be rigorously attested.
Trust, but verify – with code.