I'm planning a deployment where some agent workloads need to run on-prem (our own Intel Xeon servers) and some will run in AWS. The on-prem boxes will be TDX-capable, and the AWS instances will be either Nitro Enclaves or SEV-SNP instances on AMD Milan.
The goal is a single, unified agent mesh for secrets management and attestation, using HashiCorp Vault with `vault-plugin-auth-tdx` and an equivalent auth plugin for SEV-SNP.
My main question is about the operational reality of mixing these TEE types. The theory says they're incompatible at the hardware attestation layer, but if the agent runtime abstracts the attestation evidence collection and validation, it *should* work.
Has anyone actually tried this? I'm less interested in "it's possible" and more in the gotchas.
My specific concerns:
* **Attestation Backend:** Does your Vault cluster need separate auth mounts (e.g., `auth/tdx` and `auth/sev-snp`), or can a single plugin handle multiple evidence types?
* **Node Labeling:** How do you ensure workloads requiring a specific TEE type land on the right node? Kubernetes node selectors like `tee.type=tdx`?
* **Evidence Verification Keys:** How do you manage the different trust roots for each platform (AMD's ARK/ASK/VCEK chain, Intel's PCK certificate chain, etc.)?
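To make these concerns concrete, here is a rough Python sketch of the per-platform mapping I have in mind if separate mounts turn out to be necessary. The `TeeProfile` type, the `/login` endpoint suffix, and the trust-root descriptions are my own placeholders for illustration, not something either plugin is known to expose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeeProfile:
    """Everything that differs per TEE type (all values are assumptions)."""
    auth_mount: str   # Vault auth mount path for this evidence type
    trust_root: str   # human-readable description of the verification chain
    node_label: str   # value of the tee.type node label


# Hypothetical mapping: one auth mount and one trust root per platform.
PROFILES = {
    "tdx": TeeProfile("auth/tdx", "Intel PCK certificate chain", "tdx"),
    "sev-snp": TeeProfile("auth/sev-snp", "AMD ARK/ASK/VCEK chain", "sev-snp"),
}


def login_path(tee_type: str) -> str:
    """Return the Vault login path for a TEE type (assumes a /login endpoint)."""
    return f"{PROFILES[tee_type].auth_mount}/login"


def node_selector(tee_type: str) -> dict:
    """Return the Kubernetes nodeSelector entry for a TEE type."""
    return {"tee.type": PROFILES[tee_type].node_label}
```

If a single plugin can handle multiple evidence types, the `auth_mount` field collapses to one value and only the trust root and node label still vary.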
A basic proof-of-concept node selector for a K8s pod might look like this:
```yaml
spec:
  nodeSelector:
    tee.type: "tdx"
  containers:
    - name: agent
      image: our-registry/tee-agent:v1.2
      securityContext:
        privileged: false
```
I'm trying to avoid building two entirely separate fleets. The complexity seems manageable if the agent container itself can detect its environment and gather the correct evidence. But the devil is in the details: different kernel requirements, initrd needs for SEV-SNP, and firmware versions.
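For the environment detection piece, this is a minimal sketch of what I'm picturing, assuming the agent can see the guest attestation device nodes. The paths are the ones I understand the mainline Linux guest drivers and the Nitro Secure Module to use (`/dev/tdx_guest`, `/dev/sev-guest`, `/dev/nsm`); older or vendor kernels may name them differently, and `detect_tee` is a hypothetical helper name.

```python
import os
from typing import Callable, Optional

# Assumed device nodes per TEE type; adjust for your kernel and distro.
TEE_DEVICES = [
    ("tdx", "/dev/tdx_guest"),
    ("sev-snp", "/dev/sev-guest"),
    ("nitro", "/dev/nsm"),
]


def detect_tee(exists: Callable[[str], bool] = os.path.exists) -> Optional[str]:
    """Return the first TEE type whose device node is present, else None.

    The `exists` parameter is injectable so the logic can be tested
    without real hardware.
    """
    for tee_type, device in TEE_DEVICES:
        if exists(device):
            return tee_type
    return None


if __name__ == "__main__":
    print(detect_tee() or "no TEE device found")
```

One thing I already suspect: with `privileged: false`, the device node has to be exposed to the container some other way (for example a device plugin), otherwise this check will always come back empty.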
Looking for war stories. Did the mixed approach hold up, or did you end up segregating everything at the cluster level?
automate, audit, repeat