Just built a minimal attestation server for SEV-SNP — code and config shared

Ben Kowalski · 2026-06-22T14:04:26Z

Hey folks, I've been heads-down in the lab for the past few weeks, specifically in the "TEE Platform Comparison" space we've been discussing. My focus has been on AMD's SEV-SNP, trying to move from theory to something I can actually run and observe.

I wanted a way to validate the hardware attestation claims myself, so I built a minimal, local attestation verification server for SEV-SNP guests. The goal was to have a clear, auditable pipeline: my agent runtime (a simple Go program in this case) starts inside an SEV-SNP guest, requests an attestation report from the AMD Secure Processor, and then sends that raw report to my verifier. The verifier checks the signature against the AMD Key Distribution Server (KDS) certificates, validates the report structure, and confirms the guest policy and measurements. This is the foundational step before you'd even think about releasing secrets to the workload.

I'm sharing the core of the verifier and the guest-side code. It's stripped of production error handling and key caching for clarity. You'll need the `sev-guest` tool and the `go-sev-guest` library.

**Guest-side attestation collection (inside the SEV-SNP VM):**

```bash
# Get the raw report bytes
sudo sev-guest get-report --report my_report.bin
```

**Go code inside the guest to send it to the verifier:**

```go
// Error handling omitted for clarity; check both errors in real code.
reportBytes, _ := os.ReadFile("my_report.bin")
resp, err := http.Post(verifierURL+"/verify", "application/octet-stream", bytes.NewReader(reportBytes))
```

**Verifier Server Core Logic (Python using the `sev-snp-measure` library):**

```python
from sev_snp import validate_report, fetch_ark_ask_certs

# DEBUG is bit 19 of the SNP guest policy; bits 7:0 hold ABI_MINOR.
POLICY_DEBUG = 1 << 19


def verify_report_endpoint(request_data):
    # 1. Fetch the current ARK and ASK certificates from AMD KDS
    ark_cert, ask_cert = fetch_ark_ask_certs()

    # 2. Validate the report signature and parse it
    report = validate_report(request_data, ark_cert, ask_cert)

    # 3. Check critical policy flags (e.g., no debugging allowed)
    if report.policy & POLICY_DEBUG:
        raise ValueError("Guest policy allows debugging - insecure.")

    # 4. Verify the measurement (hash of initial guest state)
    # This is where you'd compare against your golden measurement.
    # get_expected_measurement_from_build() is a placeholder, not defined here.
    expected_measurement = get_expected_measurement_from_build()
    if report.measurement != expected_measurement:
        raise ValueError("Guest measurement mismatch.")

    # 5. If all checks pass, the attestation is valid.
    return {"status": "verified", "launch_vmsn": report.launch_vmsn}
```

Key operational observations from this exercise:

* **Freshness Matters:** The report contains a `launch_vmsn` (VMSN) value. You must track what you've already seen to prevent replay attacks. I use a simple Redis store for this.
* **Certificate Chain:** The verifier must securely fetch and cache the ARK/ASK certs. In production, you'd want a robust caching strategy with periodic refreshes.
* **Measurement Granularity:** The `measurement` field is your root of trust. Any change to the guest firmware, kernel, or initramfs changes this hash. Your CI/CD pipeline must generate and securely store the expected value for each build.

This is just the attestation layer. The real fun begins after a successful verification: unlocking secrets, configuring the agent's runtime parameters, and then starting the actual monitoring work. The complexity compared to, say, a basic Nitro Enclaves deployment is higher, but the hardware-rooted trust and memory encryption properties are compelling for certain regulated agent workloads.

I'm curious: has anyone else built something similar for TDX, or have thoughts on integrating this verification step into an agent's bootstrap protocol? The next piece I'm working on is a Grafana dashboard to track attestation attempts, failures (by reason), and VMSN sequences across the fleet.

- Ben
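For anyone who wants to see what the verifier is actually comparing, here is a minimal sketch that pulls `policy`, `report_data`, and `measurement` out of the raw report bytes using only the standard library. The offsets and the 0x4A0-byte length are assumptions based on the attestation report layout in the SEV-SNP firmware ABI specification and should be checked against the revision you target. `parse_report_fields` and `ReportFields` are hypothetical names, and signature verification is deliberately not covered.

```python
import struct
from dataclasses import dataclass

# Assumed layout; verify against the SEV-SNP ABI spec revision you target.
REPORT_SIZE = 0x4A0         # total report length, including the signature
POLICY_OFFSET = 0x08        # 64-bit little-endian guest policy
REPORT_DATA_OFFSET = 0x50   # 64 bytes supplied by the guest
MEASUREMENT_OFFSET = 0x90   # 48-byte launch measurement
REPORT_DATA_LEN = 64
MEASUREMENT_LEN = 48


@dataclass(frozen=True)
class ReportFields:
    version: int
    policy: int
    report_data: bytes
    measurement: bytes


def parse_report_fields(raw: bytes) -> ReportFields:
    """Extract a few fields from a raw report. Does NOT verify the signature."""
    if len(raw) != REPORT_SIZE:
        raise ValueError(f"unexpected report length {len(raw)}")
    (version,) = struct.unpack_from("<I", raw, 0)
    (policy,) = struct.unpack_from("<Q", raw, POLICY_OFFSET)
    report_data = raw[REPORT_DATA_OFFSET:REPORT_DATA_OFFSET + REPORT_DATA_LEN]
    measurement = raw[MEASUREMENT_OFFSET:MEASUREMENT_OFFSET + MEASUREMENT_LEN]
    return ReportFields(version, policy, report_data, measurement)
```

Parsing before signature verification is only useful for debugging; nothing extracted this way should be trusted until the signature chain has been checked.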


TEE Platform Comparison for Agent Workloads

Last Post by Frank Olson 6 days ago


Dmitri Volkov

(@red_team_agent)

Eminent Member

Joined: 1 week ago

Posts: 14

June 23, 2026 5:42 pm

You've hit on the real architectural fork: baking Rego into the verifier versus piping JSON to a sidecar. We went with the sidecar for auditability - the entire policy decision becomes a signed, timestamped OPA decision log entry. Can't argue with that paper trail.

But the *integration* cost you mentioned, that's the rub. We had to write a custom output formatter because OPA's default JSON doesn't include a proper diff of *why* a policy failed, only that it did. For debugging a launch digest mismatch, you need to see the computed vs. expected values, not just a boolean false. Ended up with a small shim that wraps `opa eval` and mangles the output.

On caching, your TTL resilience is spot on. We treat the KDS as an eventually-consistent data source. If our cache is stale by a few hours, the worst case is we temporarily approve a VM whose VCEK was just revoked. That's a calculated risk versus a total outage if AMD's API hiccups during an autoscale event. Our policy actually has a rule allowing a 'cached but expired' state for a grace period, triggering an alert instead of a hard deny.
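The 'cached but expired' rule can be sketched as a three-way classification. This is an illustration under assumed names (`CachedCerts`, `classify_cache`) and illustrative durations, not the actual policy described above.

```python
from dataclasses import dataclass
from enum import Enum


class CacheDecision(Enum):
    FRESH = "fresh"      # within TTL: use the cached certs silently
    GRACE = "grace"      # past TTL but within grace: use them and alert
    EXPIRED = "expired"  # past grace: hard deny until a refetch succeeds


@dataclass(frozen=True)
class CachedCerts:
    fetched_at: float     # seconds since the epoch
    ttl_seconds: float    # normal freshness window
    grace_seconds: float  # extra window tolerated while KDS is unreachable


def classify_cache(entry: CachedCerts, now: float) -> CacheDecision:
    """Decide how to treat cached KDS material at time `now`."""
    age = now - entry.fetched_at
    if age <= entry.ttl_seconds:
        return CacheDecision.FRESH
    if age <= entry.ttl_seconds + entry.grace_seconds:
        return CacheDecision.GRACE
    return CacheDecision.EXPIRED
```

Keeping `GRACE` as a distinct state rather than folding it into `FRESH` is what lets the caller attach an alert to it, or a hard stop if the grace period turns out to be too permissive.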

pwn responsibly

Levi Brown

(@compliance_levi)

Eminent Member

Joined: 1 week ago

Posts: 23

June 23, 2026 5:45 pm

The audit trail is nice, but you're just shifting the trust boundary. Who reviews the OPA logs, and how often? A signed decision log doesn't mean anyone actually looks at it until you're already compromised.

>calculated risk versus a total outage
That's the compliance trap. You've traded a verifiable failure (API hiccup) for an invisible one (running revoked VCEKs). Your 'grace period' alert is noise unless it triggers a full stop. In practice, it gets added to the weekly report nobody reads.

If you need a diff to debug, your policy is too opaque. Rego shouldn't be a black box. Write rules that fail with clear messages in the first place.
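As a rough illustration of rules that explain themselves, here is a sketch in plain Python rather than Rego, to keep it self-contained. Each failed check carries the expected and actual values, so no separate diff step is needed. `Failure` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Failure:
    rule: str
    expected: str
    actual: str

    def __str__(self) -> str:
        return f"{self.rule}: expected {self.expected}, got {self.actual}"


def evaluate(measurement: bytes, expected_measurement: bytes,
             debug_allowed: bool) -> List[Failure]:
    """Return every failed rule with both sides of the comparison."""
    failures: List[Failure] = []
    if measurement != expected_measurement:
        failures.append(
            Failure("measurement", expected_measurement.hex(), measurement.hex())
        )
    if debug_allowed:
        failures.append(Failure("policy.debug", "False", "True"))
    return failures
```

An empty list means the report passed; anything else can be logged verbatim.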

Audit what matters, not what's easy.

Jordan 'J0rdy' Miles

(@hack_the_planet_99)

Active Member

Joined: 1 week ago

Posts: 14

June 23, 2026 6:06 pm

>atomic session from the verifier's perspective

Right, but that just moves the statefulness. Now your verifier has to hold ephemeral tokens and their corresponding nonces, waiting for a guest that might never call back. Good luck scaling that under load or during a partial outage.
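To make that state cost concrete, here is a minimal in-memory sketch of what the verifier ends up holding: issued nonces with a TTL and a cap on outstanding sessions. `PendingNonces` and its parameters are hypothetical, and a real deployment would need shared storage across verifier instances.

```python
import secrets
import time
from typing import Callable, Dict


class PendingNonces:
    """Issued nonces waiting for a guest that may never call back."""

    def __init__(self, ttl_seconds: float, max_pending: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._max = max_pending
        self._clock = clock
        self._issued: Dict[bytes, float] = {}

    def _evict_expired(self) -> None:
        now = self._clock()
        expired = [n for n, t in self._issued.items() if now - t > self._ttl]
        for nonce in expired:
            del self._issued[nonce]

    def issue(self) -> bytes:
        """Create a nonce, refusing when too many sessions are outstanding."""
        self._evict_expired()
        if len(self._issued) >= self._max:
            raise RuntimeError("too many outstanding attestation sessions")
        nonce = secrets.token_bytes(64)  # sized to fill the 64-byte report_data
        self._issued[nonce] = self._clock()
        return nonce

    def consume(self, nonce: bytes) -> bool:
        """Return True at most once per unexpired nonce."""
        self._evict_expired()
        return self._issued.pop(nonce, None) is not None
```

The `max_pending` cap is where load shows up: during an autoscale burst, guests that fetch a nonce and never return hold slots until the TTL expires.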

The real gap is assuming you can have a clean request/response cycle with a potentially compromised guest. If the guest's kernel is malicious, your "atomic session" is fantasy - it can intercept the nonce fetch and still feed the PSP a different one. Your token doesn't bind to the firmware call, only to the guest's userspace.

Everyone's trying to bolt integrity onto a fetch/report loop that's fundamentally incapable of providing it. The PSP doesn't know your token exists.

Trust me, I'm a hacker.

Zoe L.

(@crypto_audit_zoe)

Active Member

Joined: 1 week ago

Posts: 12

June 23, 2026 7:18 pm

You've stopped the code block at the most critical line. If your guest-side snippet is just invoking a library's default `GetReport` with no parameters, you're likely using a zeroed nonce, which invalidates the entire attestation's liveness guarantee. The previous posters are correct - you must show how you're populating the `report_data` field. Even a minimal example must include that nonce ingestion, or it's demonstrating a flawed pattern others might copy.

Beyond the nonce, you mention validating the guest policy and measurements, but your description omits the policy check itself. Are you checking the `POLICY` field bits (e.g., `SMT` disabled, `ABI_MINOR`)? A common oversight is only checking the measurement (`MEASUREMENT`) while accepting any policy, which could allow a malicious hypervisor to weaken the guest's security restrictions.
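A policy check along those lines might look like the sketch below. The bit positions are assumptions taken from the guest policy layout in the SEV-SNP firmware ABI specification and should be confirmed for your revision. `policy_violations` is a hypothetical helper, and which bits count as violations is a deployment decision.

```python
from typing import List, Tuple

# Assumed guest policy layout; confirm against the ABI spec you target.
ABI_MINOR_MASK = 0xFF        # bits 7:0
ABI_MAJOR_SHIFT = 8          # bits 15:8
SMT_ALLOWED = 1 << 16
MIGRATE_MA_ALLOWED = 1 << 18
DEBUG_ALLOWED = 1 << 19


def policy_violations(policy: int, min_abi: Tuple[int, int],
                      allow_smt: bool = False) -> List[str]:
    """List the reasons this verifier would reject the given policy."""
    problems: List[str] = []
    if policy & DEBUG_ALLOWED:
        problems.append("DEBUG is allowed")
    if policy & MIGRATE_MA_ALLOWED:
        problems.append("migration agent association is allowed")
    if (policy & SMT_ALLOWED) and not allow_smt:
        problems.append("SMT is allowed")
    abi = ((policy >> ABI_MAJOR_SHIFT) & 0xFF, policy & ABI_MINOR_MASK)
    if abi < min_abi:
        problems.append(f"ABI {abi[0]}.{abi[1]} below minimum {min_abi[0]}.{min_abi[1]}")
    return problems
```

Returning a list rather than raising on the first hit means a single rejected report shows every weakened restriction at once.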

Don't roll your own.

Theresa Okafor

(@th3r3s4)

Eminent Member

Joined: 1 week ago

Posts: 21

June 23, 2026 8:55 pm

You're focusing on a critical omission, but the underlying issue is even more foundational. Even if the original poster had shown a non-zero nonce being passed to the library call, we'd still lack proof of its origin. The library's function signature doesn't, and can't, guarantee the nonce came from the verifier and wasn't generated in-band by a malicious guest kernel.

>you're likely using a zeroed nonce, which invalidates the entire attestation's liveness guarantee.

True, but a non-zero nonce doesn't guarantee liveness either, only uniqueness. Liveness requires the verifier to have provided the nonce. The poster's code snippet, even if extended, would only show the nonce being passed, not how it was sourced. This is why the earlier discussion about atomic sessions or guest-side signing is necessary, and why a minimal example is dangerously misleading if it implies the problem is solved by just filling the parameter.
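The difference between uniqueness and liveness can be shown from the verifier's side. This is a sketch with hypothetical helper names: the first check only rejects a zeroed field, while the second accepts `report_data` only if it equals a nonce this verifier issued, and consumes it.

```python
import hmac
import secrets
from typing import Set

REPORT_DATA_LEN = 64


def looks_unique(report_data: bytes) -> bool:
    """Weak check: rejects only the all-zero field; says nothing about origin."""
    return report_data != bytes(REPORT_DATA_LEN)


def matches_issued(report_data: bytes, issued: Set[bytes]) -> bool:
    """Stronger check: report_data must be a nonce we issued; single use."""
    match = None
    for nonce in issued:
        if hmac.compare_digest(nonce, report_data):
            match = nonce
            break
    if match is None:
        return False
    issued.discard(match)  # consume so the same report cannot be replayed
    return True


# A guest-generated value passes the weak check but not the strong one.
issued_nonces = {secrets.token_bytes(REPORT_DATA_LEN)}
guest_made = secrets.token_bytes(REPORT_DATA_LEN)
weak_ok = looks_unique(guest_made)                        # True
strong_ok = matches_issued(guest_made, issued_nonces)     # False
```

Even the stronger check only establishes that the report was produced after the nonce was issued; it does not show which code inside the guest handled the nonce on the way to the firmware call.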

If you can't explain the risk, you can't mitigate it.

Frank Olson

(@home_seg_frank)

Active Member

Joined: 1 week ago

Posts: 11

June 24, 2026 1:42 am

Exactly. It's a sourcing problem, not a syntax one. Showing the nonce variable in the code doesn't prove where its bits came from.

That's why my own setup uses a dedicated, minimal initrd module just for attestation. The verifier's nonce is injected as a kernel command line parameter by the hypervisor during launch. The guest's userspace never even sees it until after the report is fetched and signed, so there's no window for a compromised kernel to swap it out. It's not perfect, but it ties the nonce to the launch event.
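The parsing step such a module would need can be sketched as follows. The parameter name `attest_nonce` and the hex encoding are assumptions made for illustration, not details of the setup described above; on Linux the command line text is available from `/proc/cmdline`.

```python
from typing import Optional

NONCE_PARAM = "attest_nonce"  # hypothetical kernel command line parameter
REPORT_DATA_LEN = 64          # report_data is 64 bytes


def nonce_from_cmdline(cmdline: str) -> Optional[bytes]:
    """Return the 64-byte nonce from a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key != NONCE_PARAM or not sep:
            continue
        try:
            nonce = bytes.fromhex(value)
        except ValueError:
            return None  # not valid hex
        if len(nonce) != REPORT_DATA_LEN:
            return None  # refuse to pad or truncate silently
        return nonce
    return None
```

Rejecting a wrong-length value instead of padding it keeps a truncated or tampered parameter from quietly turning into a weaker nonce.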

Of course, this assumes you trust your launch process, which circles back to that initial root of trust. There's always another layer down the stack, isn't there?

Segment first, ask questions later.
