Hey everyone, I've been diving into the security features of our stack, specifically the notary/signing flow for agent container images. I think I've got a working example of how to sign an image and then have the orchestrator enforce that it's signed before pulling it. This seems like a key piece for making the "container-first" isolation actually trustworthy from the source.
Here's the basic flow I set up, using `notation` and `oras`. First, you need a local signing key and a matching certificate in notation's trust store. For testing, `notation cert generate-test` does both: it generates a key, self-signs a certificate, and adds that certificate to the trust store:
```bash
# Generate a key and a self-signed cert
notation cert generate-test --default my-wasm-agent-id
# List the certs in the trust store
notation cert list
```
Then, after building my agent image (`myregistry.io/agents/calc:latest`), I signed it with the private key:
```bash
notation sign myregistry.io/agents/calc:latest
```
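One caveat on the command above: tags are mutable, and `notation` warns when you sign by tag, so outside of a quick test it is safer to sign the digest. Here is a minimal sketch of that. The `digest_ref` helper and the `IMAGE_DIGEST` variable are hypothetical names for illustration; you would supply the digest from your build or push output.

```bash
#!/usr/bin/env bash
set -euo pipefail

# digest_ref REPO DIGEST
# Prints REPO@DIGEST, and fails if DIGEST is not a well-formed sha256 digest.
# (Hypothetical helper, not part of notation.)
digest_ref() {
  local repo="$1" digest="$2"
  if [[ ! "$digest" =~ ^sha256:[0-9a-f]{64}$ ]]; then
    echo "not a sha256 digest: $digest" >&2
    return 1
  fi
  printf '%s@%s\n' "$repo" "$digest"
}

# Sign only when a digest has been provided via the (assumed) IMAGE_DIGEST variable.
if [[ -n "${IMAGE_DIGEST:-}" ]]; then
  notation sign "$(digest_ref myregistry.io/agents/calc "$IMAGE_DIGEST")"
fi
```

Run it as `IMAGE_DIGEST=sha256:... ./sign.sh` after pushing the image. Signing by digest pins the signature to the exact content that was pushed rather than to whatever `latest` points at later.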
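The cool part is enforcing this at the orchestrator level. For example, with containerd, the idea is to configure a notation verifier plugin in `/etc/containerd/config.toml`, point it at the trust store, and specify the policy. A policy with the `strict` verification level would reject unsigned images. I haven't confirmed the exact plugin ID and key below against a stock containerd release, so treat this snippet as a sketch of the idea rather than a known-good config: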
```toml
[plugins."io.containerd.notary.v2"]
trust_policy_file = "/etc/containerd/trust-policy.json"
```
And the `trust-policy.json` would define a policy for your registry scope with `signatureVerification.level` set to `strict`, so any image without a valid signature from a trusted identity is rejected.
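For reference, here is roughly what such a policy could look like, written out from a script. This is a sketch under assumptions: the policy name `agent-images`, the `POLICY_FILE` path, and the `RUN_NOTATION` switch are made up for illustration, and the trust store name assumes the `ca` store that `notation cert generate-test` created for `my-wasm-agent-id`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Scratch location for the policy file (assumed path; override as needed).
POLICY_FILE="${POLICY_FILE:-./trust-policy.json}"

# Write a notation trust policy that requires valid signatures ("strict")
# for one repository, trusting any identity found in the named trust store.
cat > "$POLICY_FILE" <<'EOF'
{
  "version": "1.0",
  "trustPolicies": [
    {
      "name": "agent-images",
      "registryScopes": ["myregistry.io/agents/calc"],
      "signatureVerification": { "level": "strict" },
      "trustStores": ["ca:my-wasm-agent-id"],
      "trustedIdentities": ["*"]
    }
  ]
}
EOF

# Optional: load the policy into notation and verify the image locally.
# Gated behind RUN_NOTATION=1 (hypothetical switch) so the script is safe to dry-run.
if [[ "${RUN_NOTATION:-0}" == "1" ]]; then
  notation policy import "$POLICY_FILE"
  notation verify myregistry.io/agents/calc:latest
fi
```

Running `notation verify` locally like this is a cheap way to check the policy before wiring it into whatever does the enforcement. In a real deployment you would probably replace `"*"` under `trustedIdentities` with the specific certificate subject you expect.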
My question is about the gaps. This seems solid for the *initial* pull. But what about during scaling under load? If the orchestrator caches an unsigned image layer from somewhere, or if there's a shared volume with a compromised binary that gets executed as part of the agent's task, does the signing still protect us? Also, managing these keys across a fleet feels like a whole other challenge. 😅
Has anyone else set this up in a production-like environment for agent workloads? I'd love to compare notes on the operational side of things.