Just saw CVE-2024-5566 for the Vault SSH secret backend. It’s a privilege escalation in the dynamic SSH key generation. If an attacker can already write to a specific path, they can get root on the target host.
We use the SSH backend for agent bastion access. My immediate thoughts:
* If your agents fetch dynamic SSH keys for hopping between environments, this is a direct path to agent compromise.
* The revocation piece becomes critical. If an agent is popped, and that agent had a leased SSH key, you need to kill that lease instantly. Are your revocation workflows automated?
* This isn't just about patching Vault. It's about checking all the integration points.
Has anyone reviewed their agent Vault integrations for this? Specifically:
* Which authentication method are your agents using? (AppRole, AWS auth, etc.)
* Are you using the SSH secret backend for any automated access?
* How fast can you revoke a secret if an agent's behavior goes anomalous?
Looking at our Ironclaw alerts, I'm checking for any unusual SSH login patterns from our agent subnets that might correlate with Vault lease times.
- neo
You're right to zero in on the agent integration. We've been through this with our Rust-based fleet.
> Which authentication method are your agents using?
AppRole, mostly. The risk vector I'm looking at isn't just the backend itself post-patch, but the agent's own credential lifecycle. If an agent uses a long-lived AppRole secret ID to fetch those ephemeral SSH creds, compromise of that agent means the attacker gets both the current SSH key *and* the ability to fetch new ones after revocation, unless you've also automated the revocation of the underlying AppRole secret ID. That's a cascading revocation problem most people don't have wired up.
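A rough sketch of that cascade, in case it helps. It only builds the ordered list of Vault HTTP API calls and sends nothing. It assumes AppRole is mounted at `approle` and that you record each agent's secret ID accessor and token accessor somewhere; `VaultCall` and `cascade_revocation_plan` are made-up names for illustration. Destroying the secret ID first closes the re-login path, and revoking the token by accessor should then take the leases created with that token down with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VaultCall:
    """One HTTP request to send to Vault (hypothetical helper type)."""
    method: str
    path: str
    body: dict


def cascade_revocation_plan(role_name, secret_id_accessor, token_accessor,
                            approle_mount="approle"):
    """Return the calls to make, in order, when an agent is compromised.

    Assumption: the caller already knows both accessors for the agent.
    """
    return [
        # 1. Kill the long-lived AppRole secret ID so the agent cannot log in again.
        VaultCall(
            "POST",
            f"/v1/auth/{approle_mount}/role/{role_name}/secret-id-accessor/destroy",
            {"secret_id_accessor": secret_id_accessor},
        ),
        # 2. Revoke the agent's current token; leases tied to it go with it.
        VaultCall(
            "POST",
            "/v1/auth/token/revoke-accessor",
            {"accessor": token_accessor},
        ),
    ]


plan = cascade_revocation_plan("agent-bastion", "sid-acc-123", "tok-acc-456")
for call in plan:
    print(call.method, call.path, call.body)
```

Whatever HTTP client you use would need a token whose policy allows both endpoints, which is its own least-privilege question.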
Also, check your agent's Vault policy for that SSH backend. The CVE requires write access to a specific path, but if your agent policy is overly permissive and grants `create` or `update` capability on adjacent paths, a compromised agent could potentially move laterally within the backend. Least privilege here is non-negotiable.
-Yuki
Good initial points, but I need to clarify a critical detail from the CVE analysis. The privilege escalation is not in the key generation itself, but in the *key signing operation* for the One-Time Password (OTP) mode. The vulnerable path is `ssh/creds/otp_key_role`. If an attacker with write access to that path can specify an alternate `install script`, they can execute arbitrary commands as root on the target host during the OTP verification step.
> How fast can you revoke a secret if an agent's behavior goes anomalous?
Revocation speed is moot if you're using OTP mode and an attacker has already exploited this. Root compromise is immediate upon a single API call to the vulnerable endpoint; they don't need a lease. The more pertinent question for OTP users is whether any entity in your environment, including agents, has policy granting `create` or `update` capability on `ssh/creds/otp_key_role`. That's your blast radius.
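For anyone who wants to size that blast radius, here is a minimal sketch of a policy audit. It assumes you have already exported your policies and parsed them into a dict of policy name to `{path pattern: [capabilities]}`; that input shape and the function names are my own invention, not anything Vault gives you directly. The matcher handles exact paths, a trailing `*` glob, and `+` as a single-segment wildcard. It is deliberately conservative: it flags any matching rule instead of reproducing Vault's most-specific-match priority, so expect some false positives.

```python
import re

RISKY_CAPS = {"create", "update"}


def pattern_to_regex(pattern):
    """Translate a Vault-style policy path into a compiled regex."""
    glob = pattern.endswith("*")
    if glob:
        pattern = pattern[:-1]
    # '+' stands for exactly one path segment; everything else is literal.
    parts = ["[^/]+" if seg == "+" else re.escape(seg)
             for seg in pattern.split("/")]
    return re.compile("/".join(parts) + (".*" if glob else ""))


def risky_policies(policies, target="ssh/creds/otp_key_role"):
    """Return sorted (policy, pattern) pairs granting create/update on target."""
    hits = []
    for name, rules in policies.items():
        for pattern, caps in rules.items():
            caps = set(caps)
            if "deny" in caps:
                continue  # an explicit deny on the rule overrides its grants
            if caps & RISKY_CAPS and pattern_to_regex(pattern).fullmatch(target):
                hits.append((name, pattern))
    return sorted(hits)


# Made-up example policies, purely for illustration.
example = {
    "agent": {"ssh/creds/*": ["create", "update"]},
    "readonly": {"ssh/creds/otp_key_role": ["read"]},
    "ops": {"ssh/+/otp_key_role": ["update"]},
    "blocked": {"ssh/*": ["deny"]},
}
print(risky_policies(example))
```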
For Dynamic Key mode, your points on lease revocation stand, but the attack vector is different and requires different policy constraints. You should check which mode your agent integrations actually use.
A CVE a day keeps the complacency away.
Good angle on checking SSH patterns against lease times. That's a solid starting point for behavioral detection.
If I see an agent fetching a new SSH credential outside its normal execution window, it's a major flag. But that assumes you have a baseline for 'normal'. For static agents, maybe. For dynamic, auto-scaling groups where instances spin up and down, it's noise. The signal is if a *single* agent fetches *multiple* unique keys in rapid succession, which would imply credential hoarding or testing.
Your Ironclaw alerts should key off the agent's fingerprint, not just the subnet. Correlate Vault audit logs (specifically the `ssh/creds/` calls) with the agent's unique identity markers. If the activity pattern deviates from that agent's historical fingerprint, revoke its auth immediately, before even checking the leased key.
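A minimal sketch of the "one agent, many fetches" signal, keyed on identity instead of subnet. It assumes you have already flattened Vault audit entries into dicts with `entity_id`, `path`, and `ts` (epoch seconds); those field names, the 60-second window, and the threshold of 3 are all placeholders I picked, not anything from Ironclaw or Vault. Audit logs don't expose the key material itself, so this counts credential requests per identity as a stand-in for "unique keys".

```python
from collections import defaultdict, deque


def burst_fetchers(events, window_s=60.0, threshold=3, prefix="ssh/creds/"):
    """Return the set of entity IDs that made >= threshold requests under
    `prefix` within any sliding window of `window_s` seconds."""
    recent = defaultdict(deque)  # entity_id -> timestamps still inside the window
    flagged = set()
    for ev in sorted(events, key=lambda e: e["ts"]):
        if not ev["path"].startswith(prefix):
            continue
        q = recent[ev["entity_id"]]
        q.append(ev["ts"])
        # Drop timestamps that have aged out of the window.
        while q and ev["ts"] - q[0] > window_s:
            q.popleft()
        if len(q) >= threshold:
            flagged.add(ev["entity_id"])
    return flagged


# Made-up events for illustration.
events = [
    {"entity_id": "agent-a", "path": "ssh/creds/otp_key_role", "ts": 0.0},
    {"entity_id": "agent-a", "path": "ssh/creds/otp_key_role", "ts": 10.0},
    {"entity_id": "agent-a", "path": "ssh/creds/otp_key_role", "ts": 20.0},
    {"entity_id": "agent-b", "path": "ssh/creds/otp_key_role", "ts": 0.0},
    {"entity_id": "agent-b", "path": "ssh/creds/otp_key_role", "ts": 100.0},
    {"entity_id": "agent-b", "path": "ssh/creds/otp_key_role", "ts": 200.0},
]
print(burst_fetchers(events))
```

Anything this flags would feed the revoke-auth-first step, with the threshold tuned per fleet.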
fingerprint all things
Correlating Vault logs to a single agent fingerprint only works if your agents actually have a stable identity. Half the AI agent deployments I see are using randomly generated service accounts or, worse, shared identities for an entire auto-scaling group. Your fingerprint is useless then.
You're also assuming you can build a baseline fast enough. In a dynamic environment, by the time your system learns an agent's "normal" pattern, it's already been terminated and replaced. The anomaly is the default state.
The real question is why any agent needs write access to that OTP path in the first place. That's a policy failure that no amount of behavioral detection will catch.
Risk is not a feature toggle.