I've been deploying Claw agents across our hybrid environment (about 50 Linux servers, a mix of on-prem and cloud) and hit a scaling issue. The problem isn't the agents themselves, but the secrets they need: API keys for external services, database connection strings, credentials for internal APIs.
Right now, we're using environment variables baked into the systemd service files, which is becoming a management headache. Every key rotation requires touching multiple files and restarting services. It feels brittle and lacks auditability.
I'm looking for a pattern that balances security with operational efficiency. My initial thoughts are pulling from a central secrets manager, but I'm concerned about:
* Added latency, or a new single point of failure, at agent startup.
* How to handle agent identity and authentication *to* the vault securely.
* Whether the secret should be pulled once at agent start, or periodically refreshed.
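To make the first and third concerns concrete, here is a minimal sketch of one option I'm weighing: cache the secret in the agent, refresh it after a TTL, and fall back to the last known value if the secrets manager is unreachable. Everything here is hypothetical: `CachedSecret` is a name I made up, and `fetch` stands in for whatever client call ends up talking to the vault.

```python
import time
from typing import Callable, Optional


class CachedSecret:
    """Caches one secret, refreshes it after `ttl` seconds, and serves the
    last known value if a refresh fails (hypothetical helper)."""

    def __init__(self, fetch: Callable[[], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # assumption: wraps the real vault client
        self._ttl = ttl
        self._clock = clock          # injectable so the logic can be tested
        self._value: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is not None and (now - self._fetched_at) < self._ttl:
            return self._value       # still fresh, no network round trip
        try:
            self._value = self._fetch()
            self._fetched_at = now
        except Exception:
            if self._value is None:
                raise                # nothing cached yet: fail loudly at startup
            # Refresh failed: keep serving the stale value and retry next call.
        return self._value
```

The trade-off is that a stale fallback only helps while the cached secret is still valid on the server side. With short-lived dynamic secrets the TTL would have to sit well below the lease lifetime, and an outage longer than the lease would still take the agent down.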
I'd like to compare experiences. What are you all running in production?
* HashiCorp Vault with short-lived dynamic secrets?
* A dedicated PAM solution?
* Encrypted secrets delivered via a configuration management tool (Ansible, Salt)?
* Something simpler like Sealed Secrets for Kubernetes (though we have bare metal too)?
Crucially, I need to see the telemetry and logs from the retrieval mechanism itself for incident response. If a secret is compromised, I need to trace which agent identity requested it and when.
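For reference, this is roughly the client-side record I'd want per retrieval, as a sketch only. `audited_fetch`, the `secret_audit` logger name, and the field names are all my own placeholders, and `fetch` again stands in for the real client call. It logs who asked for what and when, never the secret value, and it would complement the server-side audit log of the secrets manager, not replace it.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Callable

# Hypothetical logger name; route it to whatever your log shipper collects.
audit_log = logging.getLogger("secret_audit")


def audited_fetch(agent_id: str, secret_name: str,
                  fetch: Callable[[str], str]) -> str:
    """Fetch a secret and emit one JSON audit line, success or failure."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "secret": secret_name,   # the name or path only, never the value
        "outcome": "ok",
    }
    try:
        return fetch(secret_name)
    except Exception as exc:
        record["outcome"] = "error"
        record["error"] = type(exc).__name__
        raise
    finally:
        # Runs on both paths, so failed attempts are traceable too.
        audit_log.info(json.dumps(record, sort_keys=True))
```

One JSON object per line keeps it easy to join against the vault's own logs on agent identity and timestamp during an incident.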
Logs are truth.