We've been running Firecracker microVMs for some time now to isolate our monitoring agents, and overall the security and performance story is solid. However, I'm seeing a consistent pattern emerge in our longer-lived VMs (those running for several weeks without a snapshot restore): gradual clock drift.
It's not catastrophic—we're talking about a skew of a few hundred milliseconds after a month—but it's enough to cause issues for any agent logic that depends on precise timing for log correlation or heartbeat intervals. Our current configuration uses the default `kvm-clock` and we're not manually syncing the guest clock via the API after the initial boot.
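To put "a few hundred milliseconds after a month" in frequency terms, here's a small helper that converts an observed skew into an average rate in parts per million and projects it forward. The 300 ms / 30 day inputs are illustrative assumptions, not our measurements, and the projection assumes the drift rate stays constant, which real drift may not.

```python
SECONDS_PER_DAY = 86_400

def drift_ppm(skew_s: float, elapsed_days: float) -> float:
    """Average frequency error (parts per million) implied by an observed skew."""
    return skew_s / (elapsed_days * SECONDS_PER_DAY) * 1e6

def projected_skew_ms(ppm: float, elapsed_days: float) -> float:
    """Skew (milliseconds) accumulated after elapsed_days at a constant ppm error."""
    return ppm * 1e-6 * elapsed_days * SECONDS_PER_DAY * 1e3

if __name__ == "__main__":
    # Illustrative inputs only: 300 ms of skew observed after 30 days.
    rate = drift_ppm(0.300, 30)
    print(f"{rate:.3f} ppm")  # 0.116 ppm
    print(f"{projected_skew_ms(rate, 90):.0f} ms after 90 days")  # 900 ms
```

At roughly 0.12 ppm, the error is far below the 500 ppm frequency-correction ceiling of the Linux kernel's clock discipline, so gradual slewing rather than stepping should be enough to absorb it, whichever side ends up doing the correcting.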
Before I dive deep into our own hypervisor host's NTP setup, I wanted to check if this is a known quirk within the community. Specifically:
* Are others observing this with Firecracker, particularly on AMD EPYC hosts?
* What's your preferred mitigation?
  * Running `chronyd`/`ntpd` inside the guest on a schedule? It feels like that undermines some of the isolation, but maybe it's the pragmatic answer.
  * Periodic host-side syncs using the Firecracker `UpdateInstance` action?
  * Or is the real fix something in the host kernel or Firecracker configuration that we've missed?
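Before digging into the host, one cheap check is confirming, from inside a long-lived guest, which clocksource the guest kernel is actually using, since the configuration assumption is `kvm-clock`. Linux exposes this under sysfs; the sketch below just reads those two files and assumes a Linux guest with `/sys` mounted (the `base` parameter exists only to make it easy to point elsewhere).

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Standard Linux sysfs location for the active clocksource device.
SYSFS_CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0")

def read_clocksource(base: Path = SYSFS_CLOCKSOURCE) -> Tuple[Optional[str], List[str]]:
    """Return (current, available) clocksources, or (None, []) if they can't be read."""
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, /sys not mounted, or the files are missing.
        return None, []
    return current, available

if __name__ == "__main__":
    current, available = read_clocksource()
    if current is None:
        print("clocksource sysfs not readable on this system")
    else:
        print(f"current={current} available={available}")
        if current != "kvm-clock":
            print("warning: guest is not using kvm-clock")
```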
My primary concern is the security/compliance angle: any solution needs to maintain a clean audit trail of when syncs happen and not open an unintended channel between the host and the guest. Performance impact is secondary, but still a consideration.
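For the audit-trail requirement, one option (a sketch of our own, not anything Firecracker provides) is to have whatever performs the sync append one hash-chained JSON record per event, so edits or missing entries are detectable. The field names, `vm_id`, and the action labels are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, TextIO

# Hypothetical "prev" value for the first record in a chain.
GENESIS_HASH = "0" * 64

def _digest(body: Dict) -> str:
    # Canonical JSON (sorted keys) so identical records always hash identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

def make_record(prev_hash: str, vm_id: str, ts_unix: float,
                offset_s: float, action: str) -> Dict:
    """Build one sync-audit record linked to the previous record's hash."""
    body = {
        "vm_id": vm_id,        # hypothetical VM identifier
        "ts_unix": ts_unix,    # host time when the sync was evaluated
        "offset_s": offset_s,  # measured guest-minus-reference offset
        "action": action,      # hypothetical label, e.g. "slewed" or "skipped"
        "prev": prev_hash,
    }
    return {**body, "hash": _digest(body)}

def append_record(stream: TextIO, record: Dict) -> None:
    """Append a record as a single JSON line."""
    stream.write(json.dumps(record, sort_keys=True) + "\n")
    stream.flush()

def verify_chain(records: List[Dict]) -> bool:
    """False if a record was edited or an entry is missing/reordered (tail truncation is not detected)."""
    prev = GENESIS_HASH
    for rec in records:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if body.get("prev") != prev or rec.get("hash") != _digest(body):
            return False
        prev = rec["hash"]
    return True

if __name__ == "__main__":
    import io
    log = io.StringIO()
    r1 = make_record(GENESIS_HASH, "vm-example", 1_700_000_000.0, 0.012, "slewed")
    r2 = make_record(r1["hash"], "vm-example", 1_700_003_600.0, 0.001, "skipped")
    for r in (r1, r2):
        append_record(log, r)
    records = [json.loads(line) for line in log.getvalue().splitlines()]
    print(verify_chain(records))  # True
```

The chain only makes tampering detectable, not impossible; shipping the lines off-host (or to write-once storage) is what would make the trail trustworthy for compliance.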
- Asia (mod)