Everyone obsesses over the API endpoints and control planes when locking down an agent. You build a tight allowlist, drop all outbound except for your known-good IPs, and then... your time-based triggers fail silently. Or your TLS cert validation breaks because the clock drifted.
The agent's own docs will scream about their cloud service, but they assume your system clock is magically correct. It isn't.
You need NTP. Not the bloated `systemd-timesyncd` with its D-Bus overhead, and definitely not a container's host network cheat. A bare, jailed `chronyd` instance.
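Here's a hardened `chrony.conf` for an agent host. It runs as a dedicated user, keeps the command port on loopback, and only talks to your internal, trusted NTP servers. One caveat: chrony has no chroot directive, so the jailing happens outside the config, e.g. by starting `chronyd` with `-F 1` to enable its seccomp filter.
```conf
# /etc/chrony/chrony.conf
# Drop root privileges after startup (the account is _chrony on Debian/Ubuntu).
user chrony
# Keep the command port on loopback; local chronyc still works.
bindcmdaddress 127.0.0.1
cmdallow 127.0.0.1
# Internal, trusted sources only (placeholder hostnames).
server ntp1.your.internal.ip iburst
server ntp2.your.internal.ip iburst
driftfile /var/lib/chrony/drift
# Step the clock whenever the offset exceeds 1 s, with no limit on how often.
makestep 1.0 -1
# Periodically copy system time to the hardware clock.
rtcsync
```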
Then your firewall allowlist gets two new rules: outbound UDP 123 to your NTP servers. That's it. No SNTP, no multicast, no IPv6 link-local. Without this, your "minimal" allowlist is useless for any agent that cares about timestamps.
Hardening chrony is fine, but you're still trusting your internal NTP servers. What's their source? A virtual appliance with a cheap oscillator? A VM with a skewed clock?
If your upstream is garbage, your driftfile and makestep are just polishing a turd. You need a reference clock, or at least a GPS module on a Pi. Otherwise you're just syncing everyone to a common wrong time.
Prove it.
Oh, this is such a good point. I spent a whole weekend debugging why my Nemo Claw agent's daily summarization job just... stopped. Logs showed nothing. Turns out the clock had drifted almost 90 seconds and the cron trigger just quietly misfired.
Your `cmdallow 127.0.0.1` is key - I forgot that on my first pass and left a tiny management surface open. One thing I'd add for Docker folks: bind-mounting the host's `/etc/chrony/chrony.conf` into every container is messy. I run a single, hardened `chronyd` container in `network_mode: host` with `--cap-add SYS_TIME`, strictly for time. Every other container shares the host kernel's clock, so none of them needs `SYS_TIME` or any time config of its own. Keeps the time source singular and the config in one place.
It feels silly, but that outbound UDP 123 rule is now literally the first thing I add to my agent firewall script. Before the actual API endpoints.
self-hosted, self-suffering
This conf is a decent baseline but it's missing the key failure scenario. What happens when ntp1 and ntp2 both go offline or start serving junk?
Your `makestep 1.0 -1` and `driftfile` won't help here: with no reachable source, `chronyd` just free-runs on its last frequency estimate, and your time-sensitive agent won't know. The logs will still show the sync to the last known good source. You need monitoring on the actual offset, not just chrony's process status.
I've seen a drift of 15 minutes build up over two days because the internal servers were virtual and nobody noticed they'd stopped syncing upstream. The firewall rules were perfect. The time was wrong.
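To put that number in perspective, here is a quick back-of-the-envelope calculation. The 30-second tolerance is a made-up example budget (think token or signature skew limits), not something from the incident itself.

```python
def drift_ppm(offset_seconds: float, elapsed_seconds: float) -> float:
    """Average drift rate in parts per million over the elapsed period."""
    return offset_seconds / elapsed_seconds * 1e6


def seconds_until(tolerance_seconds: float, ppm: float) -> float:
    """How long a free-running clock takes to exceed a tolerance at a given rate."""
    return tolerance_seconds / (ppm / 1e6)


# 15 minutes of drift accumulated over two days.
rate = drift_ppm(15 * 60, 2 * 24 * 3600)
print(round(rate))  # 5208 ppm

# Hypothetical 30 s skew budget: minutes until it is used up at that rate.
print(round(seconds_until(30, rate) / 60))  # 96 minutes
```

At that rate the clock loses a 30-second budget in about an hour and a half, which is why a daily check is nowhere near enough.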
Claims are cheap. Evidence is expensive.
Absolutely. The monitoring point is critical, and it's one of those gaps you don't see until your timestamped audit logs are useless. We ran into this with a scheduled agent that fetches API data with short-lived tokens; a two-minute drift meant repeated authentication failures that looked like a service outage.
I ended up adding a dead-simple Prometheus check that scrapes `chronyc tracking` and alerts on the absolute offset. But you've hit the real problem: you need to monitor the *source* stratum and the root delay, not just the local daemon's uptime. A virtual NTP server that's lost its own sync will still happily report a low stratum and serve its own drifted time.
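For anyone who wants a starting point, this is roughly the shape of such a check: a minimal sketch that parses the human-readable `chronyc tracking` output. The thresholds, function names, and metric names are my own placeholders, not anything chrony or Prometheus defines.

```python
import re
import subprocess

# Hypothetical alert thresholds; tune them for your environment.
MAX_OFFSET_SECONDS = 0.5
MAX_STRATUM = 4


def read_tracking() -> str:
    """Run `chronyc tracking` locally and return its text output."""
    result = subprocess.run(
        ["chronyc", "tracking"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_tracking(text: str) -> dict:
    """Extract offset, stratum, root delay and leap status from the output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    # Example field: "System time : 0.000006523 seconds slow of NTP time"
    match = re.match(r"([\d.]+) seconds (slow|fast)", fields["System time"])
    if match is None:
        raise ValueError("unexpected 'System time' field")
    offset = float(match.group(1))
    if match.group(2) == "slow":
        offset = -offset  # negative: the local clock is behind NTP time

    return {
        "offset_seconds": offset,
        "stratum": int(fields["Stratum"]),
        "root_delay_seconds": float(fields["Root delay"].split()[0]),
        "leap_status": fields["Leap status"],
    }


def find_problems(tracking: dict) -> list:
    """Return human-readable reasons to alert; an empty list means healthy."""
    problems = []
    if tracking["leap_status"] == "Not synchronised":
        problems.append("chronyd reports it is not synchronised")
    if abs(tracking["offset_seconds"]) > MAX_OFFSET_SECONDS:
        problems.append(f"offset {tracking['offset_seconds']:+.3f} s exceeds limit")
    if tracking["stratum"] == 0 or tracking["stratum"] > MAX_STRATUM:
        problems.append(f"suspicious stratum {tracking['stratum']}")
    return problems


def to_prometheus(tracking: dict) -> str:
    """Render gauges in Prometheus text format (metric names are made up)."""
    return (
        f"chrony_offset_seconds {tracking['offset_seconds']}\n"
        f"chrony_stratum {tracking['stratum']}\n"
        f"chrony_root_delay_seconds {tracking['root_delay_seconds']}\n"
    )
```

Something like `print(to_prometheus(parse_tracking(read_tracking())))` written to a file for the node exporter's textfile collector would cover the scrape side. Keep in mind this only reports what the local daemon believes about its sources.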
So your monitoring has to validate against an external, trusted time source, even if your production systems aren't allowed to query it directly. It's a separate verification loop.
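A rough sketch of that verification loop, assuming a separate monitoring host that is allowed to send one SNTP query to an outside server. The hostname in the usage note is a placeholder, and the function names are mine. It sends a single client-mode NTP packet and estimates the local clock's offset from the server's receive and transmit timestamps.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request() -> bytes:
    """48-byte client request: LI=0, version=4, mode=3, everything else zero."""
    return bytes([0x23]) + bytes(47)


def _ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (two 32-bit words) to Unix time."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def offset_from_reply(reply: bytes, sent_at: float, received_at: float) -> float:
    """Estimate (server time - local time) in seconds from one exchange."""
    if len(reply) < 48:
        raise ValueError("short NTP reply")
    if reply[1] == 0:
        raise ValueError("stratum 0 reply: server is unsynchronised or refusing")
    words = struct.unpack("!12I", reply[:48])
    server_rx = _ntp_to_unix(words[8], words[9])    # receive timestamp
    server_tx = _ntp_to_unix(words[10], words[11])  # transmit timestamp
    # Standard NTP offset estimate: average of the two one-way differences.
    return ((server_rx - sent_at) + (server_tx - received_at)) / 2


def query_offset(host: str, timeout: float = 2.0) -> float:
    """Send one query to `host` on UDP 123 and return the estimated offset."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent_at = time.time()
        sock.sendto(build_request(), (host, 123))
        reply, _ = sock.recvfrom(512)
        received_at = time.time()
    return offset_from_reply(reply, sent_at, received_at)
```

Calling `query_offset("time.example.org")` from the monitoring box and alerting when the absolute value crosses your budget gives a check that does not depend on the internal servers telling the truth about themselves. A single query is noisy, so treat it as a sanity bound rather than a precise measurement.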
Budget and monitor.