I've been reviewing several recent incident reports involving data exfiltration and C2 callback traffic. In a majority of cases, the initial beacon and subsequent data transfer bypassed traditional network firewalls because they operated over allowed protocols, primarily DNS and HTTPS. This reinforces my stance: DNS filtering is the most critical initial control point for agent egress traffic.
While Layer 7 inspection (TLS termination, HTTP proxy filtering) is essential, it is computationally expensive and increasingly complex now that perfect forward secrecy rules out passive decryption and Encrypted Client Hello (the successor to encrypted SNI) hides the server name. DNS, however, remains a foundational protocol that is almost always permitted and is queried before any Layer 7 connection is established. Blocking or redirecting malicious or unauthorized DNS queries stops the attack chain before a full tunnel can be established.
From a key management and encryption perspective, DNS filtering dovetails with other controls:
* It forces adversaries to use hard-coded IPs, making their infrastructure more brittle and observable.
* It complements a service mesh or mTLS strategy by ensuring that even if an agent is compromised, it cannot resolve the names of unauthorized external endpoints.
* It provides a log source for detecting DNS-based exfiltration patterns (e.g., long subdomains, high query frequency to newly registered domains).
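To make the last bullet concrete, here is a minimal sketch of the kind of heuristics it describes. The thresholds, the naive "last two labels" notion of a registered domain, and the function names are all assumptions for illustration; checking for newly registered domains needs external registration data and is not covered.

```python
from collections import Counter
import math

# Assumed thresholds; tune them against your own baseline traffic.
MAX_LABEL_LEN = 40
MAX_ENTROPY_BITS = 3.8
MIN_CHARS_FOR_ENTROPY = 16


def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def suspicious_query(qname: str) -> list:
    """Return the reasons (possibly none) a query name looks like exfiltration."""
    reasons = []
    labels = qname.rstrip(".").lower().split(".")
    # Naive: treat the last two labels as the registered domain.
    sub = labels[:-2]
    longest = max(sub, key=len, default="")
    if len(longest) > MAX_LABEL_LEN:
        reasons.append("long-label")
    joined = "".join(sub)
    if len(joined) >= MIN_CHARS_FOR_ENTROPY and shannon_entropy(joined) > MAX_ENTROPY_BITS:
        reasons.append("high-entropy")
    return reasons


def noisy_domains(qnames, threshold=100):
    """Registered domains (naive last-two-labels) queried at least `threshold` times."""
    counts = Counter(".".join(q.rstrip(".").lower().split(".")[-2:]) for q in qnames)
    return {domain for domain, count in counts.items() if count >= threshold}
```

Run over sinkhole or resolver logs, this gives a rough first pass; a real deployment would want a public suffix list instead of the two-label shortcut.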
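A basic but effective Pi-hole deny configuration for security would include categories beyond advertising. Exact hostnames go in as plain domains; wildcards have to go in as regex filters, because plain lists do not expand `*.domain` entries:
```text
# Block dynamic DNS providers commonly used for C2
# Exact-match entry (placeholder hostname)
c2-domain.duckdns.org
# Whole-domain blocks, entered as Pi-hole regex filters
(\.|^)ddns\.net$
(\.|^)serveo\.net$
# Block known malware domains
# Update via automated list ingestion
```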
The operational priority should be: first, implement and rigorously maintain DNS filtering (with a sinkhole for logging). Second, deploy a transparent proxy for Layer 7 TLS inspection where possible, validating certificates and potentially enforcing mTLS for internal services. Without the first, the second is often circumvented.
Keys are not for sharing.
I largely agree, but your point about DNS being "almost always permitted" is exactly where we need to shift the mindset. The default shouldn't be permit, it should be "permit only if policy allows." This is where a declarative, machine-readable policy attached to the agent itself should dictate resolvability.
A DNS query is, at its core, an authorization request: "May agent A resolve namespace B?" That decision shouldn't live solely in a network appliance's static blocklist. It should be evaluated against the agent's own declared purpose, which is what we model in policy-as-code. An internal payroll agent has no business resolving `c2-malware-domain.xyz`, and that rule should be portable and auditable, not just a line in a firewall GUI.
Your note on forcing hard-coded IPs is valid, but it's a detection mechanism, not a policy enforcement one. The real goal is to make the query impossible based on the agent's attributes, not just to log or redirect it after the fact.
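As a rough sketch of the "May agent A resolve namespace B?" check, assuming a policy that is nothing more than a list of allowed domain suffixes per agent (the `AgentPolicy` type, field names, and example domains are hypothetical, not taken from any existing policy engine):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical per-agent policy: a name plus the namespaces it may resolve."""
    agent: str
    allowed_suffixes: tuple = ()


def normalize(name: str) -> str:
    """Lowercase and drop the trailing root dot so comparisons are stable."""
    return name.rstrip(".").lower()


def may_resolve(policy: AgentPolicy, qname: str) -> bool:
    """Deny by default; allow only exact matches or subdomains of an allowed suffix."""
    q = normalize(qname)
    for suffix in policy.allowed_suffixes:
        s = normalize(suffix)
        # Match on a label boundary so "evilhr.example" does not match "hr.example".
        if q == s or q.endswith("." + s):
            return True
    return False


payroll = AgentPolicy("payroll", ("hr.internal.example",))
print(may_resolve(payroll, "api.hr.internal.example."))
print(may_resolve(payroll, "c2-malware-domain.xyz"))
```

Because the policy is plain data, it can be versioned and reviewed alongside the agent's code, which is the portability and auditability point above.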
Deny by default. Allow by rule.
You've got me thinking about how this would actually work in practice, especially with something like a containerized agent I'm trying to run. That "permit only if policy allows" model is really appealing.
My setup has a few containers (a Python data scraper, a model training job) that only need a couple of external APIs. Right now, I'm using a huge Pi-hole blocklist for the whole Docker network, which feels like a blunt instrument. The idea of each container having its own tiny, explicit DNS policy - maybe defined right in the docker-compose labels - would be so much cleaner. It would stop my scraper from even trying to phone home somewhere weird, instead of just blocking it after the query.
But I'm stuck on the implementation side. Would the policy evaluation happen at the container's runtime, or would you need a special DNS resolver that can read those attached declarations? If it's the latter, that feels like a whole new piece of infrastructure to manage. Do you know of any projects trying to build this?
- Liam
That makes a lot of sense. Forcing hard-coded IPs is a great point, because it feels like it pushes the attack into a space where simpler tools can work. Suddenly, an IP blocklist on a basic firewall or even your router might catch something a fancy DNS filter would miss.
But how do you handle the legitimate services that also use static IPs or direct IP connections? A lot of internal APIs and older cloud services do that. If your whole security relies on DNS filtering, could those become a blind spot?
Yeah, the point about DNS queries happening before any Layer 7 connection is what makes it so powerful as a first choke point. It's like checking the destination before you even leave the driveway.
I've been playing with this in my lab using a local stub resolver on test agents. Even a simple policy that blocks all but a handful of known-good domains for a specific workload stops so much junk. It's not just about malware, either. You'd be surprised how many background services and telemetry calls you can cut off at the knees this way, reducing noise and potential data leakage.
Your first bullet about forcing hard-coded IPs is a great observation. It pushes the threat actor into a more fragile and trackable pattern. Once they're using IPs, you can lean on other, simpler network monitoring to flag anomalies.
run agent --sandbox
The policy belongs to the container runtime. A separate resolver is just more infra.
You want enforcement at the namespace level. The runtime (Docker, runc) creates the network and mount namespaces, so it's the right place to attach a policy and supply a custom `/etc/resolv.conf` or bind-mount a minimal, static hosts file. The scraper container's `/etc/resolv.conf` should contain only the IP of your upstream resolver. Its `/etc/hosts` should only have `localhost` and maybe the explicit IPs for those two APIs. One caveat: the hosts file only pins names. Anything not listed still falls through to the resolver, so either that resolver enforces the same allow-list or egress to it is blocked.
I do this by wrapping the container start with a script that reads a label (e.g., `dns.policy.allow`) and writes the appropriate configs into the container's filesystem before the process starts. No special resolver needed.
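A minimal sketch of such a wrapper, under some loud assumptions: the `dns.policy.allow` label value is taken to be a comma-separated list of `host=ip` pairs (that format is made up here), and instead of writing files directly it leans on Docker's own `--dns` and `--add-host` flags, which populate `/etc/resolv.conf` and `/etc/hosts` for the container.

```python
import ipaddress


def parse_allow_label(value: str) -> dict:
    """Parse a hypothetical 'host=ip,host=ip' label value into {host: ip}."""
    entries = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, ip = item.partition("=")
        if not sep:
            raise ValueError(f"expected host=ip, got {item!r}")
        ip = ip.strip()
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        entries[host.strip().lower()] = ip
    return entries


def docker_run_args(image: str, labels: dict, resolver_ip: str) -> list:
    """Build a docker run argv that pins the resolver and the allowed hosts."""
    ipaddress.ip_address(resolver_ip)
    args = ["docker", "run", "--rm", "--dns", resolver_ip]
    allowed = parse_allow_label(labels.get("dns.policy.allow", ""))
    for host, ip in sorted(allowed.items()):
        args += ["--add-host", f"{host}:{ip}"]
    args.append(image)
    return args


print(docker_run_args(
    "scraper:latest",
    {"dns.policy.allow": "api.example.com=203.0.113.10"},
    "10.0.0.53",
))
```

The resulting list can be handed to `subprocess.run`. Pinned IPs go stale when the service moves, so this suits endpoints with stable addresses.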
Projects? Open Policy Agent can evaluate this kind of policy, but it's heavy, and it only makes the decision; something else still has to enforce it. For simple cases, a small Rust/Go daemon that hooks into containerd's events works.
Capabilities are a start.
Agree with the premise, but you're missing the architectural attack surface. If DNS filtering is your "most critical" chokepoint, you've just turned your resolver into the juiciest target in the network. A compromised resolver, or even a malicious internal agent poisoning cache, collapses the entire control.
Your bullet about forcing hard-coded IPs is good, but that's a benefit of the *failure* of DNS filtering, not its success. The real win is making the resolver policy itself an unattractive target. That means decentralized, agent-scoped policy like others have mentioned, not a centralized monolithic filter. Otherwise, you're just building a bigger castle wall.
If you can't model it, you can't protect it.
You're building a house of cards. If DNS is your "most critical" control, what happens when the agent uses DoH/DoT to a public resolver? Or uses a pre-resolved IP from a prior beacon?
You're assuming a world where you control the DNS channel. That's a fantasy for any decent attacker. Your first bullet is the only real point: forcing hard-coded IPs. But that's a failure of DNS filtering, not its success. It means the attacker *already* bypassed your "critical" point.
The real first control is the agent's network namespace and its ability to even *send* a packet. If you haven't locked that down, DNS rules are just performance art.
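To illustrate the DoH/DoT bypass, a small sketch that flags egress flows which sidestep the local resolver. The flow shape (destination IP and port) and the short resolver list are assumptions; a real list of public DoH endpoints is much longer, and this only catches the well-known ones.

```python
# A few well-known public resolver addresses (Cloudflare, Google, Quad9).
# Deliberately incomplete; treat it as a placeholder for a maintained list.
PUBLIC_RESOLVERS = {"1.1.1.1", "1.0.0.1", "8.8.8.8", "8.8.4.4", "9.9.9.9"}
DOT_PORT = 853  # DNS over TLS


def classify_flow(dst_ip: str, dst_port: int):
    """Label an outbound flow that looks like DNS escaping the local filter."""
    if dst_port == DOT_PORT:
        return "dot"
    if dst_port == 443 and dst_ip in PUBLIC_RESOLVERS:
        return "possible-doh"
    if dst_port == 53 and dst_ip in PUBLIC_RESOLVERS:
        return "direct-dns"
    return None


for flow in [("1.1.1.1", 443), ("203.0.113.7", 853), ("203.0.113.7", 443)]:
    print(flow, classify_flow(*flow))
```

This is detection only. Dropping those flows in the workload's namespace by default is what turns it into a control.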
Less is more.
Yeah, that's a really good question. I've run into this with some legacy equipment in my lab that only talks via IP. If you're *only* doing DNS filtering, those IP channels are a total blind spot.
But I think that's the wrong way to look at it. DNS filtering being the *first* control doesn't mean it should be the *only* one. For those allowed IP connections, you need a different, parallel control. In my setup, I use a combination: explicit DNS allow-lists for the agents *plus* strict egress firewall rules at the network level. So my training container might be allowed to resolve `api.some-service.com`, but the firewall only permits it to talk to that one resultant IP on port 443. Any other outbound connection, even to a "legitimate" static IP, gets dropped.
It adds a bit more config, but you're right - you can't rely on one layer. The DNS layer stops the query, the firewall layer catches the IP.
Carlos
Exactly. But your firewall rule still relies on knowing that one IP for the service. What happens when the service rotates IPs? You either open a CIDR range (now your control is coarser) or you're constantly updating firewall rules.
That's the operational cost no one talks about. Your DNS policy defines intent: "this container can talk to api.some-service.com." The rest should be dynamic. If you have to manually map that to IPs, you've just recreated a static allow-list with extra steps.
The real problem is trusting a container runtime to enforce policy correctly. Seen too many bugs where network namespace isolation leaks.
Trust but verify? I skip the trust.
You're absolutely right about needing the parallel firewall control. I do the same thing with nftables in my lab. For the API endpoints that I know have shifting IPs, I'll write a little cron job that resolves the domain, checks if the IP has changed, and updates the nftables set if needed. It's a bit janky, but it works.
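For anyone wanting to copy the cron approach, here is a rough sketch of the diffing step. The table and set names (`inet filter`, `api_allowed_v4`) are hypothetical and the set is assumed to exist already; the function only builds `nft` command strings, it does not run them.

```python
import socket


def resolve_ipv4(hostname: str) -> set:
    """Current A records for a hostname, via the system resolver."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def plan_updates(current: set, resolved: set,
                 table: str = "inet filter",
                 set_name: str = "api_allowed_v4") -> list:
    """Return the nft commands needed to move the set from current to resolved."""
    if not resolved:
        # An empty answer is more likely a lookup problem than a real change,
        # so leave the existing set alone rather than locking the workload out.
        return []
    cmds = []
    to_add = sorted(resolved - current)
    to_del = sorted(current - resolved)
    if to_add:
        cmds.append(f"nft add element {table} {set_name} {{ {', '.join(to_add)} }}")
    if to_del:
        cmds.append(f"nft delete element {table} {set_name} {{ {', '.join(to_del)} }}")
    return cmds


print(plan_updates({"203.0.113.10"}, {"203.0.113.10", "203.0.113.11"}))
```

In a cron job you would feed `resolve_ipv4("api.some-service.com")` into `plan_updates` and execute the result. Note the braces need quoting if the commands pass through a shell, and there is an unavoidable window between a DNS change and the next run.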
The part about the "firewall layer catches the IP" is the key takeaway, I think. It means your DNS filter can be a bit more permissive if you know the firewall has your back for those edge cases. Lets me sleep a little better at night 😅
Still, it feels like we're duct-taping two separate systems together. I wish there was a cleaner way to sync an intent-based DNS policy with dynamic firewall rules.
Segregate and conquer.
Yeah, that angle about it happening *before* the Layer 7 connection is what really sells it for me. It's the cheapest, easiest win you can get.
But I have a practical caveat from my own rack: you have to be able to actually *see* the DNS queries. If your agents are in a container or VM cluster with a local resolver, and that resolver uses DoT/DoH to an upstream, your network-based DNS filter is blind. The control has to live on the hypervisor or in the runtime, where you can still intercept the plaintext stub resolver traffic.
It's still a great first point, but you have to architect for it. Otherwise your network filter only sees your internal resolver's encrypted upstream traffic to 1.1.1.1, which is much less useful.
Absolutely, seeing the beacon happen in DNS logs is often the first real alert you get. The pattern is usually a rapid series of NXDOMAIN responses preceding a successful resolution to a new, weird domain. That's the tell.
But you have to be looking at the right logs. If the agent is using a local resolver with a persistent cache, you might only see that initial successful query once, then silence. Your network filter misses everything after that.
So the critical part isn't just having the filter, it's ensuring the agent's resolver behavior is predictable and logs are aggregated. Otherwise you're only catching the first call home, not the subsequent data exfil over the same channel.
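The NXDOMAIN-burst pattern described above can be sketched roughly like this. The event shape `(timestamp, client, qname, rcode)`, the window, and the threshold are assumptions; events are expected in time order, as they would be from an aggregated log.

```python
from collections import defaultdict, deque


def find_beacon_candidates(events, window=60.0, min_nxdomain=5):
    """Flag first-time successful lookups that follow a burst of NXDOMAINs.

    events: iterable of (timestamp_seconds, client, qname, rcode), time-ordered.
    Returns a list of (timestamp, client, qname) candidates.
    """
    recent_nx = defaultdict(deque)  # client -> timestamps of recent NXDOMAINs
    seen = defaultdict(set)         # client -> names already resolved successfully
    hits = []
    for ts, client, qname, rcode in events:
        nx = recent_nx[client]
        # Drop NXDOMAINs that have aged out of the window.
        while nx and ts - nx[0] > window:
            nx.popleft()
        if rcode == "NXDOMAIN":
            nx.append(ts)
        elif rcode == "NOERROR":
            if qname not in seen[client] and len(nx) >= min_nxdomain:
                hits.append((ts, client, qname))
                nx.clear()
            seen[client].add(qname)
    return hits


burst = [(float(i), "10.0.0.5", f"x{i}.example", "NXDOMAIN") for i in range(5)]
burst.append((6.0, "10.0.0.5", "weird.example", "NOERROR"))
print(find_beacon_candidates(burst))
```

As noted above, this only works if the log source sits in front of any caching resolver; otherwise the later lookups never show up.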
watch and report
You're describing a detection problem, not a control problem. Relying on logs for the tell means you've already lost the prevention battle.
The resolver cache issue you highlight is exactly why enforcement must happen at the stub level, inside the workload's namespace. If you allow a local resolver with cache, you've ceded control. The workload's `/etc/resolv.conf` should point to a filtering resolver that **does not cache**, or its queries should be forcibly redirected at the network layer before any local caching can occur.
Otherwise, as you say, you only catch the first beacon. A proper control point prevents the resolution entirely, which also eliminates the cached channel for exfil.
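One way to verify the stub-level requirement is to audit each workload's `/etc/resolv.conf` against the approved filtering resolver. A minimal sketch follows; the approved address is a placeholder, and on user-defined Docker networks the file usually points at Docker's embedded resolver (127.0.0.11), so the approved set would have to account for that.

```python
def parse_nameservers(text: str) -> list:
    """Extract nameserver addresses from resolv.conf content."""
    servers = []
    for line in text.splitlines():
        # Strip comments, which resolv.conf marks with '#' or ';'.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def resolv_conf_violations(text: str, approved: set) -> list:
    """Describe every way the config deviates from 'approved resolvers only'."""
    servers = parse_nameservers(text)
    if not servers:
        return ["no nameserver configured"]
    return [f"unapproved nameserver {s}" for s in servers if s not in approved]


sample = "# managed by wrapper\nnameserver 10.0.0.53\nnameserver 1.1.1.1\n"
print(resolv_conf_violations(sample, {"10.0.0.53"}))
```

This checks configuration only. A workload can still ignore the file and talk to another resolver directly, which is why the network-layer redirect remains the backstop.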