DNS in Kubernetes (The Hard Way)

January 25, 2026

DNS was the one thing I swore I’d keep on Docker. It’s critical infrastructure: if DNS goes down, nothing works. Not the kind of service you want to experiment with.

But then neuron-1 needed a reboot, and for 30 seconds my entire network couldn’t resolve anything. Phones complaining, smart home devices going offline, the works.

The Current Setup

My DNS stack isn’t just AdGuard. It’s a three-layer chain:

AdGuard Home handles the user-facing stuff: ad blocking, query logging, and local DNS rewrites. It’s what my router points to.

Unbound sits behind AdGuard as a recursive resolver. It validates DNSSEC, maintains a ~500 MB cache, and handles query minimization for privacy. Most queries never leave this layer.

DNSCrypt encrypts whatever Unbound can’t resolve locally. Upstream queries go to Cloudflare and Quad9 over encrypted channels. My ISP sees nothing.

All three run on neuron-1 in Docker, on a private bridge network with static IPs. It works well. The problem is “works well on one machine.”

Why Bother?

The honest answer: I wanted to do it.

The practical answer: failover. If neuron-1 goes down for maintenance, I want DNS to keep working. Kubernetes can reschedule pods to healthy nodes automatically. Docker on a single host can’t.

But DNS in Kubernetes has some challenges I had to think through first.

The Challenges

UDP 53 Exposure

DNS needs to listen on port 53. In Kubernetes, you can’t just bind to privileged ports on the host (well, you can with hostNetwork, but that defeats the purpose), and NodePort services are limited to the 30000-32767 range by default.

The solution: a MetalLB LoadBalancer. It assigns a virtual IP from my LAN, and clients connect to that. AdGuard gets 192.168.x.101:53, sitting right next to Traefik’s 192.168.x.100.
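Roughly, the Service looks like this. The name and label are placeholders, and mixing UDP and TCP in one LoadBalancer Service needs a reasonably recent Kubernetes; on older clusters, MetalLB’s allow-shared-ip annotation lets two Services share the IP instead:

apiVersion: v1
kind: Service
metadata:
  name: dns-adguard              # placeholder name
  namespace: dns
spec:
  type: LoadBalancer
  loadBalancerIP: 192.168.40.101   # the VIP MetalLB announces on the LAN
  selector:
    app: adguard                   # assumed pod label
  ports:
    - name: dns-udp
      port: 53
      protocol: UDP
    - name: dns-tcp
      port: 53
      protocol: TCP
    - name: http
      port: 80
      protocol: TCP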

No Native Clustering

Here’s what I learned: AdGuard Home doesn’t support clustering. At all.

If you run two replicas behind a load balancer, each maintains its own:

  • Query logs
  • Statistics
  • Client settings
  • Filter lists

There’s no sync. You’d have two independent AdGuard instances that just happen to share an IP. Queries would randomly hit either one, and your stats would be fragmented.

For my use case, that’s not acceptable. I want unified configuration and predictable behavior. I want to see which devices are making suspicious requests.

The Dependency Chain

If AdGuard moves to k3s but Unbound stays on Docker, how do they talk? The pod would need to reach Docker’s bridge network (10.2.0.x), which means either host networking or some ugly routing.

Cleaner to move all three.

The Solution

Two replicas with config sync for true HA.

AdGuard Home doesn’t support native clustering, but adguardhome-sync bridges the gap. It continuously syncs configuration between multiple AdGuard instances: filter lists, DNS rewrites, client settings, everything except query logs.
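The sync config is a small YAML file, roughly like the sketch below. I’m going from memory of the project’s README here, so treat the field names as approximate and the URLs as placeholders for however the two pods end up being addressed:

origin:
  url: http://adguard-0.dns-adguard.dns.svc.cluster.local:3000
  username: admin
  password: "change-me"
replicas:
  - url: http://adguard-1.dns-adguard.dns.svc.cluster.local:3000
    username: admin
    password: "change-me"
cron: "*/10 * * * *"    # re-sync every 10 minutes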

The setup: a StatefulSet with two replicas, each pinned to a different node via pod anti-affinity. The LoadBalancer distributes queries across both. If one pod or node dies, the other keeps serving with zero downtime.

Query logs are still per-instance (sync doesn’t merge them), but that’s an acceptable tradeoff for zero-downtime failover. The services communicate via static ClusterIPs, with no dependency on Docker networking.

The Implementation

I built an umbrella Helm chart with three subcharts:

k8s/charts/dns/
├── Chart.yaml           # Dependencies on subcharts
├── values.yaml          # Override values
└── charts/
    ├── dnscrypt/        # Bottom of chain
    ├── unbound/         # Middle
    └── adguard/         # Top, external-facing

Each subchart is self-contained: Deployment, Service, ConfigMap, PVC. The umbrella chart wires them together.
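The umbrella Chart.yaml is mostly dependency bookkeeping; roughly, with illustrative versions and conditions:

apiVersion: v2
name: dns
description: Umbrella chart for the DNS stack
version: 0.1.0
dependencies:                 # the subcharts themselves live in ./charts/
  - name: adguard
    version: 0.1.0
    condition: adguard.enabled
  - name: unbound
    version: 0.1.0
    condition: unbound.enabled
  - name: dnscrypt
    version: 0.1.0
    condition: dnscrypt.enabled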

The Configs

DNSCrypt was straightforward; the Docker config translated almost directly into a ConfigMap:

server_names = ['cloudflare', 'quad9-dnscrypt-ip4-filter-pri', 'google']
listen_addresses = ['0.0.0.0:6053']
require_dnssec = true

Unbound needed one change: the forward address, which moved from Docker’s static IP to a Kubernetes service name:

# Old (Docker)
forward-addr: 10.2.0.210@6053

# New (Kubernetes)
forward-addr: dns-dnscrypt.dns.svc.cluster.local@6053

AdGuard doesn’t use a config file for upstream DNS; you configure that through the UI. After deployment, I set it to use dns-unbound.dns.svc.cluster.local:5335.

Storage

AdGuard’s working directory was 4.4 GB, almost all of it query logs:

adguard/work/data/
├── querylog.json      2.0G
├── querylog.json.1    2.4G
├── filters/           23M
└── stats.db           252K

I give it a 10 Gi PVC with no backups. Query logs are ephemeral; I don’t need them in B2. The config PVC (100 Mi) does get daily Longhorn backups.
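The query-log claim is a plain Longhorn PVC, something like this (the name is a placeholder):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: adguard-work          # query logs and stats; excluded from backups
  namespace: dns
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi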

Unbound and DNSCrypt get 100 Mi each. Their actual usage is under 50 KB, but sizing PVCs that precisely isn’t worth the mental overhead.

Migration Strategy

The beauty of this setup: I can run both stacks in parallel.

  1. Deploy to k3s - DNS available at 192.168.40.101
  2. Test manually - dig @192.168.40.101 google.com
  3. Update one device - Point my laptop at the new IP and use it for a day
  4. Update router DHCP - Primary: .101 (k3s), Secondary: .246 (Docker fallback)
  5. Monitor - Watch query stats in both AdGuard instances
  6. Decommission Docker - Once all traffic flows through k3s

If anything breaks, I change one DHCP setting and I’m back on Docker. No pressure.

What I Learned

Separate pods beat sidecars. I considered running all three containers in one pod. Same lifecycle, localhost communication, guaranteed same-node scheduling. But separate pods are cleaner:

  • Independent restarts
  • Separate logs (kubectl logs per service)
  • Mirrors the Docker architecture I’m migrating from

The latency difference between localhost and ClusterIP is negligible. DNS queries to upstream resolvers take 10–50 ms. Internal routing noise doesn’t matter.

MetalLB just works. UDP LoadBalancer services behaved exactly as expected. AdGuard gets both UDP and TCP on port 53, plus HTTP on 80 for the UI.

adguardhome-sync makes HA possible. Two replicas plus config sync give real HA: no downtime during node maintenance, no angry family members.

Issues I Hit (And How I Fixed Them)

Moving DNS to Kubernetes sounded simple on paper. Reality had other plans.

Unbound Can’t Resolve Kubernetes DNS Names

Unbound doesn’t accept hostnames as forward targets:

[error] cannot parse forward . ip address: 'dnscrypt.dns.svc.cluster.local@6053'

Fix: Assign static ClusterIPs and use those directly.

spec:
  clusterIP: 10.43.0.53

Tradeoff: ClusterIPs are immutable. Changing them means recreating the Service.
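In context, the dnscrypt Service ends up looking roughly like this (name and label are illustrative); Unbound then forwards to 10.43.0.53@6053:

apiVersion: v1
kind: Service
metadata:
  name: dns-dnscrypt
  namespace: dns
spec:
  clusterIP: 10.43.0.53        # pinned so unbound's forward-addr never changes
  selector:
    app: dnscrypt              # assumed pod label
  ports:
    - name: dns-udp
      port: 6053
      protocol: UDP
    - name: dns-tcp
      port: 6053
      protocol: TCP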

Where Did All My Client IPs Go?

All queries appeared to come from node and pod IPs: with the default externalTrafficPolicy (Cluster), kube-proxy SNATs incoming traffic, so AdGuard couldn’t tell clients apart.

Fix: Preserve source IPs:

externalTrafficPolicy: Local
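That setting lives on the LoadBalancer Service spec; a minimal sketch:

spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # keep client source IPs; only nodes with a local pod answer

With MetalLB in layer-2 mode this also means the VIP is only announced from nodes actually running an AdGuard replica, which the anti-affinity rule described later spreads across two nodes anyway.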

The Chicken-and-Egg Problem

After a power outage, DNS didn’t come back because images couldn’t be pulled… because DNS was down.

Fix: Use cached images:

imagePullPolicy: IfNotPresent

Images are pre-pulled via a one-time DaemonSet.
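A sketch of that DaemonSet. The image name and tag are placeholders for whatever the stack actually runs, and the trick assumes the image ships a shell; the init containers exist purely to force a pull onto every node, and the pause container keeps the pod alive:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dns-image-prepull
  namespace: dns
spec:
  selector:
    matchLabels:
      app: dns-image-prepull
  template:
    metadata:
      labels:
        app: dns-image-prepull
    spec:
      initContainers:
        - name: pull-adguard
          image: adguard/adguardhome:latest    # placeholder; pin to the tag you deploy
          command: ["/bin/sh", "-c", "true"]   # pulling the image is the whole point
        # repeat for the unbound and dnscrypt images
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9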

Five-Minute Failover Is Too Slow

The default node tolerations keep pods bound to an unreachable node for 300 s before eviction, so a node failure meant a five-minute DNS outage.

Fix: Shorten tolerations in k3s:

default-not-ready-toleration-seconds=30
default-unreachable-toleration-seconds=30
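These are kube-apiserver flags; on k3s, one place to set them is the server’s config file:

# /etc/rancher/k3s/config.yaml on the server node
kube-apiserver-arg:
  - "default-not-ready-toleration-seconds=30"
  - "default-unreachable-toleration-seconds=30"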

Failover now takes ~45 seconds.

Single Replica = SPOF

With a single AdGuard replica, every restart or reschedule meant an unavoidable DNS blip.

Fix: Two AdGuard replicas + adguardhome-sync.

StatefulSet Needs Anti-Affinity

Ensure replicas don’t land on the same node:

podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
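Filled out (the app: adguard label stands in for whatever labels the chart actually sets on the pods), the rule on the StatefulSet’s pod template looks like:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: adguard
        topologyKey: kubernetes.io/hostname   # at most one replica per node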

Login Redirect Loop Through Traefik

The two replicas don’t share login sessions, so requests alternating between them through Traefik got stuck in a redirect loop; sticky sessions were required.

Fix: Traefik IngressRoute with cookie-based stickiness.
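Roughly like this (hostname and cookie name are placeholders, and older Traefik versions use the traefik.containo.us/v1alpha1 apiVersion):

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: adguard
  namespace: dns
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`adguard.home.lan`)
      kind: Rule
      services:
        - name: dns-adguard
          port: 80
          sticky:
            cookie:
              name: adguard-sticky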

MetalLB Failover and ARP Cache

When the LoadBalancer IP moved to a different node, some clients kept the old node’s MAC address in their ARP cache and went on sending queries to the dead node.

Fix: Wait, or flush ARP caches manually.

Current Status

Migration complete. DNS is fully served from k3s.

  • AdGuard: 2 replicas, synced
  • Unbound: single replica, static ClusterIP
  • DNSCrypt: single replica, static ClusterIP
  • Storage: 4 PVCs total

Docker DNS on neuron-1 is gone.

The DNS stack was supposed to stay on Docker forever. Well, guess what: one step closer to having everything in Kubernetes.


Services migrated: 3
PVCs created: 4
YAML written: too much