Home Assistant on Kubernetes: Never say never
In my original k3s migration post, I wrote:
Home Assistant needs mDNS for device discovery... Getting mDNS to work properly in Kubernetes requires host networking, which defeats much of the isolation benefit. Docker just works.
I stand by that. mDNS in Kubernetes is a pain. But “pain” isn’t “impossible,” and once again, I wanted to do it.
What Changed
Nothing broke. That’s important to mention: this isn’t a “Docker failed me” story. Home Assistant on Docker was rock solid. Zigbee devices worked. Matter bridge worked. Automations ran.
But the itch was there. I’d migrated DNS to k3s (the thing I swore I’d never touch). Home automation was the last major workload on Docker. And every time I looked at my cluster dashboard, there was this gap: 14 Docker containers that weren’t part of the GitOps flow.
Plus, I wanted to edit YAML configs without SSH. More on that later.
The Stack
My home automation setup isn’t just Home Assistant:
Home Assistant is the brain: automations, dashboards, integrations with everything from my Daikin HVAC to my Roborock vacuum.
Zigbee2MQTT bridges my Zigbee mesh to MQTT. 40+ devices: lights, blinds, climate sensors, water leak detector, smart plugs. All talking through a network-attached SLZB-06M coordinator.
Mosquitto is the message broker. Simple, no auth (it’s LAN-only), just works.
Matter Hub exposes HA entities to Apple Home via the Matter protocol. This is the tricky one: it needs mDNS to advertise itself to Apple devices.
The mDNS Problem
Here’s why I hesitated before. mDNS (multicast DNS) uses multicast group 224.0.0.251 on port 5353 to announce services on the local network. When your phone asks “is there a Matter bridge here?”, the bridge responds via multicast.
Kubernetes pods live in an overlay network. They can’t send multicast to your LAN unless you give them host networking or get creative with CNI plugins.
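To make the mechanics concrete, here is a stdlib-only sketch of what such a query looks like on the wire: a hand-built DNS PTR question for `_matter._tcp.local` that a controller would multicast to 224.0.0.251:5353. This is illustrative, not Matter Hub's actual code.

```python
import struct

MDNS_GROUP, MDNS_PORT = "224.0.0.251", 5353

def encode_qname(name: str) -> bytes:
    """DNS name encoding: length-prefixed labels, terminated by a zero byte."""
    out = b""
    for label in name.strip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def build_mdns_query(service: str) -> bytes:
    # Header: ID=0 (mDNS convention), flags=0 (standard query),
    # 1 question, 0 answer/authority/additional records.
    header = struct.pack("!HHHHHH", 0, 0, 1, 0, 0, 0)
    # Question: QTYPE=12 (PTR), QCLASS=1 (IN).
    return header + encode_qname(service) + struct.pack("!HH", 12, 1)

packet = build_mdns_query("_matter._tcp.local")
# Sending it would be socket.sendto(packet, (MDNS_GROUP, MDNS_PORT)) on a
# UDP socket bound to the LAN interface, which is exactly what a pod on an
# overlay network cannot do without extra plumbing.
```

The response comes back the same way, via multicast, which is why the bridge needs a LAN-visible interface in the first place.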
I looked at four options:
| Option | How It Works | Tradeoff |
|---|---|---|
| hostNetwork: true | Pod shares host's network stack | No isolation, but it works |
| Keep on Docker | Don't migrate Matter Hub | Split management |
| Avahi reflector | Bridge multicast via DaemonSet | Extra complexity |
| Multus CNI | Secondary NIC attached to LAN | Major infrastructure change |
After way too much research, the answer was obvious: hostNetwork: true.
Yes, it defeats some isolation benefits. But Matter Hub is a single-purpose bridge: it doesn’t need network isolation, it needs network access. The pod shares the host’s network namespace, can send mDNS, and HomeKit just works.
```yaml
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet  # Still use cluster DNS
  containers:
    - name: matter-hub
      env:
        - name: HAMH_MDNS_NETWORK_INTERFACE
          value: "eno1"  # Host's LAN interface
```
The Architecture
Here’s what the migrated stack looks like:
Most services use normal ClusterIP networking. Only Matter Hub gets host networking because it actually needs it.
The services talk to each other via Kubernetes DNS:
- Zigbee2MQTT → `mqtt://mosquitto:1883`
- Matter Hub → `http://home-assistant.home.svc.cluster.local:8123`
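Why does the bare name `mosquitto` even resolve? The pod's /etc/resolv.conf lists cluster search domains with `options ndots:5`, so short names get the namespace suffix tried first. A simplified sketch of that expansion, assuming namespace `home` and mirroring glibc's behavior:

```python
# Simplified model of how a pod's resolver expands names. Real resolvers
# stop at the first search domain that answers; this just shows the order.
def expand(name, search=("home.svc.cluster.local", "svc.cluster.local", "cluster.local"), ndots=5):
    if name.endswith("."):  # already fully qualified, no expansion
        return [name]
    candidates = [f"{name}.{s}." for s in search]
    literal = [name + "."]
    # Fewer dots than ndots: try the search list first, then the name as-is.
    return candidates + literal if name.count(".") < ndots else literal + candidates

print(expand("mosquitto")[0])  # mosquitto.home.svc.cluster.local.
```

This is also why external names with few dots (like `example.com`) cost a few wasted lookups inside a cluster: the search suffixes get tried first.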
The Config Editing Problem
Here’s something that bugged me about the planned migration: editing Home Assistant configs.
On Docker, the config directory is a bind mount. I could SSH in, vim configuration.yaml, save, restart. Or use the File Editor add-on. Simple.
On Kubernetes, the config lives in a PVC. Getting to it means:
- `kubectl exec` into the pod
- `kubectl cp` files back and forth
- Using HA’s built-in editor (limited)
None of those are great for actual development. I tweak YAML a lot: dashboards, automations, template sensors. I wanted something better.
Solution: Code Server sidecar.
```yaml
containers:
  - name: home-assistant
    image: ghcr.io/home-assistant/home-assistant:stable
    volumeMounts:
      - name: config
        mountPath: /config
  - name: code-server
    image: linuxserver/code-server:latest
    env:
      - name: DEFAULT_WORKSPACE
        value: /config
    volumeMounts:
      - name: config
        mountPath: /config  # Same PVC
```
Both containers share the same PVC. Home Assistant serves the app at assistant.k3s.home. Code Server serves VS Code at assistant.k3s.home/code. Full IDE with YAML syntax highlighting, git integration, terminal access.
Is it overkill? Probably. But editing Lovelace dashboards in a proper editor beats the HA UI any day.
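For reference, a minimal sketch of how the one-host, two-path routing could be wired. The ingress names and the code-server port (8443 is the linuxserver image's default, as far as I know) are assumptions, not a copy of the actual chart:

```yaml
# Hypothetical sketch: same host, two path-based ingresses.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: home-assistant
spec:
  rules:
    - host: assistant.k3s.home
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: home-assistant
                port:
                  number: 8123
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: code-server
spec:
  rules:
    - host: assistant.k3s.home
      http:
        paths:
          - path: /code
            pathType: Prefix
            backend:
              service:
                name: code-server
                port:
                  number: 8443
```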
The Migration Challenge
Unlike stateless apps, home automation has critical data that can’t be regenerated:
| Data | Why It Matters |
|---|---|
| Zigbee network key | Lose this, re-pair 40 devices |
| Device pairings | Entity IDs, friendly names, groups |
| Matter fabrics | HomeKit needs to re-commission |
| Automations | Hours of "if humidity > 70%, turn on dehumidifier" logic |
| History DB | Not critical, but nice to keep |
The migration strategy:
1. Deploy with replicas=0 → Creates empty PVC
2. Stop Docker containers → Free the coordinator
3. Copy data to PVC → Preserve everything
4. Update MQTT URL → `mqtt://mqtt` becomes `mqtt://mosquitto:1883`
5. Scale to replicas=1 → Let it start
6. Verify → Devices reconnect, automations fire
The Zigbee coordinator is network-attached (tcp://192.168.x.100:6638), not USB. That simplifies everything: no device passthrough, no node affinity for USB. Zigbee2MQTT just needs TCP connectivity, which works from any pod on any node.
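As a sketch, the relevant Zigbee2MQTT `configuration.yaml` fragment looks something like this; the `adapter` value is an assumption that depends on the coordinator's firmware:

```yaml
# Zigbee2MQTT config fragment (sketch): a network-attached coordinator
# is just a TCP endpoint, so no USB passthrough or node affinity needed.
serial:
  port: tcp://192.168.x.100:6638
  adapter: ember   # assumption; depends on coordinator firmware
mqtt:
  server: mqtt://mosquitto:1883
```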
What Could Go Wrong
I’ve been running DNS on k3s long enough to know: things will break. Here’s what I’m watching for:
mDNS stops working. If Matter Hub’s pod lands on a node with different network interfaces, HAMH_MDNS_NETWORK_INTERFACE=eno1 might be wrong. Each node has the same interface name (Ansible standardization), but I should validate after deployment.
Zigbee coordinator timeout. The TCP connection to the SLZB-06M needs to stay stable. If Zigbee2MQTT restarts too often, the coordinator might drop devices temporarily.
SQLite locking. Home Assistant’s database is SQLite. Code Server and HA writing to it simultaneously could cause issues. In practice, HA does nearly all the writes and Code Server mostly just reads YAML, so it should be fine.
PVC performance. Longhorn is networked storage. HA does a lot of small writes (state changes, history). If latency spikes, automations might feel sluggish. I’m allocating 5Gi with no backup requirements for the history DB: if it corrupts, I’ll just start fresh.
The Helm Charts
Four charts, all following the same pattern as my other workloads:
```
k8s/charts/
├── mosquitto/             # Simplest, no dependencies
│   ├── deployment.yaml
│   ├── service.yaml       # ClusterIP 1883, 9001
│   └── pvc.yaml           # 1Gi Longhorn
│
├── zigbee2mqtt/           # Depends on mosquitto
│   ├── deployment.yaml    # udev mount for device detection
│   ├── service.yaml
│   ├── ingress.yaml       # zigbee2mqtt.k3s.home
│   └── pvc.yaml           # 1Gi, holds network key
│
├── home-assistant/        # Depends on mosquitto
│   ├── deployment.yaml    # Privileged, dbus, code-server sidecar
│   ├── service.yaml       # Two services: HA + code-server
│   ├── ingress.yaml       # Two ingresses
│   └── pvc.yaml           # 5Gi, shared between containers
│
└── matter-hub/            # Depends on home-assistant
    ├── deployment.yaml    # hostNetwork: true
    ├── pvc.yaml           # 500Mi, Matter fabric data
    └── sealed-secret.yaml # HA access token
```
Migration order matters: Mosquitto first (no dependencies), then Zigbee2MQTT (needs MQTT), then Home Assistant (needs MQTT), then Matter Hub (needs HA).
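That ordering is just a topological sort of the dependency edges above; a toy sketch for illustration, not part of the repo:

```python
# Chart dependencies from the tree above; visit dependencies before the
# charts that need them (depth-first topological sort).
deps = {
    "mosquitto": [],
    "zigbee2mqtt": ["mosquitto"],
    "home-assistant": ["mosquitto"],
    "matter-hub": ["home-assistant"],
}

def migration_order(deps):
    order, done = [], set()
    def visit(chart):
        if chart in done:
            return
        for d in deps[chart]:
            visit(d)
        done.add(chart)
        order.append(chart)
    for chart in deps:
        visit(chart)
    return order

print(migration_order(deps))
# mosquitto comes first, matter-hub last
```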
Update: The Migration Actually Happened
The migration is complete. Everything works. But the path from “charts ready” to “production running” had more surprises than expected.
What Actually Broke
Remember when I said “things will probably break”? Here’s the list.
The Honeypot IP Incident
This one cost me hours.
I picked 192.168.x.2 and 192.168.x.3 for Home Assistant and Matter Hub. Nice, low numbers. Easy to remember. Should work.
Pods deployed. Got their IPs. But couldn’t reach the gateway. Couldn’t reach other VLANs. ARP showed incomplete entries.
```
$ kubectl exec -n home deploy/home-assistant -- ping 192.168.x.1
PING 192.168.x.1 (192.168.x.1): 56 data bytes
^C
--- 192.168.x.1 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss
```
I tried everything:
- Switched from macvlan to ipvlan
- Added explicit routes for cross-VLAN traffic
- Verified the host could reach the gateway
- Checked firewall rules
Nothing worked. The packets just... vanished.
Then my past self whispered: “192.168.x.2 is the honeypot address.”
Of course. I’d configured .2 as a honeypot IP in Unifi months ago. Any traffic to/from that IP gets dropped silently. Perfect for catching scanners. Terrible for running actual services.
Lesson learned: Document your honeypot IPs. Or better, use them outside your normal allocation range.
New IPs are .250, .251, and .252: safely in the reserved range above DHCP (which ends at .249).
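A toy guard I could have used: validate candidate static IPs against the DHCP pool and a documented deny list before assigning them. The subnet is redacted in this post, so the ranges below are illustrative:

```python
from ipaddress import ip_address

# Illustrative values: the post only says DHCP ends at .249, so the pool
# start is a guess, and the subnet is a stand-in for the redacted one.
DHCP_POOL = (ip_address("192.168.1.10"), ip_address("192.168.1.249"))
HONEYPOTS = {ip_address("192.168.1.2")}

def usable_static(ip: str) -> bool:
    addr = ip_address(ip)
    if addr in HONEYPOTS:
        return False  # traffic to/from these is silently dropped
    # Static IPs must sit outside the DHCP pool to avoid conflicts.
    return not (DHCP_POOL[0] <= addr <= DHCP_POOL[1])

print(usable_static("192.168.1.250"))  # True  (reserved range above DHCP)
print(usable_static("192.168.1.2"))    # False (the honeypot)
```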
macvlan vs ipvlan
Before discovering the honeypot issue, I burned time on this tangent.
macvlan creates virtual MAC addresses for each pod. Some managed switches and routers don’t handle this well: they see multiple MACs on one port and get confused.
ipvlan shares the host’s MAC address. Less isolation, but more compatible. The tradeoff:
| Mode | MAC | Works with managed switches | Pod-to-host communication |
|---|---|---|---|
| macvlan | Virtual per pod | Sometimes | Can't reach same-interface host |
| ipvlan L2 | Shared (host's) | Always | Same limitation |
I switched to ipvlan thinking it would fix the gateway issue. It didn’t (because honeypot), but it’s still the better choice for my managed Unifi switches.
Multus Instead of hostNetwork
Plot twist: I didn’t use hostNetwork: true after all.
While debugging the networking issues, I realized Multus CNI was cleaner than I’d thought. Instead of giving pods full host network access, Multus attaches a secondary interface connected to the LAN.
```yaml
# Pod gets two interfaces:
#   eth0 → cluster network (10.42.x.x) - normal Kubernetes networking
#   net1 → LAN (192.168.x.x) - mDNS-capable secondary interface
annotations:
  k8s.v1.cni.cncf.io/networks: '[{"name":"lan-macvlan","ips":["192.168.x.250/24"]}]'
```
Benefits:
- Cluster DNS still works (`*.home.svc.cluster.local`)
- Services are still reachable via ClusterIP
- Only mDNS traffic goes out the secondary interface
- Better isolation than full `hostNetwork`
The catch: pods must be pinned to nodes with the matching interface. My NAD targets eno1, which exists on neuron-1 and neuron-2. neuron-3 uses enp1s0. Hence the nodeSelector.
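For completeness, the NetworkAttachmentDefinition behind the `lan-macvlan` annotation might look roughly like this. The field values are assumptions based on the standard Multus static-IPAM pattern, not a copy of my chart; the `capabilities` block is what lets the pod annotation supply the IP:

```yaml
# Sketch of a Multus NAD for an ipvlan L2 attachment to the LAN.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: lan-macvlan   # name kept from the annotation, despite the ipvlan switch
  namespace: home
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "ipvlan",
      "capabilities": { "ips": true },
      "master": "eno1",
      "mode": "l2",
      "ipam": { "type": "static" }
    }
```

Because `master: eno1` is baked in, a `nodeSelector` keeps the pods off neuron-3, where the interface is named differently.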
Cross-VLAN Routing
Multus pods get a secondary interface, but that interface doesn’t know about other VLANs by default. My Home Assistant at 192.168.x.250 couldn’t reach Zigbee devices on 192.168.x.x (IoT VLAN).
The fix: add routes in the NetworkAttachmentDefinition.
```yaml
# multus/values.yaml
lan:
  type: ipvlan
  mode: l2
  routes:
    - dst: "192.168.10.0/24"  # Management VLAN
      gw: "192.168.x.1"
    - dst: "192.168.x.0/24"   # IoT VLAN
      gw: "192.168.x.1"
```
Without these, return traffic to other VLANs would try to go via eth0 (cluster network) instead of net1 (LAN), and get dropped.
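The routing decision itself is just longest-prefix match across the pod's interfaces. A small sketch with illustrative subnets shows why the NAD routes matter:

```python
from ipaddress import ip_address, ip_network

# Routes as a pod might see them; subnets are illustrative stand-ins.
ROUTES = [
    ("eth0", ip_network("0.0.0.0/0")),       # default route: cluster network
    ("eth0", ip_network("10.42.0.0/16")),    # pod overlay
    ("net1", ip_network("192.168.1.0/24")),  # Multus LAN interface
    ("net1", ip_network("192.168.10.0/24")), # extra route added via the NAD
]

def pick_interface(dst: str) -> str:
    addr = ip_address(dst)
    matches = [(net.prefixlen, dev) for dev, net in ROUTES if addr in net]
    return max(matches)[1]  # most specific prefix wins

print(pick_interface("192.168.10.50"))  # net1, only because of the NAD route
print(pick_interface("8.8.8.8"))        # eth0, via the default route
```

Delete the `192.168.10.0/24` entry and the first lookup falls back to the default route on `eth0`, which is exactly the dropped-traffic failure described above.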
MQTT Broker Hostname Changed
Docker Compose used mqtt as the service name. Kubernetes uses mosquitto. Every config file referencing the MQTT broker needed updating:
- Home Assistant: `.storage/core.config_entries` (JSON, not YAML)
- Zigbee2MQTT: `configuration.yaml`
- Matter Hub: environment variable
The error messages were helpful at least: “Connection refused to mqtt:1883” is pretty clear.
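The sweep itself is mechanical. A hedged sketch of the rewrite, treating YAML as plain text and round-tripping JSON so the `.storage` files stay valid; the structure here is illustrative, not HA's actual schema:

```python
import json

OLD, NEW = "mqtt://mqtt", "mqtt://mosquitto:1883"

def rewrite_text(text: str) -> str:
    # Handle both "mqtt://mqtt:1883" and bare "mqtt://mqtt" without
    # producing a doubled port.
    return text.replace(OLD + ":1883", NEW).replace(OLD, NEW)

def rewrite_json(text: str) -> str:
    # Round-trip through json so the output stays valid JSON.
    def fix(obj):
        if isinstance(obj, dict):
            return {k: fix(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [fix(v) for v in obj]
        if isinstance(obj, str):
            return rewrite_text(obj)
        return obj
    return json.dumps(fix(json.loads(text)))

print(rewrite_text("server: mqtt://mqtt:1883"))  # server: mqtt://mosquitto:1883
```

A blunt string replace would also hit any unrelated key containing `mqtt://mqtt`, so on a real config a quick grep of the results is worth the thirty seconds.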
Database Corruption (Expected)
I copied Home Assistant’s SQLite database while it was running. WAL files existed. Corruption was inevitable.
```
ERROR (Recorder) - Database corruption detected
```
I’d prepared for this. The history DB isn’t critical: losing “when did the kitchen light turn on last Tuesday” isn’t a disaster. HA recreated a fresh database on startup.
If you need to preserve history: stop HA first, or use sqlite3 .backup to get a consistent copy.
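Python's stdlib exposes the same backup API as the sqlite3 CLI's `.backup`, so a consistent online copy is only a few lines; file names below are placeholders:

```python
import sqlite3

def backup_db(src_path: str, dst_path: str) -> None:
    """Take a consistent snapshot of a SQLite DB, even while it's in use."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    # Copies the database page by page at a consistent snapshot, WAL
    # included, unlike a raw file copy of a database being written to.
    src.backup(dst)
    dst.close()
    src.close()
```

This is the safe alternative to the raw copy that corrupted my history DB.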
The Final Architecture
After all the debugging, here’s what actually got deployed:
Key differences from the original plan:
- Multus instead of hostNetwork
- ipvlan instead of macvlan
- IPs in the `.250`–`.254` range (avoiding the honeypot)
- Explicit routes for cross-VLAN traffic
Was It Worth It?
Honestly? Yes.
Not because k3s is better than Docker for home automation: it’s objectively more complex. But because I learned:
- How Multus CNI actually works
- Why ipvlan exists alongside macvlan
- How to debug pod networking when nothing makes sense
- That past-me’s clever honeypot would eventually bite future-me
The Code Server sidecar alone was worth it. Editing Lovelace dashboards in VS Code with syntax highlighting and git is genuinely better than any HA add-on.
And now my entire homelab is GitOps. One repo, one workflow, one source of truth. git push deploys everything from DNS to home automation.
Migration status: Complete
Containers remaining on Docker: 14 → 10
Hours debugging honeypot IP: ~3
Times I said "why isn't this working": countless
Things that actually broke: several, all fixed