Home Assistant on Kubernetes: Never say never
In my original k3s migration post, I wrote:
Home Assistant needs mDNS for device discovery... Getting mDNS to work properly in Kubernetes requires host networking, which defeats much of the isolation benefit. Docker just works.
I stand by that. mDNS in Kubernetes is a pain. But “pain” isn’t “impossible,” and once again, I wanted to do it.
What Changed
Nothing broke. That’s important to mention: this isn’t a “Docker failed me” story. Home Assistant on Docker was rock solid. Zigbee devices worked. Matter bridge worked. Automations ran.
But the itch was there. I’d migrated DNS to k3s (the thing I swore I’d never touch). Home automation was the last major workload on Docker. And every time I looked at my cluster dashboard, there was this gap: 14 Docker containers that weren’t part of the GitOps flow.
Plus, I wanted to edit YAML configs without SSH. More on that later.
The Stack
My home automation setup isn’t just Home Assistant:
Home Assistant is the brain: automations, dashboards, integrations with everything from my Daikin HVAC to my Roborock vacuum.
Zigbee2MQTT bridges my Zigbee mesh to MQTT. 40+ devices: lights, blinds, climate sensors, water leak detector, smart plugs. All talking through a network-attached SLZB-06M coordinator.
Mosquitto is the message broker. Simple, no auth (it’s LAN-only), just works.
Matter Hub exposes HA entities to Apple Home via the Matter protocol. This is the tricky one: it needs mDNS to advertise itself to Apple devices.
The mDNS Problem
Here’s why I hesitated before. mDNS (multicast DNS) uses multicast group 224.0.0.251 on port 5353 to announce services on the local network. When your phone asks “is there a Matter bridge here?”, the bridge responds via multicast.
Kubernetes pods live in an overlay network. They can’t send multicast to your LAN unless you give them host networking or get creative with CNI plugins.
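To make the mechanics concrete, here is a stdlib-only sketch of what such a query looks like on the wire: a hand-built DNS PTR question for `_matter._tcp.local` that a controller would multicast to 224.0.0.251:5353. This is illustrative, not Matter Hub's actual code.

```python
import struct

MDNS_GROUP, MDNS_PORT = "224.0.0.251", 5353

def encode_qname(name: str) -> bytes:
    """DNS name encoding: length-prefixed labels, terminated by a zero byte."""
    out = b""
    for label in name.strip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def build_mdns_query(service: str) -> bytes:
    # Header: ID=0 (mDNS convention), flags=0 (standard query),
    # 1 question, 0 answer/authority/additional records.
    header = struct.pack("!HHHHHH", 0, 0, 1, 0, 0, 0)
    # Question: QTYPE=12 (PTR), QCLASS=1 (IN).
    return header + encode_qname(service) + struct.pack("!HH", 12, 1)

packet = build_mdns_query("_matter._tcp.local")
# Sending it would be socket.sendto(packet, (MDNS_GROUP, MDNS_PORT)) on a
# UDP socket bound to the LAN interface, which is exactly what a pod on an
# overlay network cannot do without extra plumbing.
```

The response comes back the same way, via multicast, which is why the bridge needs a LAN-visible interface in the first place.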
I looked at four options:
| Option | How It Works | Tradeoff |
|---|---|---|
| hostNetwork: true | Pod shares host's network stack | No isolation, but it works |
| Keep on Docker | Don't migrate Matter Hub | Split management |
| Avahi reflector | Bridge multicast via DaemonSet | Extra complexity |
| Multus CNI | Secondary NIC attached to LAN | Major infrastructure change |
After way too much research, the answer was obvious: hostNetwork: true.
Yes, it defeats some isolation benefits. But Matter Hub is a single-purpose bridge: it doesn’t need network isolation, it needs network access. The pod shares the host’s network namespace, can send mDNS, and HomeKit just works.
```yaml
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet  # Still use cluster DNS
  containers:
    - name: matter-hub
      env:
        - name: HAMH_MDNS_NETWORK_INTERFACE
          value: "eno1"  # Host's LAN interface
```
The Architecture
Here’s what the migrated stack looks like:
Most services use normal ClusterIP networking. Only Matter Hub gets host networking because it actually needs it.
The services talk to each other via Kubernetes DNS:
- Zigbee2MQTT → `mqtt://mosquitto:1883`
- Matter Hub → `http://home-assistant.home.svc.cluster.local:8123`
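Why does the bare name `mosquitto` even resolve? The pod's /etc/resolv.conf lists cluster search domains with `options ndots:5`, so short names get the namespace suffix tried first. A simplified sketch of that expansion, assuming namespace `home` and mirroring glibc's behavior:

```python
# Simplified model of how a pod's resolver expands names. Real resolvers
# stop at the first search domain that answers; this just shows the order.
def expand(name, search=("home.svc.cluster.local", "svc.cluster.local", "cluster.local"), ndots=5):
    if name.endswith("."):  # already fully qualified, no expansion
        return [name]
    candidates = [f"{name}.{s}." for s in search]
    literal = [name + "."]
    # Fewer dots than ndots: try the search list first, then the name as-is.
    return candidates + literal if name.count(".") < ndots else literal + candidates

print(expand("mosquitto")[0])  # mosquitto.home.svc.cluster.local.
```

This is also why external names with few dots (like `example.com`) cost a few wasted lookups inside a cluster: the search suffixes get tried first.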
The Config Editing Problem
Here’s something that bugged me about the planned migration: editing Home Assistant configs.
On Docker, the config directory is a bind mount. I could SSH in, vim configuration.yaml, save, restart. Or use the File Editor add-on. Simple.
On Kubernetes, the config lives in a PVC. Getting to it means:
- `kubectl exec` into the pod
- `kubectl cp` files back and forth
- Using HA’s built-in editor (limited)
None of those are great for actual development. I tweak YAML a lot: dashboards, automations, template sensors. I wanted something better.
Solution: Code Server sidecar.
```yaml
containers:
  - name: home-assistant
    image: ghcr.io/home-assistant/home-assistant:stable
    volumeMounts:
      - name: config
        mountPath: /config
  - name: code-server
    image: linuxserver/code-server:latest
    env:
      - name: DEFAULT_WORKSPACE
        value: /config
    volumeMounts:
      - name: config
        mountPath: /config  # Same PVC
```
Both containers share the same PVC. Home Assistant serves the app at assistant.k3s.home. Code Server serves VS Code at assistant.k3s.home/code. Full IDE with YAML syntax highlighting, git integration, terminal access.
Is it overkill? Probably. But editing Lovelace dashboards in a proper editor beats the HA UI any day.
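For reference, a minimal sketch of how the one-host, two-path routing could be wired. The ingress names and the code-server port (8443 is the linuxserver image's default, as far as I know) are assumptions, not a copy of the actual chart:

```yaml
# Hypothetical sketch: same host, two path-based ingresses.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: home-assistant
spec:
  rules:
    - host: assistant.k3s.home
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: home-assistant
                port:
                  number: 8123
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: code-server
spec:
  rules:
    - host: assistant.k3s.home
      http:
        paths:
          - path: /code
            pathType: Prefix
            backend:
              service:
                name: code-server
                port:
                  number: 8443
```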
The Migration Challenge
Unlike stateless apps, home automation has critical data that can’t be regenerated:
| Data | Why It Matters |
|---|---|
| Zigbee network key | Lose this, re-pair 40 devices |
| Device pairings | Entity IDs, friendly names, groups |
| Matter fabrics | HomeKit needs to re-commission |
| Automations | Hours of "if humidity > 70%, turn on dehumidifier" logic |
| History DB | Not critical, but nice to keep |
The migration strategy:
1. Deploy with replicas=0 → Creates empty PVC
2. Stop Docker containers → Free the coordinator
3. Copy data to PVC → Preserve everything
4. Update MQTT URL → `mqtt://mqtt` becomes `mqtt://mosquitto:1883`
5. Scale to replicas=1 → Let it start
6. Verify → Devices reconnect, automations fire
The Zigbee coordinator is network-attached (tcp://192.168.x.100:6638), not USB. That simplifies everything: no device passthrough, no node affinity for USB. Zigbee2MQTT just needs TCP connectivity, which works from any pod on any node.
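As a sketch, the relevant Zigbee2MQTT `configuration.yaml` fragment looks something like this; the `adapter` value is an assumption that depends on the coordinator's firmware:

```yaml
# Zigbee2MQTT config fragment (sketch): a network-attached coordinator
# is just a TCP endpoint, so no USB passthrough or node affinity needed.
serial:
  port: tcp://192.168.x.100:6638
  adapter: ember   # assumption; depends on coordinator firmware
mqtt:
  server: mqtt://mosquitto:1883
```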
What Could Go Wrong
I’ve been running DNS on k3s long enough to know: things will break. Here’s what I’m watching for:
mDNS stops working. If Matter Hub’s pod lands on a node with different network interfaces, HAMH_MDNS_NETWORK_INTERFACE=eno1 might be wrong. Each node has the same interface name (Ansible standardization), but I should validate after deployment.
Zigbee coordinator timeout. The TCP connection to the SLZB-06M needs to stay stable. If Zigbee2MQTT restarts too often, the coordinator might drop devices temporarily.
SQLite locking. Home Assistant’s database is SQLite. Code Server and HA writing to it simultaneously could cause issues. In practice, HA does nearly all the writes and Code Server mostly just reads YAML, so it should be fine.
PVC performance. Longhorn is networked storage. HA does a lot of small writes (state changes, history). If latency spikes, automations might feel sluggish. I’m allocating 5Gi with no backup requirements for the history DB: if it corrupts, I’ll just start fresh.
The Helm Charts
Four charts, all following the same pattern as my other workloads:
```
k8s/charts/
├── mosquitto/             # Simplest, no dependencies
│   ├── deployment.yaml
│   ├── service.yaml       # ClusterIP 1883, 9001
│   └── pvc.yaml           # 1Gi Longhorn
│
├── zigbee2mqtt/           # Depends on mosquitto
│   ├── deployment.yaml    # udev mount for device detection
│   ├── service.yaml
│   ├── ingress.yaml       # zigbee2mqtt.k3s.home
│   └── pvc.yaml           # 1Gi, holds network key
│
├── home-assistant/        # Depends on mosquitto
│   ├── deployment.yaml    # Privileged, dbus, code-server sidecar
│   ├── service.yaml       # Two services: HA + code-server
│   ├── ingress.yaml       # Two ingresses
│   └── pvc.yaml           # 5Gi, shared between containers
│
└── matter-hub/            # Depends on home-assistant
    ├── deployment.yaml    # hostNetwork: true
    ├── pvc.yaml           # 500Mi, Matter fabric data
    └── sealed-secret.yaml # HA access token
```
Migration order matters: Mosquitto first (no dependencies), then Zigbee2MQTT (needs MQTT), then Home Assistant (needs MQTT), then Matter Hub (needs HA).
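That ordering is just a topological sort of the dependency edges above; a toy sketch for illustration, not part of the repo:

```python
# Chart dependencies from the tree above; visit dependencies before the
# charts that need them (depth-first topological sort).
deps = {
    "mosquitto": [],
    "zigbee2mqtt": ["mosquitto"],
    "home-assistant": ["mosquitto"],
    "matter-hub": ["home-assistant"],
}

def migration_order(deps):
    order, done = [], set()
    def visit(chart):
        if chart in done:
            return
        for d in deps[chart]:
            visit(d)
        done.add(chart)
        order.append(chart)
    for chart in deps:
        visit(chart)
    return order

print(migration_order(deps))
# mosquitto comes first, matter-hub last
```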
Update: The Migration Actually Happened
The migration is complete. Everything works. But the path from “charts ready” to “production running” had more surprises than expected.
What Actually Broke
Remember when I said “things will probably break”? Here’s the list.
The Honeypot IP Incident
This one cost me hours.
I picked 192.168.x.2 and 192.168.x.3 for Home Assistant and Matter Hub. Nice, low numbers. Easy to remember. Should work.
Pods deployed. Got their IPs. But couldn’t reach the gateway. Couldn’t reach other VLANs. ARP showed incomplete entries.
```
$ kubectl exec -n home deploy/home-assistant -- ping 192.168.x.1
PING 192.168.x.1 (192.168.x.1): 56 data bytes
^C
--- 192.168.x.1 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss
```
I tried everything:
- Switched from macvlan to ipvlan
- Added explicit routes for cross-VLAN traffic
- Verified the host could reach the gateway
- Checked firewall rules
Nothing worked. The packets just... vanished.
Then my past self whispered: “192.168.x.2 is the honeypot address.”
Of course. I’d configured .2 as a honeypot IP in Unifi months ago. Any traffic to/from that IP gets dropped silently. Perfect for catching scanners. Terrible for running actual services.
Lesson learned: Document your honeypot IPs. Or better, use them outside your normal allocation range.
New IPs are .250, .251, and .252: safely in the reserved range above DHCP (which ends at .249).
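A toy guard I could have used: validate candidate static IPs against the DHCP pool and a documented deny list before assigning them. The subnet is redacted in this post, so the ranges below are illustrative:

```python
from ipaddress import ip_address

# Illustrative values: the post only says DHCP ends at .249, so the pool
# start is a guess, and the subnet is a stand-in for the redacted one.
DHCP_POOL = (ip_address("192.168.1.10"), ip_address("192.168.1.249"))
HONEYPOTS = {ip_address("192.168.1.2")}

def usable_static(ip: str) -> bool:
    addr = ip_address(ip)
    if addr in HONEYPOTS:
        return False  # traffic to/from these is silently dropped
    # Static IPs must sit outside the DHCP pool to avoid conflicts.
    return not (DHCP_POOL[0] <= addr <= DHCP_POOL[1])

print(usable_static("192.168.1.250"))  # True  (reserved range above DHCP)
print(usable_static("192.168.1.2"))    # False (the honeypot)
```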
macvlan vs ipvlan
Before discovering the honeypot issue, I burned time on this tangent.
macvlan creates virtual MAC addresses for each pod. Some managed switches and routers don’t handle this well: they see multiple MACs on one port and get confused.
ipvlan shares the host’s MAC address. Less isolation, but more compatible. The tradeoff:
| Mode | MAC | Works with managed switches | Pod-to-host communication |
|---|---|---|---|
| macvlan | Virtual per pod | Sometimes | Can't reach same-interface host |
| ipvlan L2 | Shared (host's) | Always | Same limitation |
I switched to ipvlan thinking it would fix the gateway issue. It didn’t (because honeypot), but it’s still the better choice for my managed Unifi switches.
Multus Instead of hostNetwork
Plot twist: I didn’t use hostNetwork: true after all.
While debugging the networking issues, I realized Multus CNI was cleaner than I’d thought. Instead of giving pods full host network access, Multus attaches a secondary interface connected to the LAN.
```yaml
# Pod gets two interfaces:
#   eth0 → cluster network (10.42.x.x) - normal Kubernetes networking
#   net1 → LAN (192.168.x.x) - mDNS-capable secondary interface
annotations:
  k8s.v1.cni.cncf.io/networks: '[{"name":"lan-macvlan","ips":["192.168.x.250/24"]}]'
```
Benefits:
- Cluster DNS still works (`*.home.svc.cluster.local`)
- Services are still reachable via ClusterIP
- Only mDNS traffic goes out the secondary interface
- Better isolation than full `hostNetwork`
The catch: pods must be pinned to nodes with the matching interface. My NAD targets eno1, which exists on neuron-1 and neuron-2. neuron-3 uses enp1s0. Hence the nodeSelector.
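For completeness, the NetworkAttachmentDefinition behind the `lan-macvlan` annotation might look roughly like this. The field values are assumptions based on the standard Multus static-IPAM pattern, not a copy of my chart; the `capabilities` block is what lets the pod annotation supply the IP:

```yaml
# Sketch of a Multus NAD for an ipvlan L2 attachment to the LAN.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: lan-macvlan   # name kept from the annotation, despite the ipvlan switch
  namespace: home
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "ipvlan",
      "capabilities": { "ips": true },
      "master": "eno1",
      "mode": "l2",
      "ipam": { "type": "static" }
    }
```

Because `master: eno1` is baked in, a `nodeSelector` keeps the pods off neuron-3, where the interface is named differently.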
Cross-VLAN Routing
Multus pods get a secondary interface, but that interface doesn’t know about other VLANs by default. My Home Assistant at 192.168.x.250 couldn’t reach Zigbee devices on 192.168.x.x (IoT VLAN).
The fix: add routes in the NetworkAttachmentDefinition.
```yaml
# multus/values.yaml
lan:
  type: ipvlan
  mode: l2
  routes:
    - dst: "192.168.10.0/24"  # Management VLAN
      gw: "192.168.x.1"
    - dst: "192.168.x.0/24"   # IoT VLAN
      gw: "192.168.x.1"
```
Without these, return traffic to other VLANs would try to go via eth0 (cluster network) instead of net1 (LAN), and get dropped.
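The routing decision itself is just longest-prefix match across the pod's interfaces. A small sketch with illustrative subnets shows why the NAD routes matter:

```python
from ipaddress import ip_address, ip_network

# Routes as a pod might see them; subnets are illustrative stand-ins.
ROUTES = [
    ("eth0", ip_network("0.0.0.0/0")),       # default route: cluster network
    ("eth0", ip_network("10.42.0.0/16")),    # pod overlay
    ("net1", ip_network("192.168.1.0/24")),  # Multus LAN interface
    ("net1", ip_network("192.168.10.0/24")), # extra route added via the NAD
]

def pick_interface(dst: str) -> str:
    addr = ip_address(dst)
    matches = [(net.prefixlen, dev) for dev, net in ROUTES if addr in net]
    return max(matches)[1]  # most specific prefix wins

print(pick_interface("192.168.10.50"))  # net1, only because of the NAD route
print(pick_interface("8.8.8.8"))        # eth0, via the default route
```

Delete the `192.168.10.0/24` entry and the first lookup falls back to the default route on `eth0`, which is exactly the dropped-traffic failure described above.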
MQTT Broker Hostname Changed
Docker Compose used mqtt as the service name. Kubernetes uses mosquitto. Every config file referencing the MQTT broker needed updating:
- Home Assistant: `.storage/core.config_entries` (JSON, not YAML)
- Zigbee2MQTT: `configuration.yaml`
- Matter Hub: environment variable
The error messages were helpful at least: “Connection refused to mqtt:1883” is pretty clear.
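The sweep itself is mechanical. A hedged sketch of the rewrite, treating YAML as plain text and round-tripping JSON so the `.storage` files stay valid; the structure here is illustrative, not HA's actual schema:

```python
import json

OLD, NEW = "mqtt://mqtt", "mqtt://mosquitto:1883"

def rewrite_text(text: str) -> str:
    # Handle both "mqtt://mqtt:1883" and bare "mqtt://mqtt" without
    # producing a doubled port.
    return text.replace(OLD + ":1883", NEW).replace(OLD, NEW)

def rewrite_json(text: str) -> str:
    # Round-trip through json so the output stays valid JSON.
    def fix(obj):
        if isinstance(obj, dict):
            return {k: fix(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [fix(v) for v in obj]
        if isinstance(obj, str):
            return rewrite_text(obj)
        return obj
    return json.dumps(fix(json.loads(text)))

print(rewrite_text("server: mqtt://mqtt:1883"))  # server: mqtt://mosquitto:1883
```

A blunt string replace would also hit any unrelated key containing `mqtt://mqtt`, so on a real config a quick grep of the results is worth the thirty seconds.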
Database Corruption (Expected)
I copied Home Assistant’s SQLite database while it was running. WAL files existed. Corruption was inevitable.
```
ERROR (Recorder) - Database corruption detected
```
I’d prepared for this. The history DB isn’t critical: losing “when did the kitchen light turn on last Tuesday” isn’t a disaster. HA recreated a fresh database on startup.
If you need to preserve history: stop HA first, or use sqlite3 .backup to get a consistent copy.
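Python's stdlib exposes the same backup API as the sqlite3 CLI's `.backup`, so a consistent online copy is only a few lines; file names below are placeholders:

```python
import sqlite3

def backup_db(src_path: str, dst_path: str) -> None:
    """Take a consistent snapshot of a SQLite DB, even while it's in use."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    # Copies the database page by page at a consistent snapshot, WAL
    # included, unlike a raw file copy of a database being written to.
    src.backup(dst)
    dst.close()
    src.close()
```

This is the safe alternative to the raw copy that corrupted my history DB.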
The Final Architecture
After all the debugging, here’s what actually got deployed:
Key differences from the original plan:
- Multus instead of hostNetwork
- ipvlan instead of macvlan
- IPs in the `.250`–`.254` range (avoiding the honeypot)
- Explicit routes for cross-VLAN traffic
Was It Worth It?
Honestly? Yes.
Not because k3s is better than Docker for home automation: it’s objectively more complex. But because I learned:
- How Multus CNI actually works
- Why ipvlan exists alongside macvlan
- How to debug pod networking when nothing makes sense
- That past-me’s clever honeypot would eventually bite future-me
The Code Server sidecar alone was worth it. Editing Lovelace dashboards in VS Code with syntax highlighting and git is genuinely better than any HA add-on.
And now my entire homelab is GitOps. One repo, one workflow, one source of truth. git push deploys everything from DNS to home automation.
Migration status: Complete
Containers remaining on Docker: 14 → 10
Hours debugging honeypot IP: ~3
Times I said "why isn't this working": countless
Things that actually broke: several, all fixed