Overengineering My Homelab

January 10, 2026


I've been running my homelab on Docker Compose for quite some time now, but I've never had the opportunity to work with Kubernetes (we use HashiCorp Nomad at my current company), so I decided to migrate most of my services to k3s.

Is it overkill? Most definitely. But despite the pain and a few tears, I genuinely enjoyed the process and learned a lot, which was the whole point.


Why Bother?

I've been a software engineer for years, but my Kubernetes knowledge was mostly theoretical. I'd read the docs, watched talks, maybe run kubectl apply on a few manifests. But I'd never built a cluster from scratch or debugged why a pod would not schedule at 11 PM when all I wanted was to watch a movie.

My Docker Compose setup worked. That was the problem. It worked so well that I barely had to touch it.

So I bought two more mini PCs and decided to make my life harder.


Setting Up the Nodes

Three mini PCs, each with 32GB RAM. The homelab needed a theme, so I named them neuron-1, neuron-2, and neuron-3.

Node     | Hardware         | CPU           | RAM  | Storage
neuron-1 | Intel NUC8i5BEH  | Core i5-8259U | 32GB | 1TB
neuron-2 | Intel NUC8i5BEH  | Core i5-8259U | 32GB | 500GB
neuron-3 | Beelink SER5 Pro | Ryzen 7 5700U | 32GB | 500GB

Different hardware, but they all needed the same base configuration: packages, users, SSH keys, firewall rules, and mount points. Doing this by hand once is fine. Doing it three times is how you end up with “works on my machine” problems.

After some investigation into provisioning tools, I landed on Ansible. It is widely used, well documented, and battle-tested.

I wrote a base playbook that handles:

  • System packages and updates
  • User setup and SSH hardening
  • Required kernel modules for k3s
  • NFS and iSCSI prerequisites

One command, all nodes configured identically:

ansible-playbook base/setup.yml --ask-become-pass
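
For reference, the base playbook is roughly this shape. It is a trimmed sketch rather than the full playbook; module choices and package names are illustrative:

# base/setup.yml (sketch)
- hosts: all
  become: true
  tasks:
    - name: Install base packages and k3s storage prerequisites
      ansible.builtin.apt:
        name: [curl, nfs-common, open-iscsi]
        state: present
        update_cache: true

    - name: Load kernel modules required by k3s
      community.general.modprobe:
        name: "{{ item }}"
        state: present
      loop: [overlay, br_netfilter]

    - name: Harden SSH by disabling password authentication
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PasswordAuthentication'
        line: PasswordAuthentication no
      notify: restart sshd

  handlers:
    - name: restart sshd
      ansible.builtin.service:
        name: ssh
        state: restarted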

A separate k3s playbook handles the cluster setup. The first node starts as the server, the others join as agents, and eventually all nodes run both roles for high availability.

The whole thing is idempotent, so I can re-run it to fix drift or add a node.

Was Ansible strictly necessary? No. But manually SSHing into multiple machines to repeat the same steps felt wrong. When neuron-3 joined the cluster later, I just added it to the inventory and ran the playbook. Five minutes later, it was done.
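
The inventory itself is nothing fancy; adding a node really is just adding a line. The layout below is illustrative, with connection details omitted:

# inventory/hosts.yml (sketch)
homelab:
  hosts:
    neuron-1:
    neuron-2:
    neuron-3:   # added later; rerunning the playbook brought it in line with the others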


What I Was Running

Before migrating anything, neuron-1 was doing all the work. One machine, 32 containers, all managed with Docker Compose and split across multiple files.

DNS and Networking

  • AdGuard Home as the DNS server
  • Unbound for recursive resolution
  • DNSCrypt for encrypted upstream queries
  • Traefik as the reverse proxy
  • Cloudflared for Cloudflare Tunnel access

Home Automation

  • Home Assistant at the center
  • Zigbee2MQTT bridging Zigbee devices
  • Mosquitto MQTT
  • Matter Hub exposing devices to Apple Home
  • A staging Home Assistant instance for testing

Monitoring

This started to get out of hand:

  • Grafana, Prometheus, Loki, Promtail
  • cAdvisor and Node Exporter
  • Uptime Kuma
  • Watchtower
  • ChangeDetection

Media and Apps

  • Linkwarden for bookmarks
  • Calibre-Web for ebooks
  • Homepage as a dashboard
  • PostgreSQL and Redis as backing databases

The problem: everything ran on one node. If neuron-1 went down for maintenance, the entire homelab went dark, DNS included.

The Docker Compose files were organized, but the setup still had:

  • A single point of failure
  • No automatic failover
  • Secrets in plaintext files
  • Manual deployments over SSH

It worked. But it did not scale, and I was not learning anything new.


The Migration Strategy

I should mention: this was not a weekend project. The whole migration happened over about a year, picking it up when I was motivated and dropping it for months when life got busy. Docker was stable, so there was no urgency. I would make progress, learn something, hit a wall, and come back to it later with fresh eyes.

I also did not migrate everything at once. That would have been chaos. Instead, I broke the work into phases.

Phase 1: Get k3s running on two nodes

Ansible playbook, fairly straightforward. k3s bundles everything you need, so there is no separate etcd cluster and no complicated PKI setup to manage.

At its core, getting a node up really is as simple as:

curl -sfL https://get.k3s.io | sh -

That command alone gets you surprisingly far.
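
Agents join with the same installer, pointed at the first server. The hostname and token below are placeholders; an Ansible playbook can fetch the real token from the server node:

# on the server: grab the join token
sudo cat /var/lib/rancher/k3s/server/node-token

# on each agent: point the installer at the server
curl -sfL https://get.k3s.io | K3S_URL=https://<server-node>:6443 K3S_TOKEN=<token> sh -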

Phase 2: Deploy the platform layer

Before deploying any applications, I needed the cluster fundamentals in place:

  • MetalLB for LoadBalancer IPs, since there is no cloud provider
  • Longhorn for persistent storage
  • Sealed Secrets so I could safely commit encrypted secrets to Git
  • Argo CD for GitOps

This phase took longer than expected. Longhorn required specific mount options and node prerequisites. MetalLB needed L2 advertisement configured correctly for my network. Sealed Secrets required understanding how kubeseal interacts with the cluster certificate.
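
The MetalLB side, once understood, comes down to two small resources. The address range below is illustrative and keeps the post's placeholder notation:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.x.100-192.168.x.110
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - homelab-pool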

Nothing here was especially difficult, but everything was new and tightly coupled.

Phase 3: Set up observability

Before migrating applications, I wanted visibility into the cluster. Debugging Kubernetes without metrics and logs is painful.

The LGTM stack went in early: Loki, Grafana, Mimir, and Tempo. I learned the hard way that adding observability after things break is much harder than starting with it.

Phase 4: Migrate apps one by one

I started with stateless applications. Homepage was first. No data, no database, and if it broke, nobody noticed.

From there, I gradually moved to stateful apps, removing Docker containers as each migration succeeded and validating that things actually worked before moving on.

Phase 5: Join neuron-1 to the cluster

Once most services were running in Kubernetes and Docker had been cleaned up, neuron-1 could finally join the cluster.

It became a hybrid node. It still runs Docker for the workloads that make sense there, but it also participates fully in the k3s cluster. At that point, the original single point of failure was finally gone.


The Hard Parts

Bootstrapping the Platform Layer

Everything depends on everything else, and nothing tells you that upfront, so figuring out the right order took trial and error.

But once the setup scripts were in place, spinning up a new cluster or recovering from a failure became straightforward. The hard work pays off in repeatability.
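
A bootstrap script along these lines captures the order that ended up working; chart versions are omitted and the namespaces are the charts' usual defaults:

# bootstrap.sh (sketch)
helm repo add metallb https://metallb.github.io/metallb
helm repo add longhorn https://charts.longhorn.io
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update

# 1. MetalLB first, so LoadBalancer services can get IPs
helm install metallb metallb/metallb -n metallb-system --create-namespace

# 2. Longhorn next, so PVCs can bind before any stateful workload starts
helm install longhorn longhorn/longhorn -n longhorn-system --create-namespace

# 3. Sealed Secrets, so encrypted secrets committed to Git can be decrypted
helm install sealed-secrets sealed-secrets/sealed-secrets -n kube-system

# 4. Argo CD last; from here on it manages everything else
helm install argocd argo/argo-cd -n argocd --create-namespace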


Two Traefiks, Two Domains

Docker Traefik already handled ports 80 and 443 on neuron-1. k3s also bundles Traefik. Running both on the same ports would conflict.

The solution was to separate them by IP and domain:

  • k3s Traefik runs as a LoadBalancer service with a MetalLB IP (192.168.x.100) and handles *.k3s.home
  • Docker Traefik keeps the host ports on neuron-1 (192.168.x.1) and handles *.lab.home

DNS rewrites in AdGuard route each domain to the correct IP. It works, but debugging DNS issues now has twice as many places to check.
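
Since k3s installs its Traefik through a bundled Helm chart, one way to pin it to the MetalLB IP is a HelmChartConfig override; the sketch below keeps the post's placeholder IP:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    service:
      annotations:
        metallb.universe.tf/loadBalancerIPs: 192.168.x.100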


Data Migration

Stateless apps are easy. Persistent data is where things get risky.

The pattern I settled on:

  1. Stop the Docker container
  2. Tar up the volume data
  3. Create the Kubernetes deployment with an empty PVC
  4. kubectl cp the tarball into a temporary pod
  5. Extract into the PVC mount
  6. Start the real application pod
  7. Pray
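
In shell terms, steps 2 through 5 look roughly like this. The app name, paths, and helper pod are illustrative; the helper is a small busybox pod mounting the new PVC at /data:

docker stop myapp
tar czf myapp-data.tar.gz -C /srv/docker/myapp .

kubectl cp myapp-data.tar.gz default/migrate-helper:/data/myapp-data.tar.gz
kubectl exec migrate-helper -- tar xzf /data/myapp-data.tar.gz -C /data
kubectl delete pod migrate-helper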

Some apps migrated cleanly. Others required database dumps and restores. PostgreSQL in particular cannot be migrated by copying data directories between versions.
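
For Postgres that meant a dump and restore instead of a file copy, roughly like this (container, pod, and user names are illustrative):

docker exec postgres pg_dumpall -U postgres > all-databases.sql
kubectl exec -i postgres-0 -- psql -U postgres < all-databases.sql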

The worst part is that you do not know if it worked until days later, when something subtly breaks.


Converting 20+ Apps to Helm Charts

Here is the tedious part nobody really warns you about: not every app has a Helm chart.

Some charts were excellent. Install, configure values, done. Others were outdated or too opinionated. A few apps had no charts at all.

For those, I wrote custom charts from scratch. Deployment, Service, Ingress, ConfigMap, maybe a PVC. Copy the pattern from the last chart, adjust ports and environment variables, repeat.

# Every app needs roughly the same boilerplate
apiVersion: apps/v1
kind: Deployment
# ... 50 lines of YAML you've written in every other chart

Individually, this is easy. Doing it 20+ times is monotonous.

The upside is clarity. Once something is a Helm chart, its configuration is documented by default.
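
A minimal, hypothetical values file shows what that documentation ends up looking like; every name below is a placeholder:

# values.yaml (sketch)
image:
  repository: ghcr.io/example/app
  tag: latest
service:
  port: 8080
ingress:
  enabled: true
  host: app.k3s.home
persistence:
  enabled: true
  size: 5Gi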


Unified Observability

I wanted a single Grafana dashboard showing both Docker containers and Kubernetes pods.

Alloy scrapes Docker metrics and ships them to the Kubernetes-hosted LGTM stack. It sounds simple on paper, except authentication, networking, service exposure, and label consistency all need to line up.

It took several iterations before queries reliably returned logs from both environments.


What I Kept on Docker

Not everything belongs in Kubernetes.

  • Home Assistant needs mDNS and hardware-adjacent networking. Docker just works.
  • The DNS stack binds to port 53 and is too critical to risk during cluster outages.
  • Matter Hub needs host networking for mDNS.

The rule became simple:

Hardware access or critical infrastructure stays on Docker. Everything else goes to k3s.


The Wins

  • Argo CD Image Updater handles deployments automatically.
  • Rollbacks are trivial with Git reverts.
  • Configuration is documented via Helm values.
  • Longhorn snapshots make backups boring, which is ideal.
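
As a sketch of what drives this, each app ends up as an Argo CD Application pointing at its chart in Git, with Image Updater steered by annotations. The repository URL and image below are placeholders:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
  annotations:
    argocd-image-updater.argoproj.io/image-list: app=ghcr.io/example/app
    argocd-image-updater.argoproj.io/write-back-method: git
spec:
  project: default
  source:
    repoURL: https://github.com/example/homelab.git
    path: charts/myapp
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true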

The boring stuff finally works without thinking about it.


Was It Worth It?

For running a homelab? Probably not.

For learning Kubernetes? Absolutely.

I now understand scheduling failures, PVC behavior, GitOps workflows, and how to debug broken deployments under pressure.

The setup is overkill. But I can now look at a Kubernetes deployment and actually understand what is happening.

That was the point.


Current State

  • Docker: 14 containers
  • k3s: 21 Helm charts across 3 nodes
  • Storage: Longhorn and Garage
  • GitOps: Argo CD
  • Observability: LGTM stack

It took a few weekends, some late nights, and more YAML than I care to admit. But it is running, documented, and stable.

Would I recommend it?

If you want to learn Kubernetes, yes. Build it, break it, fix it. There is no better teacher.

Best,

Jorge Lima