Raspberry Pi K3s Cluster
Four-node ARM Kubernetes cluster for homelab workloads
01. Overview
Reading the Kubernetes documentation is one thing. Actually running a cluster — dealing with ingress, storage classes, node failures, and rolling updates on hardware that cost you real money — is something else entirely. This project was built specifically to develop hands-on Kubernetes experience in a controlled environment where breaking something costs nothing beyond a few minutes of troubleshooting.
The cluster runs four Raspberry Pi 4 (8 GB) nodes, uses K3s as the lightweight Kubernetes distribution, and is managed entirely through GitOps using Flux CD. If it isn't in Git, it doesn't exist in the cluster.
02. Hardware
The cluster is built from four Raspberry Pi 4 Model B boards with 8 GB of RAM each, connected over gigabit Ethernet. Every node is imaged from the same Raspberry Pi OS Lite (64-bit) base, and stateful workload storage lives on USB SSDs rather than microSD cards (see Lessons Learned).
03. How It Was Built
Provisioning with Ansible
All four nodes are imaged from the same Raspberry Pi OS Lite (64-bit) base and provisioned using an Ansible playbook that handles system updates, sets hostnames, configures static IPs, enables cgroups v2 and memory accounting in the kernel command line (required for K3s), and installs K3s in server (control-plane) or agent (worker) mode depending on the host group. Reprovisioning a wiped node takes about 8 minutes.
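Below is a minimal sketch of how such a playbook can split the install by host group. The group names (k3s_servers, k3s_agents), the variables (k3s_server_ip, k3s_token), and the use of the upstream install script are illustrative rather than the exact playbook described above; system updates, hostnames, and static IPs are omitted.

```yaml
# Hypothetical excerpt of the provisioning playbook (not the real one).
- hosts: all
  become: true
  tasks:
    - name: Enable cgroup memory accounting on the kernel command line
      # Appends the flags only if missing; assumes the stock Raspberry Pi OS
      # cmdline.txt, which contains "rootwait". A reboot handler would follow.
      ansible.builtin.replace:
        path: /boot/cmdline.txt
        regexp: '^((?!.*cgroup_enable=memory).*rootwait.*)$'
        replace: '\1 cgroup_memory=1 cgroup_enable=memory'

- hosts: k3s_servers
  become: true
  tasks:
    - name: Install K3s in server (control-plane) mode
      ansible.builtin.shell: curl -sfL https://get.k3s.io | sh -s - server
      args:
        creates: /usr/local/bin/k3s

- hosts: k3s_agents
  become: true
  tasks:
    - name: Install K3s in agent mode, joining the server
      ansible.builtin.shell: >
        curl -sfL https://get.k3s.io |
        K3S_URL=https://{{ k3s_server_ip }}:6443
        K3S_TOKEN={{ k3s_token }} sh -s - agent
      args:
        creates: /etc/systemd/system/k3s-agent.service
```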
K3s configuration
K3s ships with Traefik as the default ingress controller, which I kept. The default local-path storage class is replaced with Longhorn for distributed, replicated persistent volumes — each PVC is replicated across 2 of the 4 nodes, so a single node failure doesn't take down stateful workloads. Flannel (the default CNI) handles pod networking with VXLAN for cross-node traffic.
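For illustration, a Longhorn StorageClass along these lines gives each volume two replicas and makes Longhorn the cluster default; the exact manifest in the repo may differ, and the bundled local-path class would lose its default annotation separately.

```yaml
# Sketch: two-replica Longhorn StorageClass set as the cluster default.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"        # each PVC replicated across 2 of the 4 nodes
  staleReplicaTimeout: "30"    # minutes before a failed replica is cleaned up
```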
GitOps with Flux CD
Flux monitors a private GitHub repository for changes to Kubernetes manifests. Any commit to the main branch that touches a manifest is automatically reconciled into the cluster within 60 seconds. Helm releases are managed through Flux's HelmRelease CRDs, which handle upgrades, rollbacks, and drift detection. Secrets are encrypted in Git using Mozilla SOPS with an age key stored offline.
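As a sketch of the two Flux objects this flow relies on, a Kustomization reconciles a path from the Git repository and decrypts SOPS-encrypted secrets, while a HelmRelease manages a chart with automatic rollback on failed upgrades. The names, path, chart, and the assumption that the age key is available to Flux as a secret are placeholders, not the actual repo contents.

```yaml
# Sketch of the reconciliation side; names and paths are illustrative.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 1m                   # how often to re-apply the desired state
  path: ./clusters/homelab/apps
  prune: true                    # delete resources removed from Git
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops               # SOPS-encrypted manifests, decrypted via age
    secretRef:
      name: sops-age             # assumed in-cluster secret holding the key
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: miniflux                 # placeholder release
  namespace: apps
spec:
  interval: 10m
  chart:
    spec:
      chart: miniflux
      sourceRef:
        kind: HelmRepository
        name: miniflux
        namespace: flux-system
  install:
    remediation:
      retries: 3
  upgrade:
    remediation:
      remediateLastFailure: true # roll back automatically if an upgrade fails
```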
Workloads running on the cluster
The cluster currently runs: a local DNS override service (not Pi-hole — that stays on its own Pi), a Prometheus + Grafana observability stack, a private Gitea instance for self-hosted Git, a Miniflux RSS reader, and a Bitwarden-compatible Vaultwarden password manager. All are exposed through Traefik with TLS terminated via Let's Encrypt (DNS-01 challenge through Cloudflare API).
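The write-up does not pin down the exact certificate machinery, but one common way to wire Let's Encrypt DNS-01 through the Cloudflare API in a setup like this is a cert-manager ClusterIssuer along the following lines. cert-manager itself, the email, and the secret names are assumptions; the cluster could equally use Traefik's built-in ACME support.

```yaml
# Hypothetical ClusterIssuer: assumes cert-manager is installed.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com            # placeholder
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:          # scoped Cloudflare API token
              name: cloudflare-api-token
              key: api-token
```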
04. Lessons Learned
- ARM-based clusters (aarch64) occasionally surface issues with container images that are only published for amd64 — always check for multi-arch manifests before committing to a piece of software.
- Longhorn's replication adds meaningful overhead on gigabit ethernet with Raspberry Pi CPUs. Some latency-sensitive apps are better off with local-path storage and an explicit backup strategy (see the sketch after this list).
- GitOps discipline pays off. Being able to nuke the entire cluster and re-converge to the desired state in under 30 minutes is genuinely useful when you experiment as aggressively as I do.
- MicroSD cards fail under Kubernetes write load; USB SSDs are non-negotiable for any stateful workload storage.
- Understanding the control loop model — how Kubernetes continuously reconciles actual state to desired state — was the biggest mental shift coming from a "run the command, check it happened" background.
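To make the storage trade-off concrete, a workload can opt out of Longhorn replication simply by naming the local-path class on its claim. The PVC below is an illustrative example, not one of the actual manifests in the repo.

```yaml
# Illustrative PVC pinned to K3s's local-path provisioner instead of Longhorn;
# data then lives on a single node's SSD and needs its own backup strategy.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-app-data
  namespace: apps
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```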
05. What's Next
The immediate next step is migrating the app workloads currently running inside TrueNAS's built-in K3s to this cluster, so the NAS can be updated and rebooted independently. After that, I want to add a fifth node and experiment with multi-control-plane HA — currently a single control-plane failure would bring scheduling down until the node recovers.
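For reference, K3s supports multi-server HA with embedded etcd by bootstrapping the first server with cluster-init and pointing additional servers at it. A sketch of the /etc/rancher/k3s/config.yaml files involved, with placeholder token and address:

```yaml
# First control-plane node: bootstrap an embedded etcd cluster.
# /etc/rancher/k3s/config.yaml
cluster-init: true
token: "<shared-server-token>"      # placeholder
---
# Additional control-plane nodes: join the existing cluster.
# /etc/rancher/k3s/config.yaml
server: https://10.0.0.11:6443      # placeholder address of the first server
token: "<shared-server-token>"
```

Worth noting: embedded etcd needs a majority of server nodes up, so HA in practice means at least three control-plane nodes.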