If you are like me, you keep reading and hearing about Kubernetes and how it is the future of deploying applications, be it small apps or the next billion-dollar SaaS product. But I never really had the opportunity to gain hands-on experience with actually getting things done in Kubernetes. I know that basically every major cloud provider offers some form of managed Kubernetes, but I never had the feeling that I would understand the topic well by relying on them. That brought me to an idea. The Hetzner cloud has always been dear to my heart, because it offers some of the best bang for your buck among cloud services. On the other hand, one of the frequent complaints people voice about Hetzner is that their cloud is rather barebones compared to other offerings. So I got the idea: what if I roll my own Kubernetes cluster on Hetzner VPS hosts and jump through all the hoops of maintaining and using Kubernetes?
The code for this post is hosted on Codeberg, feel free to take a look!
Table of Contents
- Architecture
- Choosing a Kubernetes Distribution
- Creating the cluster nodes
- Ingress and Cloud Controller Manager
- Cert-Manager and TLS
- Deploying ArgoCD
- Costs of the Setup
- Further Plans
Architecture
For simplicity, I decided to go with a three-node setup for my cluster. Every node is a control plane node, hosting etcd and the Kubernetes API. Since etcd needs a quorum of two out of three members, this gives me high availability in the sense that one node can fail without the cluster losing its ability to operate.
Choosing a Kubernetes Distribution
Similar to how Linux comes in many different flavours you can choose from via distributions, Kubernetes also comes in many different editions, each with its own tradeoffs and benefits. There is vanilla Kubernetes, which requires you to make many decisions yourself. It is generally not recommended for production scenarios, but it can provide great value when used for educational purposes. Although tempted at first, I ultimately decided against vanilla Kubernetes. I then took a look at the two main Kubernetes setups in the enterprise context: Red Hat's OpenShift and SUSE's Rancher. OpenShift requires a Red Hat subscription, but you can use the open-source version OKD instead. Initially, I wanted to use OKD, but the lack of documentation and good examples made me give up rather quickly. That was okay, though, as the heavyweight nature of OKD wasn't to my liking anyway. Rancher is really similar to OKD / OpenShift, so I wasn't keen on trying that either. But while reading about Rancher, I stumbled upon k3s, which markets itself as "a lightweight kubernetes distribution [...]". The documentation was pleasant to read, and while researching, it appeared that k3s enjoys a really good reputation. So I decided to use k3s for my cluster setup.
Creating the cluster nodes
With a distribution selected, it was time to get something deployed to Hetzner. I opted to use OpenTofu to manage my infrastructure in a way that can be stored in Git. But before we can deploy any compute, we need to set up some prerequisites. The nodes need to be provisioned with an SSH key to allow for troubleshooting and for retrieving configuration files from the nodes. So we need to generate a key and upload it to the Hetzner cloud so it can be used by the nodes. That can be achieved by combining the hcloud provider with the tls provider, both available in the OpenTofu registry:
resource "tls_private_key" "master_node_key" {
algorithm = "ED25519"
}
resource "hcloud_ssh_key" "hcloud_master_node_key" {
name = "${var.hcloud_namespace}-master-key"
public_key = tls_private_key.master_node_key.public_key_openssh
}
We generate an ED25519 keypair and then use the hcloud provider to upload it to Hetzner. Pretty straightforward. The next thing I wanted to do was to set up an internal network for cluster communication. This can be achieved similarly with Terraform:
resource "hcloud_network" "cluster_network" {
name = "${var.hcloud_namespace}-cluster-net"
ip_range = var.ip_range
}
resource "hcloud_network_subnet" "cluster_network_subnet" {
type = "cloud"
network_id = hcloud_network.cluster_network.id
network_zone = var.network_zone
ip_range = var.ip_subnet
}
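Both snippets assume that the hcloud and tls providers are declared for the project and that the hcloud provider is authenticated with an API token. A minimal sketch of what that declaration might look like (the version constraints here are my assumption, not taken from the original code):
terraform {
  required_providers {
    # Hetzner Cloud provider for servers, networks and SSH keys
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "~> 1.50"
    }
    # tls provider for generating the ED25519 keypair locally
    tls = {
      source  = "hashicorp/tls"
      version = "~> 4.0"
    }
  }
}

# The hcloud provider authenticates with an API token
provider "hcloud" {
  token = var.hcloud_token
}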
With that done, all prerequisites are there for deploying some nodes. This needs to be done in two steps: first, a single node is deployed that initializes the cluster:
resource "hcloud_server" "master_node" {
name = "${var.hcloud_namespace}-master-1"
image = "ubuntu-24.04"
server_type = "cax11"
location = var.hcloud_location
ssh_keys = [var.master_key_id]
public_net {
ipv4_enabled = true
ipv6_enabled = false
}
network {
network_id = var.cluster_network_id
ip = "10.0.1.1"
}
user_data = templatefile("${path.module}/../../scripts/cloud-init-master.tpl", {})
}
This creates a VPS of the cax11 type in the selected region and provisions it with Ubuntu and the previously created SSH key. It also puts it into the network we created in the previous step. The cluster initialization happens in the user-data script that gets passed to the server:
#cloud-config
packages:
- curl
users:
- name: cluster
sudo: ALL=(ALL) NOPASSWD:ALL
shell: /bin/bash
runcmd:
- apt update -y
- curl https://get.k3s.io | INSTALL_K3S_EXEC="--cluster-init --disable traefik --disable-cloud-controller --kubelet-arg cloud-provider=external --tls-san 10.0.1.1" sh -
- chown cluster:cluster /etc/rancher/k3s/k3s.yaml
- chown cluster:cluster /var/lib/rancher/k3s/server/node-token
The user-data script creates the `cluster` user with sudo rights and then proceeds to install k3s via the official install script. The flags set are important:
- `--cluster-init` initializes a cluster with etcd and everything needed for nodes to join.
- `--disable traefik` disables the default ingress used by k3s. This is not strictly necessary; I just need to do that as I wish to use NGINX as ingress later on.
- `--disable-cloud-controller` is important so that the Hetzner hcloud cloud controller manager can be used for leveraging existing Hetzner resources.
- `--kubelet-arg cloud-provider=external` specifies that we use an external cloud controller manager.
- `--tls-san 10.0.1.1` is required because, by default, the k3s install script only puts the public IP in the subject of the TLS certificate. This makes sure that communication works via TLS on the private network.
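Since the SSH key from earlier is provisioned onto the node, the kubeconfig that k3s writes to /etc/rancher/k3s/k3s.yaml can be copied down for local kubectl access. Two outputs make this easier; they are a sketch of mine (assuming the key and server resources live in the same module), not part of the original code:
# Public IPv4 of the first master, handy for SSH and as the kubeconfig server address
output "master_node_ipv4" {
  value = hcloud_server.master_node.ipv4_address
}

# Private key matching the uploaded hcloud SSH key; marked sensitive so it is not printed
output "master_node_private_key" {
  value     = tls_private_key.master_node_key.private_key_openssh
  sensitive = true
}
Note that the kubeconfig fetched from the node points at 127.0.0.1 by default, so the server address has to be swapped for the node's IP before using it from your own machine.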
After that, more nodes can be added to the cluster in a very similar way. In Terraform, I deploy two more VPS nodes:
resource "hcloud_server" "additional_master_nodes" {
count = 2
name = "${var.hcloud_namespace}-master-${count.index + 2}"
server_type = "cax11"
image = "ubuntu-24.04"
location = var.hcloud_location
public_net {
ipv4_enabled = true
ipv6_enabled = false
}
network {
network_id = var.cluster_network_id
}
user_data = templatefile("${path.module}/../../scripts/cloud-init-worker.tpl", {
master_private_key = base64encode(var.master_private_key)
master_node_ip = "10.0.1.1"
})
depends_on = [hcloud_server.master_node]
}
The setup is almost the same as for the initial node, except that a different user-data script is used and the private key of the initial master node is passed in as input. Why that is necessary becomes apparent in the user-data script:
#cloud-config
packages:
- curl
users:
- name: cluster
sudo: ALL=(ALL) NOPASSWD:ALL
shell: /bin/bash
write_files:
- path: /root/.ssh/id_ed25519
encoding: b64
content: "${master_private_key}"
permissions: "0600"
runcmd:
- apt update -y
- until curl -k https://${master_node_ip}:6443; do sleep 5; done
- REMOTE_TOKEN=$(ssh -o StrictHostKeyChecking=accept-new root@${master_node_ip} sudo cat /var/lib/rancher/k3s/server/node-token)
- curl -sfL https://get.k3s.io | K3S_TOKEN=$REMOTE_TOKEN INSTALL_K3S_EXEC="server --server https://${master_node_ip}:6443 --token $REMOTE_TOKEN --disable traefik --disable-cloud-controller --kubelet-arg cloud-provider=external" sh -
Similarly, a `cluster` user is created and the k3s installer gets downloaded and executed. But before that, we use SSH to connect to the initial master node and retrieve the token that is required to join the cluster. I do not know whether that is a good solution, but it works for now and I have not come up with a better one yet. Once all resources are deployed, we should be able to see our cluster in action. `kubectl` can be used to verify that:
kubectl get nodes
You should see three nodes that all have the roles `control-plane`, `etcd` and `master`:
NAME STATUS ROLES AGE VERSION
codeberg-k3s-master-1 Ready control-plane,etcd,master 6d4h v1.32.5+k3s1
codeberg-k3s-master-2 Ready control-plane,etcd,master 6d4h v1.32.5+k3s1
codeberg-k3s-master-3 Ready control-plane,etcd,master 6d4h v1.32.5+k3s1
This means our cluster is alive and kicking, and we can now start using Kubernetes to further interact with it!
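From here on, I manage the cluster through the kubernetes and helm Terraform providers, which need access to the kubeconfig retrieved from the master node. A minimal sketch of that wiring, with the local path being my assumption and the block syntax matching version 2 of the helm provider:
# Point both providers at the kubeconfig copied from the master node
provider "kubernetes" {
  config_path = "~/.kube/hetzner-k3s.yaml"
}

provider "helm" {
  kubernetes {
    config_path = "~/.kube/hetzner-k3s.yaml"
  }
}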
Ingress and Cloud Controller Manager
The first things I wanted to deploy into the cluster were NGINX as my ingress controller and the hcloud cloud controller manager (CCM). Because we now have a working Kubernetes cluster, we do not have to fiddle around with a mix of Terraform, Bash and cloud-init: we can now fiddle around with a mix of Terraform, Kubernetes and Helm. For that, I added the providers for Helm and Kubernetes and began writing Terraform to get everything into my cluster. I started with the CCM. Hetzner's hcloud CCM requires that a valid token for your hcloud account is available as a cluster secret. Achieving that is trivial with Terraform and the Kubernetes provider:
resource "kubernetes_secret" "hcloud_token" {
metadata {
name = "hcloud"
namespace = "kube-system"
}
data = {
token = var.hcloud_token
}
type = "Opaque"
}
This reads the token from the Terraform variables and uploads it to the cluster for later use. Once that is done, installing the hcloud CCM is straightforward with Helm:
resource "helm_release" "hcloud_ccm" {
name = "hcloud-cloud-controller-manager"
namespace = "kube-system"
repository = "https://charts.hetzner.cloud"
chart = "hcloud-cloud-controller-manager"
version = "1.25.1"
values = [
yamlencode({
controller = {
enabled = true
}
cloudControllerManager = {
enabled = true
hcloudToken = var.hcloud_token
}
})
]
timeout = 300
atomic = true
recreate_pods = true
force_update = true
}
This simply installs the specified Helm chart with the given inputs. Note that I increased the timeout because, depending on what is currently going on in the cluster, it can take a moment until the chart is up and running. With that, the Hetzner CCM is deployed and ready for use. The NGINX ingress controller is similarly easy: you just apply the corresponding Helm chart:
resource "helm_release" "nginx_ingress" {
name = var.name
namespace = var.namespace
create_namespace = true
repository = "https://kubernetes.github.io/ingress-nginx"
chart = "ingress-nginx"
version = var.chart_version
values = [
yamlencode({
controller = {
replicaCount = 2
publishService = {
enabled = true
}
service = {
type = "LoadBalancer"
externalTrafficPolicy = "Local"
}
}
})
]
timeout = 300
atomic = true
recreate_pods = true
force_update = true
}
It is basically the same procedure as for the CCM in the step before. To work with the NGINX ingress later on, the LoadBalancer service created by the chart can be looked up with a Terraform data source:
data "kubernetes_service" "nginx_ingress" {
metadata {
name = "nginx-ingress-ingress-nginx-controller"
namespace = var.namespace
}
depends_on = [helm_release.nginx_ingress]
}
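One thing this data source is handy for is reading the public IP that ends up on the load balancer in front of the ingress controller, for example to point DNS records at it. A sketch of such an output (the attribute path follows the kubernetes provider's documented schema, but is worth double-checking against your provider version):
# Public address of the LoadBalancer fronting the NGINX ingress controller
output "ingress_load_balancer_ip" {
  value = data.kubernetes_service.nginx_ingress.status[0].load_balancer[0].ingress[0].ip
}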
With that, we are almost ready to start deploying real applications; the only thing that needs to be taken care of beforehand is how the cluster handles certificates and TLS. Luckily, there is a battle-tested solution that can help us.
Cert-Manager and TLS
As is custom with everything Kubernetes, I want the process of obtaining certificates for my services to be as automated as possible. Ideally, I don't want to think about it at all after configuring it: services should be able to simply request a cert and automagically get assigned one. There is a solution that provides exactly that, called `cert-manager`. cert-manager creates certificates as they are requested and maintains them, ensuring they don't expire unexpectedly. For it to work, you need to configure an `Issuer`, which is simply a source of certificates. By default, cert-manager hands out self-signed certs. The interesting bit for me is that cert-manager fully supports the ACME protocol, which means I can use Let's Encrypt to fetch certs for my domain that work out of the box. It supports both http01 and dns01 challenges. I decided to go for dns01, which does not require a working HTTP server to solve the challenge and obtain a cert. Instead, you prove ownership of the domain by manipulating your domain's DNS records. For that, you need to have your domain hosted with a supported provider and put a token for accessing their API into your secret store. As I use Cloudflare, I simply put the token into my cluster, just as I did before for the CCM:
resource "kubernetes_secret" "cloudflare_api_token" {
metadata {
name = "cloudflare-api-token-secret"
namespace = var.namespace
}
type = "Opaque"
data = {
token = var.cloudflare_token
}
}
Deploying cert-manager is just as simple as the other pieces, since cert-manager also provides a Helm chart:
resource "helm_release" "cert_manager" {
name = var.name
namespace = var.namespace
create_namespace = true
repository = "https://charts.jetstack.io"
chart = "cert-manager"
version = var.chart_version
set = [
{
name = "crds.enabled"
value = "true"
}
]
timeout = 300
atomic = true
recreate_pods = true
force_update = true
}
With that, we are almost ready to issue certificates for the cluster. The only thing left is to configure a `ClusterIssuer`. This happens via a plain old manifest:
resource "kubernetes_manifest" "cluster_issuer" {
manifest = {
apiVersion = "cert-manager.io/v1"
kind = "ClusterIssuer"
metadata = {
name = var.issuer_name
}
spec = {
acme = {
server = "https://acme-v02.api.letsencrypt.org/directory"
email = var.certificate_email
privateKeySecretRef = {
name = "${var.issuer_name}-account-key"
}
solvers = [
{
dns01 = {
cloudflare = {
email = var.cloudflare_email
apiTokenSecretRef = {
name = kubernetes_secret.cloudflare_api_token.metadata[0].name
key = "token"
}
}
}
}
]
}
}
}
depends_on = [kubernetes_secret.cloudflare_api_token]
}
The interesting part is the spec, where the ACME server gets configured and the associated email and API token get set. With that, your cluster can provision TLS certificates fully automatically and without human intervention, provided the configuration is correct.
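To make the certificate flow concrete, here is a hypothetical Ingress (the app name and hostname are made up, not something running in my cluster) that requests a certificate from the ClusterIssuer via the cert-manager.io/cluster-issuer annotation:
resource "kubernetes_manifest" "example_app_ingress" {
  manifest = {
    apiVersion = "networking.k8s.io/v1"
    kind       = "Ingress"
    metadata = {
      name      = "example-app"
      namespace = "default"
      annotations = {
        # tells cert-manager to issue a certificate via our ClusterIssuer
        "cert-manager.io/cluster-issuer" = var.issuer_name
      }
    }
    spec = {
      ingressClassName = "nginx"
      tls = [{
        hosts      = ["app.example.com"]
        secretName = "example-app-tls"
      }]
      rules = [{
        host = "app.example.com"
        http = {
          paths = [{
            path     = "/"
            pathType = "Prefix"
            backend = {
              service = {
                name = "example-app"
                port = { number = 80 }
              }
            }
          }]
        }
      }]
    }
  }
}
cert-manager picks up the annotation, solves the dns01 challenge for the host and stores the issued certificate in the referenced secret, which NGINX then serves. With that, the most important things are ready to go, which means I can now finally deploy the application that made me try Kubernetes in the first place.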
Deploying ArgoCD
ArgoCD is the one thing that made me do all of this in the first place. I want to use Argo to implement the GitOps paradigm. When doing GitOps, your whole infrastructure is defined in plain text files that are under version control. The repo is the single source of truth; changes to the infrastructure are equivalent to modifying the files in the repo. ArgoCD is a tool that helps achieve exactly that by monitoring one or multiple repos for changes and making sure that the cluster state reflects the state in the repo. As you are most likely accustomed to by now, Argo gets deployed via a Helm chart too:
resource "helm_release" "argocd" {
name = var.name
namespace = var.namespace
create_namespace = true
repository = "https://argoproj.github.io/argo-helm"
chart = "argo-cd"
version = var.chart_version
set = [
{
name = "global.domain"
value = var.domain
},
{
name = "server.certificate.enabled"
value = "true"
},
{
name = "server.extraArgs"
value = "{--insecure}"
},
{
name = "server.certificate.domain"
value = var.domain
},
{
name = "server.certificate.issuer.kind"
value = "ClusterIssuer"
},
{
name = "server.certificate.issuer.name"
value = var.issuer_name
},
{
name = "server.ingress.enabled"
value = "true"
},
{
name = "server.ingress.tls"
value = "true"
},
{
name = "server.ingress.ingressClassName"
value = "nginx"
},
{
name = "server.ingress.annotations.nginx\\.ingress\\.kubernetes\\.io/ssl-redirect"
value = "true"
},
{
name = "configs.secret.create"
value = "true"
},
{
name = "configs.secret.data.admin\\.passwordMtime"
value = timestamp()
}
]
set_sensitive = [
{
name = "configs.secret.data.admin\\.password"
value = var.argo_cd_password_bcrypt
}
]
timeout = 300
atomic = true
recreate_pods = true
force_update = true
}
This deploys Argo into a dedicated namespace in the cluster, obtains a certificate and configures an ingress, all fully automated. After running this, I have Argo up and running on my cluster, publicly available and with a valid TLS certificate. Woohoo! Although Kubernetes was a hassle at some points along the way, the end result, with so many things just happening automatically, is very cool.
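To sketch what Argo actually works with, here is a hypothetical Application manifest expressed in Terraform (the repository URL, path and namespaces are placeholders, not my actual GitOps repo); it tells Argo to keep whatever is defined under that path of the repo in sync with the cluster:
resource "kubernetes_manifest" "example_application" {
  manifest = {
    apiVersion = "argoproj.io/v1alpha1"
    kind       = "Application"
    metadata = {
      name      = "example-app"
      namespace = "argocd"
    }
    spec = {
      project = "default"
      source = {
        repoURL        = "https://codeberg.org/example/gitops-repo.git"
        targetRevision = "HEAD"
        path           = "apps/example-app"
      }
      destination = {
        server    = "https://kubernetes.default.svc"
        namespace = "example-app"
      }
      syncPolicy = {
        # automatically remove drift and deleted resources
        automated = {
          prune    = true
          selfHeal = true
        }
      }
    }
  }
}
Once applied, Argo continuously compares the manifests in the repo with the live cluster state and reconciles any drift, which is exactly the GitOps loop described above.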
Costs of the Setup
We are running three nodes of the CAX11 flavour. These are well suited because k3s is optimized for running on ARM-based hosts and they are really cheap. At the time of writing this post, a CAX11 instance costs only 3.92€ per month, so the deployed cluster clocks in at 3 × 3.92€ ≈ 11.76€, just below 12€ a month, which is really good. The cost could increase a bit later down the line if you opt into using other Hetzner resources, e.g. Storage Boxes for your container storage provider, but it should stay manageable.
Further Plans
I am currently satisfied with my setup. In the next steps, I want to improve some general things about my cluster:
- switch to a distro optimized for container workloads, preferably immutable by default
- harden the setup so that security is at a sensible level
- make the deployment a bit simpler (you currently have to run multiple terraform projects in a certain sequence for everything to work)
I also want to leverage the existing ArgoCD as a way to bootstrap my GitOps repo, so that really everything running on the cluster is defined in Git. But that will have to wait until I have time again.