The Curious Dev

Set up Kubernetes cluster with Ansible (Part 1)

I recently got my homelab set up with the Proxmox hypervisor and decided to create a Kubernetes cluster. I am using an Intel NUC 13 with 16 cores for my homelab. One issue I quickly faced was how to automate installing the necessary software packages to get Kubernetes up and running. Logging into each individual node is just out of the question. I am a lazy developer lol and automation is my best friend. This is where Ansible comes in handy.

What is Ansible? It is an agentless tool for managing remote nodes. Note that Ansible comes into play after the infrastructure has already been provisioned with tools like Terraform, Vagrant, or cloud-init. One thing I love about Ansible is that it does not require an agent to be running on the remote nodes, unlike Puppet or Chef; it just connects over SSH.

So how does Ansible work? First, you need an inventory file (inventory.ini), which defines the list of IP addresses or hostnames of the remote machines you want to manage. Here is an example:

[azure_nodes]
testvm1.azure.net
testvm2.azure.net
testvm3.azure.net

[homelab_nodes]
10.0.0.122
10.0.0.123
10.0.0.124
10.0.0.125

You can also put them in groups like this:

[azure_control_plane]
testvm1.azure.net

[azure_worker_nodes]
testvm2.azure.net
testvm3.azure.net

[azure_nodes:children]
azure_control_plane
azure_worker_nodes

...

Verify that Ansible can connect to the nodes by pinging each host with the following command. You can ping a single group of hosts rather than all hosts defined in the file, and use a different SSH user with the -u flag.

ansible -m ping azure_nodes --private-key=~/.ssh/id_azure -u dev -i inventory.ini 

Next you need a playbook. No, not the kind NFL teams use. A playbook is where you define the list of commands to run, in what order, and on which machines. It is declarative, meaning it describes the final state of the machine, as opposed to imperative Bash or PowerShell scripts where you provide step-by-step instructions on how to get to that state. I won’t dive too deep into Ansible as there is already a lot of information available on the internet. A playbook follows a simple structure where you define the hosts and a list of tasks to perform. Each task runs a module, which I think of as Ansible’s way of expressing common Linux operations in a cross-platform, declarative fashion. You can still run a raw command with the shell module, but if Ansible already provides a module for what you’re doing, I would encourage you to use it.
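
To make that concrete, here is a minimal sketch (not part of the playbook we build below) showing the same package install written with the shell module versus the apt module:

  # Imperative: the command runs every time and always reports "changed"
  - name: Install vim with the shell module
    shell: apt-get update && apt-get install -y vim

  # Declarative: Ansible checks whether vim is already installed and skips the work if so
  - name: Install vim with the apt module
    apt:
      name: vim
      state: present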

Let’s start with a playbook that has to run on all nodes. We’ll call this all-playbook.yml. My playbook has been tested on Debian 12 (Bookworm), but it should run on any system that uses apt for package management. The first thing I do is make sure the remote machines are actually running Debian 12 before installing the required packages for Kubernetes.

---
- hosts: azure_nodes
  become: true
  tasks:
  - name: Fail if playbook is not running on Debian 12 machines
    fail:
      msg: "This playbook is only meant to run against Debian 12 hosts."
    when: ansible_distribution != 'Debian' or ansible_distribution_major_version != "12"

  - name: Install required packages
    apt:
      name: "{{ packages }}"
      state: present
      update_cache: yes
    vars:
      packages:
      - apt-transport-https
      - ca-certificates
      - curl
      - software-properties-common
      - gnupg-agent
      - gnupg2
      - vim
      - ufw
      - git
      - jq
      - build-essential

I want to point out the use of ansible_distribution and ansible_distribution_major_version. These are what Ansible calls facts about the remote machine: data about the system that Ansible gathers automatically. You can access them in a playbook using the ansible_ prefix, or run ansible azure_nodes -m setup -i inventory.ini to get a list of all the facts for a host.
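
For example, to look at just the distribution facts instead of the full dump, the setup module accepts a filter argument (same key and user flags as the ping example earlier):

ansible azure_nodes -m setup -a "filter=ansible_distribution*" --private-key=~/.ssh/id_azure -u dev -i inventory.ini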

Next, disable swap and load the overlay and br_netfilter kernel modules:

  - name: Disable swap on all nodes
    command: swapoff -a
  
  - name: Remove swap from /etc/fstab
    command: sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
  
  - name: Set kernel parameters on all nodes
    blockinfile:
      create: true
      path: /etc/modules-load.d/k8s.conf
      block: |
        overlay
        br_netfilter        
  
  - name: Load kernel modules
    command: modprobe {{ item }}
    with_items:
      - overlay
      - br_netfilter

  - name: Sysctl parameters required for Kubernetes networking
    lineinfile:
      create: true
      path: /etc/sysctl.d/k8s.conf
      line: "{{ item }}"
    with_items:
      - 'net.bridge.bridge-nf-call-iptables  = 1'
      - 'net.bridge.bridge-nf-call-ip6tables = 1'
      - 'net.ipv4.ip_forward                 = 1'

  - name: Apply sysctl parameters
    command: sysctl --system

Notice that what the blockinfile and lineinfile modules do here could also have been achieved with a shell command using cat or something similar. Using the built-in modules makes the YAML file more readable in my opinion, and it keeps the tasks idempotent.
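
For comparison, here is roughly what the sysctl step would look like if I had reached for the shell module instead (a sketch only, not part of the playbook):

  - name: Sysctl parameters required for Kubernetes networking (shell version)
    shell: |
      cat <<EOF > /etc/sysctl.d/k8s.conf
      net.bridge.bridge-nf-call-iptables  = 1
      net.bridge.bridge-nf-call-ip6tables = 1
      net.ipv4.ip_forward                 = 1
      EOF

It works, but it rewrites the file on every run and Ansible will always report the task as changed.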

Now on to Docker and Kubernetes. The steps are similar for both: we first add the apt repository and the official GPG key, then grab the packages.

  - name: Add Docker official GPG key
    get_url:
      url: https://download.docker.com/linux/debian/gpg
      dest: /etc/apt/keyrings/docker.asc
      mode: 0644

  - name: Add Docker repository
    apt_repository:
      repo: "deb [arch={{ 'amd64' if ansible_architecture == 'x86_64' else 'arm64' }} signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian {{ ansible_distribution_release }} stable"
      filename: docker
      update_cache: yes

  - name: Install Docker and Containerd
    apt:
      name: "{{ packages }}"
      update_cache: yes
    vars:
      packages:
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - docker-buildx-plugin
      - docker-compose-plugin

  - name: Add Kubernetes official GPG key
    get_url:
      url: https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key
      dest: /etc/apt/keyrings/kubernetes-apt-keyring.asc
      mode: 0644

  - name: Add Kubernetes repository
    apt_repository:
      repo: "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.asc] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /"
      filename: kubernetes
      update_cache: yes

  - name: Install Kubernetes packages
    apt:
      name: "{{ packages }}"
      update_cache: yes
    vars:
      packages:
      - kubelet=1.29.3-1.1
      - kubeadm=1.29.3-1.1
      - kubectl=1.29.3-1.1

We just have two more steps left. Bear with me. Kubernetes v1.29 needs a Container Network Interface (CNI) plugin for cluster networking. The CNI plugin is what actually implements the Kubernetes network model, which imposes the following rules straight from the documentation:

  1. Pods can communicate with all other pods on any other node without NAT
  2. Agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node

  - name: Install Kubernetes CNI plugins
    unarchive:
      src: "https://github.com/containernetworking/plugins/releases/download/v1.4.0/cni-plugins-linux-{{ 'amd64' if ansible_architecture == 'x86_64' else 'arm64' }}-v1.4.0.tgz"
      dest: /opt/cni/bin
      remote_src: true
      mode: '0755'
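
Once the playbook has run, you can sanity-check that the plugins actually landed on the nodes with a quick ad-hoc command (reusing the same inventory and key):

ansible azure_nodes -m shell -a 'ls /opt/cni/bin' --private-key=~/.ssh/id_azure -u dev -i inventory.ini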

The last step is to make sure containerd uses systemd as the cgroup driver. Cgroups (short for control groups) are a kernel feature that lets admins impose limits on the resources allocated to a process. It is more complicated than that and I honestly don’t know the intricate details. Starting with Kubernetes v1.22, kubeadm defaults the kubelet’s cgroup driver to systemd if it is not explicitly set. We just need to make sure our container runtime uses systemd as well, so they both have the same view of the resources on the system, and then restart containerd.

  - name: Use systemd Cgroup driver for containerd
    shell: |
      containerd config default > /etc/containerd/config.toml
      sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml      

  - name: Restart containerd
    systemd:
      name: containerd
      state: restarted
      enabled: yes
      daemon_reload: yes
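
Similarly, you can confirm that the driver change stuck with a grep over the generated config (I add --become just in case the file is not world-readable):

ansible azure_nodes -m shell -a "grep SystemdCgroup /etc/containerd/config.toml" --become --private-key=~/.ssh/id_azure -u dev -i inventory.ini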

We can finally run our playbook with the following command:

ansible-playbook --private-key=~/.ssh/id_azure -u dev -i inventory.ini -f 7 all-playbook.yml

Note the -f 7 parameter. This controls the level of parallelism, or the number of forks, that Ansible uses; the default is 5. If you have the compute power and many nodes, you can experiment with this number and set it higher.
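
If you get tired of passing the same flags every time, most of them can live in an ansible.cfg next to the playbook. A minimal sketch that mirrors the flags used above:

[defaults]
inventory = inventory.ini
remote_user = dev
private_key_file = ~/.ssh/id_azure
forks = 10

With that in place, the run shrinks to ansible-playbook all-playbook.yml.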

I want to keep this short (well, kind of), so in the next article, let’s set up the control plane and the worker nodes.

Next article in series: Set up Kubernetes cluster with Ansible (Part 2)