Deploy & run Apache Kafka cluster as containers in Kubernetes cluster

First published: Tuesday, October 1, 2024 | Last updated: Tuesday, October 1, 2024

Learn to deploy and run Apache Kafka cluster nodes as OCI containers inside a Kubernetes cluster using Kubernetes YAML templates.


This blog post provides a complete guide to deploying and running Apache Kafka cluster nodes as OCI containers inside a Kubernetes cluster, using Kubernetes YAML templates to define and manage the required resources.

For a high-level understanding: the architecture of Apache Kafka used here is based on the KRaft design, in which each Kafka node is configured as either a controller or a broker. KRaft (Kafka Raft) replaces the older ZooKeeper-based coordination; controller nodes manage cluster metadata through the Raft consensus protocol, while broker nodes store and serve topic data.
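
For illustration, a node's KRaft role is set in its server.properties file. The snippet below is a minimal sketch with placeholder node IDs, host names, and ports; it does not reproduce the exact configuration shipped in our starter-kit.

# Minimal KRaft settings for a controller node (illustrative values only).
process.roles=controller
node.id=1
controller.quorum.voters=1@kafka-controller-0:9093,2@kafka-controller-1:9093,3@kafka-controller-2:9093
listeners=CONTROLLER://:9093
controller.listener.names=CONTROLLER

# A broker node uses the same quorum settings with a different role.
process.roles=broker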

To implement the solution described in this blog post, you’ll need a Kubernetes cluster with a control plane (master node) and at least 2 worker nodes. You’ll also need the Kubernetes client (kubectl) installed on your local developer machine running the Windows, Mac, or Linux operating system.

We assume you already have a Kubernetes cluster deployed and running in the cloud or on-premise, so the only remaining prerequisite is the Kubernetes client. If it is not yet installed, you can set it up by following the articles below, which walk you through installing Git and the Kubernetes client on the Windows, Mac, or Linux operating system.

  1. How to install and configure Git on Windows, Mac, and Linux?
  2. How to install Kubernetes client on Windows, Mac, and Linux?
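
Once the Kubernetes client is installed, you can verify that it works and can reach your Kubernetes cluster using the commands below.

# Verify the Kubernetes client is installed.
$ kubectl version --client

# Verify the Kubernetes client can reach the Kubernetes cluster.
$ kubectl get nodes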

Set up our Kubernetes starter-kit from GitHub

At SloopStash, we are proud to offer our own open-source Kubernetes starter-kit repository on GitHub. It is designed to help you orchestrate and automate OCI containers running popular full-stack, microservice, and Big Data workloads using Containerd, Docker, and Kubernetes. We also create, publish, and maintain OCI/container images for popular workloads on Docker Hub. The starter-kit has been carefully curated to include all the tools and resources necessary for orchestrating, automating, and deploying popular workloads and services.

Here, we set up our Kubernetes starter-kit, which contains all the code, configuration, and supporting files required to deploy and run the Apache Kafka cluster nodes as OCI containers inside the Kubernetes cluster using Kubernetes YAML templates.

Windows

Below are the steps to set up our Kubernetes starter-kit on the Windows operating system.

  1. Open the Git Bash terminal in administrator mode.
  2. Execute the below commands in the Git Bash terminal to set up our Kubernetes starter-kit.
# Download Kubernetes starter-kit from GitHub to local filesystem path.
$ git clone https://github.com/sloopstash/kickstart-kubernetes.git /opt/kickstart-kubernetes

Mac

Here are the instructions for setting up our Kubernetes starter-kit on the Mac operating system.

  1. Open the terminal.
  2. Execute the below commands in the terminal to set up our Kubernetes starter-kit.
# Download Kubernetes starter-kit from GitHub to local filesystem path.
$ sudo git clone https://github.com/sloopstash/kickstart-kubernetes.git /opt/kickstart-kubernetes

# Change ownership of Kubernetes starter-kit directory.
$ sudo chown -R $USER /opt/kickstart-kubernetes

Linux

You can use the following commands to set up our Kubernetes starter-kit on any Linux-based operating system.

  1. Open the terminal.
  2. Execute the below commands in the terminal to set up our Kubernetes starter-kit.
# Download Kubernetes starter-kit from GitHub to local filesystem path.
$ sudo git clone https://github.com/sloopstash/kickstart-kubernetes.git /opt/kickstart-kubernetes

# Change ownership of Kubernetes starter-kit directory.
$ sudo chown -R $USER:$USER /opt/kickstart-kubernetes

Deploy and run Apache Kafka cluster inside Kubernetes cluster

Here, we deploy and run a single environment of the Apache Kafka cluster, consisting of 3 controller nodes and 3 broker nodes. Each node in the Kafka cluster runs as an OCI container inside the Kubernetes cluster. The deployment is automated and orchestrated through Kubernetes YAML templates, in which we define the Kubernetes resources such as persistent-volumes, stateful-sets, pods, services, etc., required for the Kafka cluster nodes. You can find these Kubernetes YAML templates in our Kubernetes starter-kit, giving you everything you need to quickly spin up a functional Kafka cluster for testing and development purposes.

Each Kubernetes worker node must have at least 3 GB of RAM to avoid JVM memory pressure while running this 6-node Apache Kafka cluster.
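
Before proceeding, you can confirm that each worker node has enough allocatable memory, for example:

# Check allocatable resources on a Kubernetes worker node.
$ kubectl describe node <KUBERNETES_WORKER_1> | grep -A 6 'Allocatable'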

# Store environment variables.
$ export ENVIRONMENT=stg

# Switch to Kubernetes starter-kit directory.
$ cd /opt/kickstart-kubernetes

# Store Kubernetes variables as environment variables.
$ source var/STG.env
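
# Confirm the Kafka version variable used by later commands is loaded.
$ echo ${DATA_LAKE_KAFKA_VERSION}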

# Add labels to Kubernetes nodes.
$ kubectl label nodes <KUBERNETES_MASTER_1> type=on-premise provider=host service=virtualbox region=local availability_zone=local-a
$ kubectl label nodes <KUBERNETES_WORKER_1> type=on-premise provider=host service=virtualbox region=local availability_zone=local-b node-role.kubernetes.io/worker=worker
$ kubectl label nodes <KUBERNETES_WORKER_2> type=on-premise provider=host service=virtualbox region=local availability_zone=local-c node-role.kubernetes.io/worker=worker
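
# Verify labels on Kubernetes nodes.
$ kubectl get nodes --show-labels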

# Create Kubernetes namespace.
$ kubectl create namespace sloopstash-${ENVIRONMENT}-data-lake-s2
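
# Verify the Kubernetes namespace is created.
$ kubectl get namespace sloopstash-${ENVIRONMENT}-data-lake-s2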

# Create directories for Kubernetes persistent-volumes on worker nodes.
$ sudo mkdir -p /mnt/sloopstash/${ENVIRONMENT}/data-lake/kafka/{controller,broker}/{0,1,2}/{data,log}
$ sudo chmod -R ugo+rwx /mnt/sloopstash
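
# Verify the directory tree for Kubernetes persistent-volumes.
$ find /mnt/sloopstash/${ENVIRONMENT} -type d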

# Create Kubernetes persistent-volume.
$ envsubst < persistent-volume/data-lake/kafka/controller.yml | kubectl apply -f -
$ envsubst < persistent-volume/data-lake/kafka/broker.yml | kubectl apply -f -
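
# Verify Kubernetes persistent-volumes are available.
$ kubectl get pv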

# Create Kubernetes config-map.
$ kubectl create configmap kafka-controller \
--from-file=workload/kafka/${DATA_LAKE_KAFKA_VERSION}/controller/conf/ \
--from-file=workload/kafka/${DATA_LAKE_KAFKA_VERSION}/controller/script/ \
--from-file=supervisor-server=workload/supervisor/conf/server.conf \
-n sloopstash-${ENVIRONMENT}-data-lake-s2
$ kubectl create configmap kafka-broker \
--from-file=workload/kafka/${DATA_LAKE_KAFKA_VERSION}/broker/conf/ \
--from-file=workload/kafka/${DATA_LAKE_KAFKA_VERSION}/broker/script/ \
--from-file=supervisor-server=workload/supervisor/conf/server.conf \
-n sloopstash-${ENVIRONMENT}-data-lake-s2
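
# Verify Kubernetes config-maps are created.
$ kubectl get configmaps -n sloopstash-${ENVIRONMENT}-data-lake-s2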

# Create Kubernetes service.
$ kubectl apply -f service/data-lake/kafka.yml -n sloopstash-${ENVIRONMENT}-data-lake-s2
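
# Verify the Kubernetes service is created.
$ kubectl get services -n sloopstash-${ENVIRONMENT}-data-lake-s2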

# Create Kubernetes stateful-set.
$ envsubst < stateful-set/data-lake/kafka/controller.yml | kubectl apply -f - -n sloopstash-${ENVIRONMENT}-data-lake-s2
$ envsubst < stateful-set/data-lake/kafka/broker.yml | kubectl apply -f - -n sloopstash-${ENVIRONMENT}-data-lake-s2
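
# Watch the Kafka cluster pods until all of them reach the Running state.
$ kubectl get pods -n sloopstash-${ENVIRONMENT}-data-lake-s2 -o wide -w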

Similarly, our Kubernetes starter-kit enables you to deploy and manage multiple environments of the Apache Kafka cluster by modifying the relevant environment variables. Detailed instructions for this process can be found in our Kubernetes starter-kit wiki on GitHub. The wiki also includes information on testing and verifying the Kafka cluster nodes running as OCI containers in the Kubernetes cluster.
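
As a quick smoke test, you can create and list a topic from inside one of the broker containers. Note that the pod name, script name, and port below are assumptions based on common StatefulSet naming and stock Apache Kafka distributions; adjust them to match the resource names defined in the starter-kit templates.

# Create a test topic from inside a Kafka broker container (pod name, script, and port are assumed).
$ kubectl exec kafka-broker-0 -n sloopstash-${ENVIRONMENT}-data-lake-s2 -- \
kafka-topics.sh --create --topic smoke-test --partitions 3 --replication-factor 3 --bootstrap-server localhost:9092

# List topics to confirm the Kafka cluster is serving requests.
$ kubectl exec kafka-broker-0 -n sloopstash-${ENVIRONMENT}-data-lake-s2 -- \
kafka-topics.sh --list --bootstrap-server localhost:9092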