Running Kafka on Kubernetes for local development
Marcos Maia
Posted on January 11, 2022
In this post I will cover the steps to run Kafka locally on your development machine using Kubernetes. I usually go with docker-compose for this because it is simpler to get going, but given how omnipresent Kubernetes is in companies these days, I decided to also port my local Confluent Kafka setup to Kubernetes. It is good practice and it brings local development closer to production environments, where my applications will most likely also run on Kubernetes.
If you don't need strict control over the Kafka and Zookeeper storage locations on your developer host machine, which is usually the case, I recommend the Storage Class approach instead: it is simpler, needs less code, and leverages the default storage provisioner that ships with Kind.
This is a minimal local Kafka setup meant for development, so it runs a single instance each of Kafka, Schema Registry and Zookeeper. A local Kafka is handy for quick tests and prototyping, and it saves the money you would otherwise spend running a cloud cluster just for functional development. If you also package and run your application on Kubernetes, you end up with a very efficient setup that closely mirrors the cloud environments where your app will eventually run.
Strimzi is an awesome, simpler alternative to achieve the same, check it out. My goal here is to learn and to have a more "realistic" Kubernetes setup on my local development machine, so I opted not to use Strimzi or Helm charts.
Over the past couple of days I came up with two local setups running Kafka, Schema Registry and Zookeeper on Kubernetes using Kind. In this first post I will cover a setup using Persistent Volumes and Persistent Volume Claims; in the next one I will cover Storage Classes.
I created and tested these approaches on a Linux development machine. They should also work on Mac and Windows, but I have not tried it.
You can get the full source from the GitHub repo, where you will find the files and a Quick Start for both of the aforementioned approaches. To clone the repo: git clone git@github.com:mmaia/kafka-local-kubernetes.git.
Well, I guess this is more than enough introduction. Let's have some fun.
Pre-reqs, install: Docker, Kind and kubectl.
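A quick way to confirm the tooling is in place before continuing (assuming Docker, Kind and kubectl are all you need, which is all this setup uses):

# quick sanity check of the local tooling
docker --version          # Kind runs the Kubernetes nodes as Docker containers
kind --version            # used to create the local cluster
kubectl version --client  # used to apply the manifests under kafka-k8s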
The setup using Persistent Volumes and Persistent Volume Claims
If you checked out the repo described above, the setup presented here is under the pv-pvc-setup
folder. You will find multiple Kubernetes declarative files in this folder. Notice that you could also combine all the files into a single one, separating them with a line containing triple dashes ---
. If that is your preference, open a terminal and run the following from the pv-pvc-setup folder: for each in ./kafka-k8s/*; do cat $each; echo "---"; done > local-kafka-combined.yaml
This concatenates all files into a single file called local-kafka-combined.yaml.
I keep them separate to make each object type explicit, and because it's convenient: you can simply point kubectl at the whole directory, as described in the "Running it" section below.
kind-config.yaml
- This file configures Kind to expose the Kafka and Schema Registry ports to the local host machine, so that while developing from your IDE or command line you can connect to Kafka running on Kubernetes.
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
- role: worker
  extraPortMappings:
  - containerPort: 30092 # internal kafka nodeport
    hostPort: 9092 # port exposed on "host" machine for kafka
  - containerPort: 30081 # internal schema-registry nodeport
    hostPort: 8081 # port exposed on "host" machine for schema-registry
  extraMounts:
  - hostPath: ./tmp/kafka-data
    containerPath: /var/lib/kafka/data
    readOnly: false
    selinuxRelabel: false
    propagation: Bidirectional
  - hostPath: ./tmp/zookeeper-data/data
    containerPath: /var/lib/zookeeper/data
    readOnly: false
    selinuxRelabel: false
    propagation: Bidirectional
  - hostPath: ./tmp/zookeeper-data/log
    containerPath: /var/lib/zookeeper/log
    readOnly: false
    selinuxRelabel: false
    propagation: Bidirectional
Notice the mapping from the internal container paths to the external hostPath
on the local machine. The local paths need to be created manually before running the setup, as per the instructions in the "Running it" section below.
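For reference, a minimal way to create them, run from the pv-pvc-setup folder (the same folder that holds kind-config.yaml), is:

# create the host folders backing the persistent volumes,
# relative to the folder containing kind-config.yaml
mkdir -p ./tmp/kafka-data ./tmp/zookeeper-data/data ./tmp/zookeeper-data/log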
That is it for the Kind configuration. Now let's check the Kubernetes files (under kafka-k8s if you checked out the project):
kafka-deployment.yaml
- Configures the Kafka broker and exposes an internal port (for the Kubernetes network) and an external port (for Kafka clients). It also mounts a volume to persist the Kafka data files.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    service: kafka
  name: kafka
spec:
  replicas: 1
  selector:
    matchLabels:
      service: kafka
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        network/kafka-network: "true"
        service: kafka
    spec:
      enableServiceLinks: false
      containers:
      - name: kafka
        imagePullPolicy: IfNotPresent
        image: confluentinc/cp-kafka:7.0.1
        ports:
        - containerPort: 29092
        - containerPort: 9092
        env:
        - name: CONFLUENT_SUPPORT_CUSTOMER_ID
          value: "anonymous"
        - name: KAFKA_ADVERTISED_LISTENERS
          value: "INTERNAL://kafka:29092,LISTENER_EXTERNAL://kafka:9092"
        - name: KAFKA_AUTO_CREATE_TOPICS_ENABLE
          value: "true"
        - name: KAFKA_BROKER_ID
          value: "1"
        - name: KAFKA_DEFAULT_REPLICATION_FACTOR
          value: "1"
        - name: KAFKA_INTER_BROKER_LISTENER_NAME
          value: "INTERNAL"
        - name: KAFKA_LISTENERS
          value: "INTERNAL://:29092,LISTENER_EXTERNAL://:9092"
        - name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
          value: "INTERNAL:PLAINTEXT,LISTENER_EXTERNAL:PLAINTEXT"
        - name: KAFKA_NUM_PARTITIONS
          value: "1"
        - name: KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR
          value: "1"
        - name: KAFKA_LOG_CLEANUP_POLICY
          value: "compact"
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: "zookeeper:2181"
        resources: {}
        volumeMounts:
        - mountPath: /var/lib/kafka/data
          name: kafka-data
      hostname: kafka
      restartPolicy: Always
      volumes:
      - name: kafka-data
        persistentVolumeClaim:
          claimName: kafka-pvc
kafka-network-np.yaml
- Sets up the Kubernetes NetworkPolicy used by this setup, allowing traffic between the pods labelled network/kafka-network: "true".
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kafka-network
spec:
  ingress:
  - from:
    - podSelector:
        matchLabels:
          network/kafka-network: "true"
  podSelector:
    matchLabels:
      network/kafka-network: "true"
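Note that whether this policy is actually enforced depends on the CNI plugin; Kind's default CNI (kindnetd) does not enforce NetworkPolicies, so in this local setup the policy mostly mirrors what a more realistic cluster would need. You can still inspect it once applied:

# confirm the policy exists and see which pods it selects
kubectl describe networkpolicy kafka-network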
kafka-pv.yaml
- This file describes the Persistent Volume used for the Kafka data.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: kafka-pv
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: kafka-local-storage
  capacity:
    storage: 5Gi
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/lib/kafka/data
kafka-pvc.yaml
- This file is the claim used by the pod, as described in the Kafka deployment above, and bound to the Persistent Volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kafka-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: kafka-local-storage
  resources:
    requests:
      storage: 5Gi
kafka-service.yaml
- This file defines the Service that maps to the container ports; the port reachable from outside the cluster is exposed as a NodePort in Kubernetes (a quick smoke test is shown right after the manifest).
apiVersion: v1
kind: Service
metadata:
  labels:
    service: kafka
  name: kafka
spec:
  selector:
    service: kafka
  ports:
  - name: internal
    port: 29092
    targetPort: 29092
  - name: external
    port: 30092
    targetPort: 9092
    nodePort: 30092
  type: NodePort
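Once everything is applied (see "Running it" below), a quick way to smoke-test the broker from inside the cluster is to run a throwaway client pod against the internal listener. This is only a sketch: the pod name is arbitrary, and the label is added so the NetworkPolicy above would allow the traffic on clusters that actually enforce it:

# list topics using the internal listener (kafka:29092) from a temporary pod
kubectl run kafka-client --rm -it --restart=Never \
  --image=confluentinc/cp-kafka:7.0.1 \
  --labels="network/kafka-network=true" -- \
  kafka-topics --bootstrap-server kafka:29092 --list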
The remaining files are the declarative Kubernetes configuration files for Schema Registry and Zookeeper.
schema-registry-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    service: schema-registry
  name: schema-registry
spec:
  replicas: 1
  selector:
    matchLabels:
      service: schema-registry
  strategy: {}
  template:
    metadata:
      labels:
        network/kafka-network: "true"
        service: schema-registry
    spec:
      enableServiceLinks: false
      containers:
      - env:
        - name: SCHEMA_REGISTRY_HOST_NAME
          value: "schema-registry"
        - name: SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS
          value: "kafka:29092"
        - name: SCHEMA_REGISTRY_LISTENERS
          value: "http://0.0.0.0:30081"
        image: confluentinc/cp-schema-registry:7.0.1
        name: schema-registry
        ports:
        - containerPort: 30081
        resources: {}
      hostname: schema-registry
      restartPolicy: Always
schema-registry-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    service: schema-registry
  name: schema-registry
spec:
  ports:
  - port: 30081
    name: outport
    targetPort: 30081
    nodePort: 30081
  type: NodePort
  selector:
    service: schema-registry
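Because the Schema Registry port is mapped all the way to the host (NodePort 30081 to host port 8081 in kind-config.yaml), you can hit its REST API from your machine once the pod is running, for example:

# list registered subjects (returns an empty list [] on a fresh cluster)
curl http://localhost:8081/subjects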
zookeeper-data-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: zookeeper-data-pv
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: zookeeper-data-local-storage
  capacity:
    storage: 5Gi
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/lib/zookeeper/data
zookeeper-data-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: zookeeper-data-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: zookeeper-data-local-storage
  resources:
    requests:
      storage: 5Gi
zookeeper-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    service: zookeeper
  name: zookeeper
spec:
  replicas: 1
  selector:
    matchLabels:
      service: zookeeper
  strategy: {}
  template:
    metadata:
      labels:
        network/kafka-network: "true"
        service: zookeeper
    spec:
      containers:
      - env:
        - name: TZ
        - name: ZOOKEEPER_CLIENT_PORT
          value: "2181"
        - name: ZOOKEEPER_DATA_DIR
          value: "/var/lib/zookeeper/data"
        - name: ZOOKEEPER_LOG_DIR
          value: "/var/lib/zookeeper/log"
        - name: ZOOKEEPER_SERVER_ID
          value: "1"
        image: confluentinc/cp-zookeeper:7.0.1
        name: zookeeper
        ports:
        - containerPort: 2181
        resources: {}
        volumeMounts:
        - mountPath: /var/lib/zookeeper/data
          name: zookeeper-data
        - mountPath: /var/lib/zookeeper/log
          name: zookeeper-log
      hostname: zookeeper
      restartPolicy: Always
      volumes:
      - name: zookeeper-data
        persistentVolumeClaim:
          claimName: zookeeper-data-pvc
      - name: zookeeper-log
        persistentVolumeClaim:
          claimName: zookeeper-log-pvc
zookeeper-log-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: zookeeper-log-pv
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: zookeeper-log-local-storage
  capacity:
    storage: 5Gi
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/lib/zookeeper/log
zookeeper-log-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: zookeeper-log-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: zookeeper-log-local-storage
  resources:
    requests:
      storage: 5Gi
zookeeper-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    service: zookeeper
  name: zookeeper
spec:
  ports:
  - name: "2181"
    port: 2181
    targetPort: 2181
  selector:
    service: zookeeper
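If you want to confirm later that the broker has registered itself in Zookeeper, one option is the zookeeper-shell script that ships in the cp-kafka image. A sketch, assuming the pods are already running:

# broker id "1" should show up under /brokers/ids once Kafka is connected
kubectl exec deploy/kafka -- zookeeper-shell zookeeper:2181 ls /brokers/ids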
Running it
After cloning the project, open a terminal and cd to the pv-pvc-setup folder, or, if you created the files yourself, navigate to the folder where kind-config.yaml is located.
1. Create the folders on your local host machine so the persistent volumes are persisted to the file system and you can restart the Kafka and Zookeeper cluster without losing topic data. Restarting the Kind cluster, however, will delete the contents of the persistent volumes; this is by design of how Kind works with the propagation: Bidirectional configuration. Create the folders with the uid and gid of the same user that runs the Kind cluster, otherwise the data will not be persisted properly. In other words, make sure to create tmp/kafka-data, tmp/zookeeper-data/data and tmp/zookeeper-data/log at the same level as the kind-config.yaml file you run Kind with, or it won't work as expected.
2. Run Kind with the configuration file: kind create cluster --config=kind-config.yaml. This will start a Kubernetes control plane + worker. You can check the Kind Docker containers running the Kubernetes control-plane and worker with docker ps.
3. Apply the Kubernetes configuration for Kafka: kubectl apply -f kafka-k8s (see the verification commands right after this list).
4. When done, delete the Kubernetes objects with kubectl delete -f kafka-k8s and, if you also want to stop the Kind cluster, run kind delete cluster. Be aware that deleting the Kind cluster also deletes the local storage on the host machine.
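As mentioned in step 3, once the manifests are applied you can keep an eye on things with the usual kubectl commands, for example:

# watch the pods come up (kafka, schema-registry and zookeeper should reach Running)
kubectl get pods -w

# confirm the persistent volumes and claims are bound
kubectl get pv,pvc

# tail the broker logs if something looks off
kubectl logs -f deploy/kafka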
If you have problems connecting your Kafka client from your local development machine to the broker running in Docker or Kubernetes, please check the section
Connecting a Kafka client
at the end of this other post, where you'll find details on how to fix it and links explaining why it happens so you can sort it out completely. A sketch of the usual workaround is below.
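In short, the broker advertises itself under the hostname kafka (see KAFKA_ADVERTISED_LISTENERS in the deployment above), so a client running on the host must be able to resolve that name. One common workaround, covered in detail in the linked post, is to map the name to localhost. The snippet below is only a sketch and assumes you have the Kafka command-line tools installed locally:

# make the advertised broker hostname resolvable from the host machine
echo "127.0.0.1 kafka" | sudo tee -a /etc/hosts

# a local client can then reach the broker through the Kind port mapping (host port 9092)
kafka-console-producer --bootstrap-server localhost:9092 --topic test-topic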
That's all for now folks.
Stay tuned, as I will post about a simpler approach using the default Storage Class automatically provisioned by Kind (Rancher local-path-provisioner), which simplifies the setup considerably with the trade-off of giving you less control over where on the host machine the Kafka and Zookeeper files are stored.
Photo by Fotis Fotopoulos on Unsplash