Kafka on Kubernetes, the Strimzi way! (Part 4)
Abhishek Gupta
Posted on July 28, 2020
Welcome to part four of this blog series! So far, we have a Kafka single-node cluster with TLS encryption on top of which we configured different authentication modes (TLS
and SASL SCRAM-SHA-512
), defined users with the User Operator, connected to the cluster using CLI and Go clients and saw how easy it is to manage Kafka topics with the Topic Operator. So far, our cluster used ephemeral
persistence, which in the case of a single-node cluster, means that we will lose data if the Kafka or Zookeeper nodes (Pod
s) are restarted due to any reason.
Let's march on! In this part we will cover:
- How to configure Strimzi to add persistence for our cluster:
- Explore the components such as
PersistentVolume
andPersistentVolumeClaim
- How to modify the storage quality
- Try and expand the storage size for our Kafka cluster
The code is available on GitHub - https://github.com/abhirockzz/kafka-kubernetes-strimzi/
What do I need to go through this tutorial?
kubectl
- https://kubernetes.io/docs/tasks/tools/install-kubectl/
I will be using Azure Kubernetes Service (AKS) to demonstrate the concepts, but by and large it is independent of the Kubernetes provider. If you want to use AKS
, all you need is a Microsoft Azure account which you can get for FREE if you don't have one already.
I will not be repeating some of the common sections (such as Installation/Setup (Helm, Strimzi, Azure Kubernetes Service), Strimzi overview) in this or subsequent part of this series and would request you to refer to part one
Add persistence
We will start off by creating a persistent cluster. Here is a snippet of the specification (you can access the complete YAML on GitHub)
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
name: my-kafka-cluster
spec:
kafka:
version: 2.4.0
replicas: 1
storage:
type: persistent-claim
size: 2Gi
deleteClaim: true
....
zookeeper:
replicas: 1
storage:
type: persistent-claim
size: 1Gi
deleteClaim: true
The key things to notice:
-
storage.type
ispersistent-claim
(as opposed toephemeral
) in previous examples -
storage.size
for Kafka and Zookeeper nodes is2Gi
and1Gi
respectively -
deleteClaim: true
means that the correspondingPersistentVolumeClaim
s will be deleted when the cluster is deleted/un-deployed
You can take a look at the reference for
storage
https://strimzi.io/docs/operators/master/using.html#type-PersistentClaimStorage-reference
To create the cluster:
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kafka-kubernetes-strimzi/master/part-4/kafka-persistent.yaml
Let's see the what happens in response to the cluster creation
Strimzi Kubernetes magic...
Strimzi does all the heavy lifting of creating required Kubernetes resources in order to operate the cluster. We covered most of these in part 1 - StatefulSet
(and Pods
), LoadBalancer
Service, ConfigMap
, Secret
etc. In this blog, we will just focus on the persistence related components - PersistentVolume
and PersistentVolumeClaim
If you're using Azure Kubernetes Service (AKS), this will create an Azure Managed Disk - more on this soon
To check the PersistentVolumeClaim
s
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-my-kafka-cluster-kafka-0 Bound pvc-b4ece32b-a46c-4fbc-9b58-9413eee9c779 2Gi RWO default 94s
data-my-kafka-cluster-zookeeper-0 Bound pvc-d705fea9-c443-461c-8d18-acf8e219eab0 1Gi RWO default 3m20s
... and the PersistentVolume
s they are Bound
to
kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-b4ece32b-a46c-4fbc-9b58-9413eee9c779 2Gi RWO Delete Bound default/data-my-kafka-cluster-kafka-0 default 107s
pvc-d705fea9-c443-461c-8d18-acf8e219eab0 1Gi RWO Delete Bound default/data-my-kafka-cluster-zookeeper-0 default 3m35s
Notice that the disk size is as specified in the manifest ie.
2
and1
Gib for Kafka and Zookeeper respectively
Where is the data?
If we want to see the data itself, let's first check the ConfigMap
which stores the Kafka server config:
export CLUSTER_NAME=my-kafka-cluster
kubectl get configmap/${CLUSTER_NAME}-kafka-config -o yaml
In server.config
section, you will find an entry as such:
##########
# Kafka message logs configuration
##########
log.dirs=/var/lib/kafka/data/kafka-log${STRIMZI_BROKER_ID}
This tells us that the Kafka data is stored in /var/lib/kafka/data/kafka-log${STRIMZI_BROKER_ID}
. In this case STRIMZI_BROKER_ID
is 0
since we all we have is a single node
With this info, let's look the the Kafka Pod
:
export CLUSTER_NAME=my-kafka-cluster
kubectl get pod/${CLUSTER_NAME}-kafka-0 -o yaml
If you look into the kafka
container
section, you will notice the following:
One of the volumes
configuration:
volumes:
- name: data
persistentVolumeClaim:
claimName: data-my-kafka-cluster-kafka-0
The volume
named data
is associated with the data-my-kafka-cluster-kafka-0
PVC, and the corresponding volumeMounts
uses this volume to ensure that Kafka data is stored in /var/lib/kafka/data
volumeMounts:
- mountPath: /var/lib/kafka/data
name: data
To see the contents,
export STRIMZI_BROKER_ID=0
kubectl exec -it my-kafka-cluster-kafka-0 -- ls -lrt /var/lib/kafka/data/kafka-log${STRIMZI_BROKER_ID}
You can repeat the same for Zookeeper node as well
...what about the Cloud?
As mentioned before, in case of AKS, the data will end up being stored in an Azure Managed Disk. The type of disk is as per the default
storage class in your AKS cluster. In my case, it is:
kubectl get sc
azurefile kubernetes.io/azure-file 58d
azurefile-premium kubernetes.io/azure-file 58d
default (default) kubernetes.io/azure-disk 2d18h
managed-premium kubernetes.io/azure-disk 2d18h
//to get details of the storage class
kubectl get sc/default -o yaml
More on the semantics for
default
storage class inAKS
in the documentation
To query the disk in Azure, extract the PersistentVolume
info using kubectl get pv/<name of kafka pv> -o yaml
and get the ID of the Azure Disk i.e. spec.azureDisk.diskURI
You can use the Azure CLI command az disk show
command
az disk show --ids <diskURI value>
You will see that the storage type as defined in sku
section is StandardSSD_LRS
which corresponds to a Standard SSD
This table provides a comparison of different Azure Disk types
"sku": {
"name": "StandardSSD_LRS",
"tier": "Standard"
}
... and the tags
attribute highlight the PV
and PVC
association
"tags": {
"created-by": "kubernetes-azure-dd",
"kubernetes.io-created-for-pv-name": "pvc-b4ece32b-a46c-4fbc-9b58-9413eee9c779",
"kubernetes.io-created-for-pvc-name": "data-my-kafka-cluster-kafka-0",
"kubernetes.io-created-for-pvc-namespace": "default"
}
You can repeat the same for Zookeeper disks as well
Quick test ...
Follow these steps to confirm that the cluster is working as expected..
Create a producer Pod
:
export KAFKA_CLUSTER_NAME=my-kafka-cluster
kubectl run kafka-producer -ti --image=strimzi/kafka:latest-kafka-2.4.0 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list $KAFKA_CLUSTER_NAME-kafka-bootstrap:9092 --topic my-topic
In another terminal, create a consumer Pod
:
export KAFKA_CLUSTER_NAME=my-kafka-cluster
kubectl run kafka-consumer -ti --image=strimzi/kafka:latest-kafka-2.4.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server $KAFKA_CLUSTER_NAME-kafka-bootstrap:9092 --topic my-topic --from-beginning
What if(s) ...
Let's explore how to tackle a couple of requirements which you'll come across:
- Using a different storage type - In case of Azure for example, you might want to use Azure Premium SSD for production workloads
- Re-sizing the storage - at some point you'll want to add storage to your Kafka cluster
Change the storage type
Recall that the default behavior is for Strimzi to create a PersistentVolumeClaim
that references the default
Storage Class. To customize this, you can simply include the class
attribute in the storage
specification in spec.kafka
(and/or spec.zookeeper
).
In Azure, the managed-premium storage
class corresponds to a Premium SSD: kubectl get sc/managed-premium -o yaml
Here is a snippet from the storage config, where class: managed-premium
has been added.
storage:
type: persistent-claim
size: 2Gi
deleteClaim: true
class: managed-premium
Please note that you cannot update the storage type for an existing cluster. To try this out:
- Delete the existing cluster -
kubectl delete kafka/my-kafka-cluster
(wait for a while) - Create a new cluster -
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kafka-kubernetes-strimzi/master/part-4/kafka-persistent-premium.yaml
//Delete the existing cluster
kubectl delete kafka/my-kafka-cluster
//Create a new cluster
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kafka-kubernetes-strimzi/master/part-4/kafka-persistent-premium.yaml
To confirm, check the PersistentVolumeClain
for Kafka node - notice the STORAGECLASS
colum
kubectl get pvc/data-my-kafka-cluster-kafka-0
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-my-kafka-cluster-kafka-0 Bound pvc-3f46d6ed-9da5-4c49-87ef-86684ab21cf8 2Gi RWO managed-premium 21s
We only configured the Kafka broker to use the Premium storage, so the Zookeeper
Pod
will use theStandardSSD
storage type.
Re-size storage (TL;DR - does not work yet)
Azure Disks allow you to add more storage to it. In the case of Kubernetes, it is the storage class which defines whether this is supported or not - for AKS, if you check the default (or the managed-premium
) storage class, you will notice the property allowVolumeExpansion: true
, which confirms that you can do so in the context of Kubernetes PVC as well.
Strimzi makes it really easy to increase the storage for our Kafka cluster - all you need to do is update the storage.size
field to the desired value
Check the PVC now: kubectl describe pvc data-my-kafka-cluster-kafka-0
Conditions:
Type Status LastProbeTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
Resizing True Mon, 01 Jan 0001 00:00:00 +0000 Mon, 22 Jun 2020 23:15:26 +0530
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning VolumeResizeFailed 3s (x11 over 13s) volume_expand error expanding volume "default/data-my-kafka-cluster-kafka-0" of plugin "kubernetes.io/azure-disk": compute.DisksClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Cannot resize disk kubernetes-dynamic-pvc-3f46d6ed-9da5-4c49-87ef-86684ab21cf8 while it is attached to running VM /subscriptions/9a42a42f-ae42-4242-b6a7-dda0ea91d342/resourceGroups/mc_my-k8s-vk_southeastasia/providers/Microsoft.Compute/virtualMachines/aks-agentpool-42424242-1. Resizing a disk of an Azure Virtual Machine requires the virtual machine to be deallocated. Please stop your VM and retry the operation."
Notice the "Cannot resize disk...
error message. This is happening because the Azure Disk is currently attached with AKS cluster node and that is because of the Pod
is associated with the PersistentVolumeClaim
- this is a documented limitation
I am not the first one to run into this problem of course. Please refer to issues such as this one for details.
There are workarounds but they have not been discussed in this blog. I included the section since I wanted you to be aware of this caveat
Final countdown ...
We want to leave on a high note, don't we? Alright, so to wrap it up, let's scale our cluster out from one to three nodes. It'd dead simple!
All you need to do is to increase the replicas to the desired number - in this case, I configured it to 3 (for Kafka and Zookeeper)
...
spec:
kafka:
version: 2.4.0
replicas: 3
zookeeper:
replicas: 3
...
In addition to this, I also added an external load balancer listener (this will create an Azure Load Balancer, as discussed in part 2)
...
listeners:
plain: {}
external:
type: loadbalancer
...
To create the new, simply use the new manifest
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kafka-kubernetes-strimzi/master/part-4/kafka-persistent-multi-node.yaml
Please note that the overall cluster readiness will take time since there will be additional components (Azure Disks, Load Balancer public IPs etc.) that'll be created prior to the Pods being activated
In your k8s cluster, you will see...
Three Pods each for Kafka and Zookeeper
kubectl get pod -l=app.kubernetes.io/instance=my-kafka-cluster
NAME READY STATUS RESTARTS AGE
my-kafka-cluster-kafka-0 2/2 Running 0 54s
my-kafka-cluster-kafka-1 2/2 Running 0 54s
my-kafka-cluster-kafka-2 2/2 Running 0 54s
my-kafka-cluster-zookeeper-0 1/1 Running 0 4m44s
my-kafka-cluster-zookeeper-1 1/1 Running 0 4m44s
my-kafka-cluster-zookeeper-2 1/1 Running 0 4m44s
Three pairs (each for Kafka and Zookeeper) of PersistentVolumeClaims ...
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-my-kafka-cluster-kafka-0 Bound pvc-0f52dee1-970a-4c55-92bd-a97dcc41aee6 3Gi RWO managed-premium 10m
data-my-kafka-cluster-kafka-1 Bound pvc-f8b613cb-3da0-4932-acea-7e5e96df1433 3Gi RWO managed-premium 4m24s
data-my-kafka-cluster-kafka-2 Bound pvc-fedf431c-d87a-4bf7-80d0-d43b1337c079 3Gi RWO managed-premium 4m24s
data-my-kafka-cluster-zookeeper-0 Bound pvc-1fda3714-3c37-428f-9e4b-bdb5da71cda6 1Gi RWO default 12m
data-my-kafka-cluster-zookeeper-1 Bound pvc-702556e0-890a-4c07-ae5c-e2354d74d006 1Gi RWO default 6m42s
data-my-kafka-cluster-zookeeper-2 Bound pvc-176ffd68-7e3a-4e04-abb1-52c54dcb84f0 1Gi RWO default 6m42s
... and the respective PersistentVolumes they are bound to
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-0f52dee1-970a-4c55-92bd-a97dcc41aee6 3Gi RWO Delete Bound default/data-my-kafka-cluster-kafka-0 managed-premium 12m
pvc-176ffd68-7e3a-4e04-abb1-52c54dcb84f0 1Gi RWO Delete Bound default/data-my-kafka-cluster-zookeeper-2 default 8m45s
pvc-1fda3714-3c37-428f-9e4b-bdb5da71cda6 1Gi RWO Delete Bound default/data-my-kafka-cluster-zookeeper-0 default 14m
pvc-702556e0-890a-4c07-ae5c-e2354d74d006 1Gi RWO Delete Bound default/data-my-kafka-cluster-zookeeper-1 default 8m45s
pvc-f8b613cb-3da0-4932-acea-7e5e96df1433 3Gi RWO Delete Bound default/data-my-kafka-cluster-kafka-1 managed-premium 6m27s
pvc-fedf431c-d87a-4bf7-80d0-d43b1337c079 3Gi RWO Delete Bound default/data-my-kafka-cluster-kafka-2 managed-premium 6m22s
... and Load Balancer IPs. Notice that these are created for each Kafka broker as well as a bootstrap IP which is recommended when connecting from client applications.
kubectl get svc -l=app.kubernetes.io/instance=my-kafka-cluster
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-kafka-cluster-kafka-0 LoadBalancer 10.0.11.154 40.119.248.164 9094:30977/TCP 10m
my-kafka-cluster-kafka-1 LoadBalancer 10.0.146.181 20.43.191.219 9094:30308/TCP 10m
my-kafka-cluster-kafka-2 LoadBalancer 10.0.223.202 40.119.249.20 9094:30313/TCP 10m
my-kafka-cluster-kafka-bootstrap ClusterIP 10.0.208.187 <none> 9091/TCP,9092/TCP 16m
my-kafka-cluster-kafka-brokers ClusterIP None <none> 9091/TCP,9092/TCP 16m
my-kafka-cluster-kafka-external-bootstrap LoadBalancer 10.0.77.213 20.43.191.238 9094:31051/TCP 10m
my-kafka-cluster-zookeeper-client ClusterIP 10.0.3.155 <none> 2181/TCP 18m
my-kafka-cluster-zookeeper-nodes ClusterIP None <none> 2181/TCP,2888/TCP,3888/TCP 18m
To access the cluster, you can use the steps outlined in part 2
It's a wrap!
That's it for this blog series on which covered some of the aspects of running Kafka on Kubernetes using the open source Strimzi operator.
If this topic is of interest to you, I encourage you to check out other solutions such as Confluent operator and Banzai Cloud Kafka operator
Posted on July 28, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.