Kafka on Kubernetes, the Strimzi way! (Part 1)
Abhishek Gupta
Posted on June 8, 2020
Some of my previous blog posts (such as Kafka Connect on Kubernetes, the easy way!) demonstrate how to use Kafka Connect in a Kubernetes-native way. This is the first in a series of blog posts covering Apache Kafka on Kubernetes using the Strimzi Operator. In this post, we will start off with the simplest possible setup, i.e. a single-node Kafka (and Zookeeper) cluster, and learn:
- Strimzi overview and setup
- Kafka cluster installation
- Kubernetes resources used/created behind the scenes
- Test the Kafka setup using clients within the Kubernetes cluster
The code is available on GitHub - https://github.com/abhirockzz/kafka-kubernetes-strimzi
What do I need to try this out?
- kubectl - https://kubernetes.io/docs/tasks/tools/install-kubectl/
I will be using Azure Kubernetes Service (AKS) to demonstrate the concepts, but by and large it is independent of the Kubernetes provider (e.g. feel free to use a local setup such as minikube). If you want to use AKS, all you need is a Microsoft Azure account, which you can get for FREE if you don't have one already.
Install Helm
I will be using Helm to install Strimzi. Here is the documentation to install Helm itself - https://helm.sh/docs/intro/install/

You can also use the YAML files directly to install Strimzi. Check out the quick start guide here - https://strimzi.io/docs/quickstart/latest/#proc-install-product-str
(optional) Setup Azure Kubernetes Service
Azure Kubernetes Service (AKS) makes it simple to deploy a managed Kubernetes cluster in Azure. It reduces the complexity and operational overhead of managing Kubernetes by offloading much of that responsibility to Azure. There are several ways to set up an AKS cluster, e.g. using the Azure CLI or the Azure portal.

Once you have set up the cluster, you can easily configure kubectl to point to it:
az aks get-credentials --resource-group <CLUSTER_RESOURCE_GROUP> --name <CLUSTER_NAME>
Wait, what is Strimzi?

From the Strimzi documentation:

Strimzi simplifies the process of running Apache Kafka in a Kubernetes cluster. Strimzi provides container images and Operators for running Kafka on Kubernetes. It is a part of the Cloud Native Computing Foundation as a Sandbox project (at the time of writing).
Strimzi Operators are fundamental to the project. These Operators are purpose-built with specialist operational knowledge to effectively manage Kafka. They simplify the process of deploying and running Kafka clusters and components, configuring and securing access to Kafka, upgrading and managing Kafka, and even taking care of managing topics and users.
Here is a diagram which shows a 10,000-foot overview of the Operator roles:
Install Strimzi
Installing Strimzi using Helm is pretty easy:
# add the Helm chart repo for Strimzi
helm repo add strimzi https://strimzi.io/charts/
# install it! (I have used strimzi-kafka as the release name)
helm install strimzi-kafka strimzi/strimzi-kafka-operator
This will install the Strimzi Operator (which is nothing but a Deployment), Custom Resource Definitions, and other Kubernetes components such as Cluster Roles, Cluster Role Bindings and Service Accounts. For more details, check out this link.
To delete, simply run:
helm uninstall strimzi-kafka
To confirm that the Strimzi Operator has been deployed, check its Pod (it should transition to Running status after a while):
kubectl get pods -l=name=strimzi-cluster-operator
NAME READY STATUS RESTARTS AGE
strimzi-cluster-operator-5c66f679d5-69rgk 1/1 Running 0 43s
Check the Custom Resource Definitions as well:
kubectl get crd | grep strimzi
kafkabridges.kafka.strimzi.io 2020-04-13T16:49:36Z
kafkaconnectors.kafka.strimzi.io 2020-04-13T16:49:33Z
kafkaconnects.kafka.strimzi.io 2020-04-13T16:49:36Z
kafkaconnects2is.kafka.strimzi.io 2020-04-13T16:49:38Z
kafkamirrormaker2s.kafka.strimzi.io 2020-04-13T16:49:37Z
kafkamirrormakers.kafka.strimzi.io 2020-04-13T16:49:39Z
kafkas.kafka.strimzi.io 2020-04-13T16:49:40Z
kafkatopics.kafka.strimzi.io 2020-04-13T16:49:34Z
kafkausers.kafka.strimzi.io 2020-04-13T16:49:33Z
The kafkas.kafka.strimzi.io CRD represents Kafka clusters in Kubernetes.
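As an aside, that name follows the standard Kubernetes CRD naming convention of plural.group, which is why the Kafka cluster resource shows up in the list above as kafkas.kafka.strimzi.io. A trivial sketch:

```shell
# a CRD's name is formed as <plural>.<group>
PLURAL=kafkas
GROUP=kafka.strimzi.io
echo "${PLURAL}.${GROUP}"
```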
Now that we have the "brain" (the Strimzi Operator) wired up, let's use it!
Time to create a Kafka cluster!
As mentioned, we will keep things simple and start off with the following setup (which we will incrementally update as a part of subsequent posts in this series):
- A single node Kafka cluster (and Zookeeper)
- Available internally to clients in the same Kubernetes cluster
- No encryption, authentication or authorization
- No persistence (uses an emptyDir volume)
To deploy a Kafka cluster, all we need to do is create a Strimzi Kafka resource. This is what it looks like:
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-kafka-cluster
spec:
  kafka:
    version: 2.4.0
    replicas: 1
    listeners:
      plain: {}
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      log.message.format.version: "2.4"
    storage:
      type: ephemeral
  zookeeper:
    replicas: 1
    storage:
      type: ephemeral
For a detailed Kafka CRD reference, please check out the documentation - https://strimzi.io/docs/operators/master/using.html#type-Kafka-reference

We define the name (my-kafka-cluster) of the cluster in metadata.name. Here is a summary of the attributes in spec.kafka:
- version - The Kafka broker version (defaults to 2.5.0 at the time of writing, but we're using 2.4.0)
- replicas - Kafka cluster size, i.e. the number of Kafka nodes (Pods in the cluster)
- listeners - Configures the listeners of the Kafka brokers. In this example we are using the plain listener, which means that the cluster will be accessible to internal clients (in the same Kubernetes cluster) on port 9092, with no encryption, authentication or authorization involved. Supported types are plain, tls and external (see https://strimzi.io/docs/operators/master/using.html#type-KafkaListeners-reference). It is possible to configure multiple listeners (we will cover this in subsequent blog posts)
- config - Key-value pairs used as Kafka broker config properties
- storage - Storage for the Kafka cluster. Supported types are ephemeral, persistent-claim and jbod. We are using ephemeral in this example, which means that an emptyDir volume is used and the data only lives as long as the Kafka broker Pod (a future blog post will cover persistent-claim storage)
Zookeeper cluster details (spec.zookeeper) are similar to those of Kafka. In this case we just configure the number of replicas and the storage type. Refer to https://strimzi.io/docs/operators/master/using.html#type-ZookeeperClusterSpec-reference for details.
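Before applying the manifest, it can be handy to save it locally and sanity-check the key fields (the file name kafka.yaml is just my choice here):

```shell
# save the manifest locally (same content as shown above)
cat > kafka.yaml <<'EOF'
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-kafka-cluster
spec:
  kafka:
    version: 2.4.0
    replicas: 1
    listeners:
      plain: {}
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      log.message.format.version: "2.4"
    storage:
      type: ephemeral
  zookeeper:
    replicas: 1
    storage:
      type: ephemeral
EOF

# both the Kafka and Zookeeper sections should use ephemeral storage
grep -c 'type: ephemeral' kafka.yaml
```

You can then run kubectl apply -f kafka.yaml instead of pointing at the GitHub URL.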
To create the Kafka cluster:
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kafka-kubernetes-strimzi/master/part-1/kafka.yaml
What's next?
The Strimzi Operator spins into action and creates many Kubernetes resources in response to the Kafka CRD instance we just created.
The following resources are created:
- StatefulSet - The Kafka and Zookeeper clusters exist in the form of StatefulSets, which are used to manage stateful workloads in Kubernetes. Please refer to https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/ and related material for details
- Service - Kubernetes ClusterIP Services for internal access
- ConfigMap - The Kafka and Zookeeper configurations are stored in Kubernetes ConfigMaps
- Secret - Kubernetes Secrets to store private keys and certificates for Kafka cluster components and clients. These are used for TLS encryption and authentication (covered in subsequent blog posts)
Kafka Custom Resource
kubectl get kafka
NAME               DESIRED KAFKA REPLICAS   DESIRED ZK REPLICAS
my-kafka-cluster   1                        1
StatefulSet and Pod

Check the Kafka and Zookeeper StatefulSets using:
kubectl get statefulset/my-kafka-cluster-zookeeper
kubectl get statefulset/my-kafka-cluster-kafka
Kafka and Zookeeper Pods:
kubectl get pod/my-kafka-cluster-zookeeper-0
kubectl get pod/my-kafka-cluster-kafka-0
ConfigMap
Individual ConfigMaps are created to store the Kafka and Zookeeper configurations:
kubectl get configmap
NAME                                DATA   AGE
my-kafka-cluster-kafka-config       4      19m
my-kafka-cluster-zookeeper-config   2      20m
Let's peek into the Kafka configuration:
kubectl get configmap/my-kafka-cluster-kafka-config -o yaml
The output is quite lengthy, but I will highlight the important bits. As part of the data section, there are two config entries for the Kafka broker - log4j.properties and server.config.
Here is a snippet of server.config. Notice advertised.listeners (which highlights the internal access over port 9092) and the User provided configuration section (the one we specified in the YAML manifest):
##############################
##############################
# This file is automatically generated by the Strimzi Cluster Operator
# Any changes to this file will be ignored and overwritten!
##############################
##############################
broker.id=${STRIMZI_BROKER_ID}
log.dirs=/var/lib/kafka/data/kafka-log${STRIMZI_BROKER_ID}
##########
# Plain listener
##########
##########
# Common listener configuration
##########
listeners=REPLICATION-9091://0.0.0.0:9091,PLAIN-9092://0.0.0.0:9092
advertised.listeners=REPLICATION-9091://my-kafka-cluster-kafka-${STRIMZI_BROKER_ID}.my-kafka-cluster-kafka-brokers.default.svc:9091,PLAIN-9092://my-kafka-cluster-kafka-${STRIMZI_BROKER_ID}.my-kafka-cluster-kafka-brokers.default.svc:9092
listener.security.protocol.map=REPLICATION-9091:SSL,PLAIN-9092:PLAINTEXT
inter.broker.listener.name=REPLICATION-9091
sasl.enabled.mechanisms=
ssl.secure.random.implementation=SHA1PRNG
ssl.endpoint.identification.algorithm=HTTPS
##########
# User provided configuration
##########
log.message.format.version=2.4
offsets.topic.replication.factor=1
transaction.state.log.min.isr=1
transaction.state.log.replication.factor=1
Service
If you query for Services, you should see something similar to this:
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
my-kafka-cluster-kafka-bootstrap ClusterIP 10.0.240.137 <none> 9091/TCP,9092/TCP
my-kafka-cluster-kafka-brokers ClusterIP None <none> 9091/TCP,9092/TCP
my-kafka-cluster-zookeeper-client ClusterIP 10.0.143.149 <none> 2181/TCP
my-kafka-cluster-zookeeper-nodes ClusterIP None <none> 2181/TCP,2888/TCP,3888/TCP
my-kafka-cluster-kafka-bootstrap makes it possible for internal Kubernetes clients to access the Kafka cluster, and my-kafka-cluster-kafka-brokers is the headless Service corresponding to the StatefulSet.
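These Service names translate directly into the cluster-internal DNS addresses clients connect to. A small sketch of how they are formed, assuming the default namespace (which is what the advertised.listeners output above reflects):

```shell
CLUSTER=my-kafka-cluster
NAMESPACE=default

# bootstrap address used by producers/consumers inside the Kubernetes cluster
echo "${CLUSTER}-kafka-bootstrap.${NAMESPACE}.svc:9092"

# per-broker address via the headless Service (here for broker 0),
# matching the advertised.listeners entry in server.config
BROKER_ID=0
echo "${CLUSTER}-kafka-${BROKER_ID}.${CLUSTER}-kafka-brokers.${NAMESPACE}.svc:9092"
```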
Secret
Although we're not using them in this post, it's helpful to look at the Secrets created by Strimzi:
kubectl get secret
NAME                                        TYPE
my-kafka-cluster-clients-ca                 Opaque
my-kafka-cluster-clients-ca-cert Opaque
my-kafka-cluster-cluster-ca Opaque
my-kafka-cluster-cluster-ca-cert Opaque
my-kafka-cluster-cluster-operator-certs Opaque
my-kafka-cluster-kafka-brokers Opaque
my-kafka-cluster-kafka-token-vb2qt kubernetes.io/service-account-token
my-kafka-cluster-zookeeper-nodes Opaque
my-kafka-cluster-zookeeper-token-xq8m2 kubernetes.io/service-account-token
- my-kafka-cluster-cluster-ca-cert - Cluster CA certificate used to sign Kafka broker certificates; a connecting client uses it to establish a TLS-encrypted connection
- my-kafka-cluster-clients-ca-cert - Clients CA certificate, which a user can use to sign their own client certificate and thus mutually authenticate against the Kafka cluster
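When we get to TLS in a later post, the cluster CA certificate will need to be extracted from its Secret. Secret data is base64-encoded, so the extraction is roughly the following (the ca.crt key follows Strimzi's convention; treat the exact jsonpath as a sketch):

```shell
# On a live cluster you would run something like:
#   kubectl get secret my-kafka-cluster-cluster-ca-cert \
#     -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt
# The decode step itself, illustrated here with a stand-in value:
printf 'c3RhbmQtaW4gY2VydA==' | base64 -d
```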
Ok, but does it work?
Let's take it for a spin!
Create a producer Pod:
export KAFKA_CLUSTER_NAME=my-kafka-cluster
kubectl run kafka-producer -ti --image=strimzi/kafka:latest-kafka-2.4.0 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list $KAFKA_CLUSTER_NAME-kafka-bootstrap:9092 --topic my-topic
In another terminal, create a consumer Pod:
export KAFKA_CLUSTER_NAME=my-kafka-cluster
kubectl run kafka-consumer -ti --image=strimzi/kafka:latest-kafka-2.4.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server $KAFKA_CLUSTER_NAME-kafka-bootstrap:9092 --topic my-topic --from-beginning
The above demonstration was taken from the Strimzi documentation - https://strimzi.io/docs/operators/master/deploying.html#deploying-example-clients-str. You can use other Kafka clients as well.
We're just getting started...
We started small, but we now have a Kafka cluster on Kubernetes, and it works (hopefully for you as well!). As I mentioned before, this is the beginning of a multi-part blog series. Stay tuned for upcoming posts, where we will explore other aspects such as external client access, TLS, authentication, persistence, etc.