How to auto-scale Kafka applications on Kubernetes with KEDA
Abhishek Gupta
Posted on May 28, 2020
This tutorial will demonstrate auto-scaling Kafka-based consumer applications on Kubernetes using KEDA, which stands for Kubernetes-based Event Driven Autoscaler.
KEDA can drive the scaling of any container in Kubernetes based on the number of events needing to be processed. It is a single-purpose and lightweight component that can be added to any Kubernetes cluster. KEDA works alongside standard Kubernetes components like the Horizontal Pod Autoscaler and can extend functionality without overwriting or duplication.
It has a built-in Kafka scaler which can auto-scale your Kafka consumer applications (traditional consumer apps, Kafka Streams, etc.) based on the consumer offset lag. I will be using Azure Event Hubs as the Kafka broker (although the concepts apply to any Kafka cluster) and Azure Kubernetes Service for the Kubernetes cluster (feel free to use alternatives such as minikube).
Code is available on GitHub: https://github.com/abhirockzz/keda-eventhubs-kafka
We will go through the following:
- A quick overview
- The app and KEDA configuration (mostly YAMLs, to be honest!)
- How to set up KEDA and the required Azure services
- Deploy the solution and watch auto-scaling in action
Overview
Here are the key components:
- Producer app: This is a simple Go app that produces simulated JSON data to Kafka. It uses the sarama library. You can run this as a Docker container or directly as a Go app (details in an upcoming section); a minimal sketch follows this list
- Consumer app: This is another Go app that consumes data from Kafka. To add a bit of variety, it uses the Confluent Go Kafka client. You will be running this as a Kubernetes Deployment (details in an upcoming section)
- KEDA ScaledObject (which defines the auto-scaling criteria based on Kafka) and other supporting manifests
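As a reference, here is a minimal sketch of what a sarama-based producer for Event Hubs could look like (SASL PLAIN over TLS). This is illustrative only, not the exact code from the repo; the environment variable names are the ones used by the producer later in this post.

package main

import (
	"crypto/tls"
	"fmt"
	"os"
	"time"

	"github.com/Shopify/sarama"
)

func main() {
	// the Event Hubs Kafka endpoint requires SASL PLAIN over TLS
	config := sarama.NewConfig()
	config.Net.SASL.Enable = true
	config.Net.SASL.User = os.Getenv("KAFKA_EVENTHUB_USERNAME") // "$ConnectionString"
	config.Net.SASL.Password = os.Getenv("KAFKA_EVENTHUB_CONNECTION_STRING")
	config.Net.TLS.Enable = true
	config.Net.TLS.Config = &tls.Config{}
	config.Producer.Return.Successes = true // required by SyncProducer

	producer, err := sarama.NewSyncProducer([]string{os.Getenv("KAFKA_EVENTHUB_ENDPOINT")}, config)
	if err != nil {
		panic(err)
	}
	defer producer.Close()

	topic := os.Getenv("KAFKA_EVENTHUB_TOPIC")
	for {
		// simulated JSON payload, e.g. {"time":"Mon Apr 27 09:21:38 2020"}
		payload := fmt.Sprintf(`{"time":%q}`, time.Now().Format(time.ANSIC))
		partition, offset, err := producer.SendMessage(&sarama.ProducerMessage{
			Topic: topic,
			Value: sarama.StringEncoder(payload),
		})
		if err != nil {
			panic(err)
		}
		fmt.Printf("sent message %s to partition %d offset %d\n", payload, partition, offset)
		time.Sleep(1 * time.Second)
	}
}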
Pre-requisites
kubectl: https://kubernetes.io/docs/tasks/tools/install-kubectl/
If you choose to use Azure Event Hubs, Azure Kubernetes Service (or both) you will need a Microsoft Azure account. Go ahead and sign up for a free one!
I will be using Helm to install KEDA. Here is the documentation to install Helm: https://helm.sh/docs/intro/install/
For alternative ways of installing KEDA (Operator Hub or YAML files), take a look at the documentation.
Here is how you can set up the required Azure services.
I recommend installing the below services as part of a single Azure Resource Group, which makes it easy to clean them up later.
Azure Event Hubs
Azure Event Hubs is a data streaming platform and event ingestion service. It can receive and process millions of events per second. It also provides a Kafka endpoint that can be used by existing Kafka-based applications as an alternative to running your own Kafka cluster. Event Hubs supports Apache Kafka protocol 1.0 and later, and works with existing Kafka client applications and other tools in the Kafka ecosystem, including Kafka Connect (demonstrated in this blog), MirrorMaker, etc.
To set up an Azure Event Hubs cluster, you can choose from a variety of options including the Azure portal, Azure CLI, Azure PowerShell or an ARM template. Once the setup is complete, you will need the connection string (that will be used in subsequent steps) for authenticating to Event Hubs - use this guide to finish this step.
Please ensure that you also create an Event Hub (Kafka topic) to/from which we can send/receive data.
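For example, with the Azure CLI, the setup might look like this (resource names are placeholders, and the five-partition topic is an assumption chosen to match the demo output later in this post):

az group create --name <AZURE_RESOURCE_GROUP> --location <AZURE_LOCATION>

az eventhubs namespace create --name <EVENTHUBS_NAMESPACE> --resource-group <AZURE_RESOURCE_GROUP> \
    --location <AZURE_LOCATION> --sku Standard --enable-kafka true

az eventhubs eventhub create --name <EVENTHUB_TOPIC_NAME> --resource-group <AZURE_RESOURCE_GROUP> \
    --namespace-name <EVENTHUBS_NAMESPACE> --partition-count 5

# fetch the connection string used for authentication in later steps
az eventhubs namespace authorization-rule keys list --resource-group <AZURE_RESOURCE_GROUP> \
    --namespace-name <EVENTHUBS_NAMESPACE> --name RootManageSharedAccessKey --query primaryConnectionString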
Azure Kubernetes Service
Azure Kubernetes Service (AKS) makes it simple to deploy a managed Kubernetes cluster in Azure. It reduces the complexity and operational overhead of managing Kubernetes by offloading much of that responsibility to Azure. Here are examples of how you can set up an AKS cluster using Azure CLI, Azure portal or ARM template
Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda
This will install the KEDA Operator and the KEDA Metrics API server (as separate Deployments):
kubectl get deployment -n keda
NAME READY UP-TO-DATE AVAILABLE AGE
keda-operator 1/1 1 1 1h
keda-operator-metrics-apiserver 1/1 1 1 1h
To check the KEDA Operator logs:
kubectl logs -f $(kubectl get pod -l=app=keda-operator -o jsonpath='{.items[0].metadata.name}' -n keda) -n keda
YAML time!
Let's enjoy introspecting some YAMLs!
Kubernetes Secret
We store Event Hubs connectivity details using a Secret. More on this soon...
apiVersion: v1
kind: Secret
metadata:
  name: eventhub-kafka-credentials
data:
  authMode: <BASE64_AUTH_MODE>
  username: <BASE64_EVENTHUBS_USERNAME>
  password: <BASE64_EVENTHUBS_CONNECTION_STRING>
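For example, to generate the base64 values (as explained in the Deploy section, authMode and username are fixed for Event Hubs; only the password, i.e. the connection string, varies):

echo -n 'sasl_plain' | base64                       # authMode
echo -n '$ConnectionString' | base64                # username
echo -n '<eventhubs-connection-string>' | base64    # password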
ScaledObject
A ScaledObject for the KEDA Kafka scaler. Note the following references:
- spec.scaleTargetRef.deploymentName points to the name of the Deployment that needs to be scaled
- spec.triggers.authenticationRef points to a TriggerAuthentication
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaledobject
  namespace: default
  labels:
    deploymentName: kafka-consumer
spec:
  scaleTargetRef:
    deploymentName: kafka-consumer
  pollingInterval: 15
  cooldownPeriod: 100
  maxReplicaCount: 10
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: <EVENTHUBS_NAMESPACE>.servicebus.windows.net:9093
        consumerGroup: <EVENTHUB_CONSUMER_GROUP>
        topic: <EVENTHUB_TOPIC_NAME>
        lagThreshold: "10"
      authenticationRef:
        name: eventhub-kafka-triggerauth
Consumer Deployment
The Kafka consumer app runs as a Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-consumer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-consumer
  template:
    metadata:
      labels:
        app: kafka-consumer
    spec:
      containers:
        - name: kafka-consumer
          image: abhirockzz/kafka-consumer
          imagePullPolicy: IfNotPresent
          env:
            - name: KAFKA_EVENTHUB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: eventhub-kafka-credentials
                  key: password
            - name: KAFKA_EVENTHUB_ENDPOINT
              value: <EVENTHUBS_NAMESPACE>.servicebus.windows.net:9093
            - name: KAFKA_EVENTHUB_CONSUMER_GROUP
              value: $Default
            - name: KAFKA_EVENTHUB_TOPIC
              value: <EVENTHUB_TOPIC_NAME>
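For reference, here is a minimal sketch of what a consumer built with the Confluent Go Kafka client could look like, reading the same environment variables as the Deployment above. This is illustrative only; the actual code is in the GitHub repo.

package main

import (
	"fmt"
	"os"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	// SASL PLAIN over TLS, as required by the Event Hubs Kafka endpoint
	consumer, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers": os.Getenv("KAFKA_EVENTHUB_ENDPOINT"),
		"group.id":          os.Getenv("KAFKA_EVENTHUB_CONSUMER_GROUP"),
		"security.protocol": "SASL_SSL",
		"sasl.mechanisms":   "PLAIN",
		"sasl.username":     "$ConnectionString",
		"sasl.password":     os.Getenv("KAFKA_EVENTHUB_PASSWORD"),
		"auto.offset.reset": "earliest",
	})
	if err != nil {
		panic(err)
	}
	defer consumer.Close()

	if err := consumer.SubscribeTopics([]string{os.Getenv("KAFKA_EVENTHUB_TOPIC")}, nil); err != nil {
		panic(err)
	}
	for {
		// blocks until a message arrives; offsets are auto-committed by default
		msg, err := consumer.ReadMessage(-1)
		if err != nil {
			fmt.Println("consumer error:", err)
			continue
		}
		fmt.Printf("received %s from partition %d\n", string(msg.Value), msg.TopicPartition.Partition)
	}
}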
Here is what the TriggerAuthentication looks like.
TriggerAuthentication
It refers to the Secret which was described earlier (named eventhub-kafka-credentials).
apiVersion: keda.k8s.io/v1alpha1
kind: TriggerAuthentication
metadata:
  name: eventhub-kafka-triggerauth
spec:
  secretTargetRef:
    - parameter: authMode
      name: eventhub-kafka-credentials
      key: authMode
    - parameter: username
      name: eventhub-kafka-credentials
      key: username
    - parameter: password
      name: eventhub-kafka-credentials
      key: password
Deploy
It's time to see things in action.
Start by cloning the Git repo:
git clone https://github.com/abhirockzz/keda-eventhubs-kafka
We will deploy the Secret that contains the connection string for your Event Hubs instance. Update the file deploy/1-secret.yaml, where the password attribute contains the base64-encoded value of the Event Hubs connection string, e.g. echo -n '<eventhubs-connection-string>' | base64
No need to change the username and authMode. They are just base64 versions of $ConnectionString and sasl_plain respectively.
Deploy the Secret
kubectl apply -f deploy/1-secret.yaml
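To confirm the Secret was created (the values stay base64-encoded):

kubectl get secret eventhub-kafka-credentials -o yaml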
Update the file deploy/3-consumer.yaml. Check the following references which are part of spec.containers.env:
- KAFKA_EVENTHUB_ENDPOINT - name of the Event Hubs namespace
- KAFKA_EVENTHUB_TOPIC - name of the Event Hubs topic
e.g.
env:
  - name: KAFKA_EVENTHUB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: eventhub-kafka-credentials
        key: password
  - name: KAFKA_EVENTHUB_ENDPOINT
    value: my-eventhub.servicebus.windows.net:9093
  - name: KAFKA_EVENTHUB_CONSUMER_GROUP
    value: $Default
  - name: KAFKA_EVENTHUB_TOPIC
    value: test-topic
Create the consumer app Deployment
kubectl apply -f deploy/3-consumer.yaml
You should see the consumer app spin into action (Running state) soon:
kubectl get pods -w
You can check the consumer app logs using
kubectl logs -f $(kubectl get pod -l=app=kafka-consumer -o jsonpath='{.items[0].metadata.name}')
Create the TriggerAuthentication followed by the ScaledObject:
kubectl apply -f deploy/2-trigger-auth.yaml
kubectl apply -f deploy/4-kafka-scaledobject.yaml
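To confirm that both custom resources were created (assuming the default namespace):

kubectl get triggerauthentication
kubectl get scaledobject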
Check the KEDA Operator logs again - you should see that it has reacted to the fact that the ScaledObject was just created:
kubectl logs -f $(kubectl get pod -l=app=keda-operator -o jsonpath='{.items[0].metadata.name}' -n keda) -n keda
Check the consumer Pods with kubectl get pods. You will see that there are no Pods now - this is because KEDA scaled the Deployment down to zero instances (there are no messages to consume yet). You can confirm this by checking the Deployment as well: kubectl get deployment/kafka-consumer
If you want to change this behavior, e.g. have at least one instance of your Deployment running, add the minReplicaCount attribute to the ScaledObject (it defaults to 0).
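For example, a partial ScaledObject spec that keeps at least one consumer running might look like this:

spec:
  scaleTargetRef:
    deploymentName: kafka-consumer
  minReplicaCount: 1   # keep at least one consumer instance around
  maxReplicaCount: 10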
From a scaling perspective, KEDA takes care of:
- scaling your app from 0 to 1 instance depending on the metrics reported by the scaler (Kafka consumer lag in this case)
- scaling your app from 1 instance back to 0
The rest of the heavy lifting (auto-scaling between 1 and N instances) is done by the Horizontal Pod Autoscaler (HPA), created by the controller loop in the KEDA Operator, which is the native resource built into Kubernetes.
Test
The stage is set for you to test auto-scaling. In one terminal, keep a watch on the consumer Deployment resource:
kubectl get deployment -w
Export environment variables for the producer app - enter the values for KAFKA_EVENTHUB_ENDPOINT, KAFKA_EVENTHUB_CONNECTION_STRING and KAFKA_EVENTHUB_TOPIC:
export KAFKA_EVENTHUB_ENDPOINT=<EVENTHUBS_NAMESPACE>.servicebus.windows.net:9093
export KAFKA_EVENTHUB_CONNECTION_STRING="Endpoint=sb://<EVENTHUBS_NAMESPACE>.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<EVENTHUBS_ACCESS_KEY>"
export KAFKA_EVENTHUB_TOPIC=<TOPIC_NAME>
export KAFKA_EVENTHUB_USERNAME="\$ConnectionString"
Build and run the producer application. You can also run the Go app directly if you have Go (1.13 or above) installed: cd producer && go run main.go
docker build -t producer-app producer
docker run --rm -e KAFKA_EVENTHUB_ENDPOINT=$KAFKA_EVENTHUB_ENDPOINT -e KAFKA_EVENTHUB_CONNECTION_STRING=$KAFKA_EVENTHUB_CONNECTION_STRING -e KAFKA_EVENTHUB_TOPIC=$KAFKA_EVENTHUB_TOPIC -e KAFKA_EVENTHUB_USERNAME=$KAFKA_EVENTHUB_USERNAME producer-app
This will start producing messages and you should see logs similar to this:
Event Hubs broker [<EVENTHUBS_NAMESPACE>.servicebus.windows.net:9093]
Event Hubs topic keda-test
Waiting for ctrl+c
sent message {"time":"Mon Apr 27 09:21:38 2020"} to partition 1 offset 0
sent message {"time":"Mon Apr 27 09:21:39 2020"} to partition 4 offset 0
sent message {"time":"Mon Apr 27 09:21:40 2020"} to partition 2 offset 0
sent message {"time":"Mon Apr 27 09:21:40 2020"} to partition 2 offset 1
sent message {"time":"Mon Apr 27 09:21:41 2020"} to partition 2 offset 2
sent message {"time":"Mon Apr 27 09:21:41 2020"} to partition 1 offset 1
...
On the other terminal, you should see your application auto-scale as the number of Pods in the Deployment increases:
kafka-consumer 0/5 4 0 33s
kafka-consumer 0/5 4 0 33s
kafka-consumer 0/5 4 0 33s
kafka-consumer 0/5 5 0 33s
kafka-consumer 1/5 5 1 80s
kafka-consumer 2/5 5 2 90s
kafka-consumer 3/5 5 3 101s
kafka-consumer 4/5 5 4 101s
kafka-consumer 5/5 5 5 2m9s
Please note that the number of Pods will not increase beyond the number of partitions for the Event Hubs topic.
The HPA will also reflect the same status:
kubectl get hpa/keda-hpa-kafka-consumer
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
keda-hpa-kafka-consumer Deployment/kafka-consumer 10/10 (avg) 1 10 5 3m
You can stop the producer app now. As the messages get consumed (and offsets are committed), the consumer lag will keep decreasing. At that point, the number of instances will gradually be scaled down to 0.
Clean-up
To uninstall KEDA:
helm uninstall -n keda keda
kubectl delete -f https://raw.githubusercontent.com/kedacore/keda/master/deploy/crds/keda.k8s.io_scaledobjects_crd.yaml
kubectl delete -f https://raw.githubusercontent.com/kedacore/keda/master/deploy/crds/keda.k8s.io_triggerauthentications_crd.yaml
To delete the Azure services, just delete the resource group:
az group delete --name <AZURE_RESOURCE_GROUP> --yes --no-wait
That concludes this blog on auto-scaling a Kafka consumer app deployed to Kubernetes using KEDA! Please join and be a part of the growing KEDA community; check out https://keda.sh/community/ for details.