What is Kafka? Event Streaming in a Nutshell

Jan Schulte

Posted on February 3, 2023

Keeping up with user demand and processing ever-growing amounts of data
are challenges developers face every day. At a certain point, handling
every workflow inside a single application code base becomes
unmanageable. When you reach that boundary, decoupling is key. But how
do you decouple the processing of large amounts of data? That's where
Apache Kafka can make a real difference.


If you have heard about Kafka before but never had the time to learn
about it, this guide is for you.

What is Kafka?

At a glance, Apache Kafka is an open source message broker that enables
developers to build more scalable applications. It was initially
developed at LinkedIn and became an open source project in 2011. Today,
it is used mainly for low-latency event processing across a wide
variety of use cases.

Is Kafka a Queue?

Yes and no. Kafka can be used as a message queue, but primarily it is a
distributed messaging system, or event broker. Under the hood, it
combines components of a message queue with a publish-subscribe model,
providing fault tolerance and high-throughput stream processing.

Kafka is usually deployed not as a single instance but as a cluster
consisting of at least three instances. These instances, also called
brokers, share the incoming data among each other and replicate it, so
that if one broker goes offline, producers and consumers can still send
and process events.

Message Queues vs. Apache Kafka

At first glance, Kafka might look like just another message queue. To
some extent, this assessment holds, but it leaves important bits out.
You could see Kafka as a message queue - with a twist.

The main use case for a message queue is to decouple workflows. For
instance, a user wants to send an email to another recipient. With a
message queue, the code handling the "send email" event publishes a
message to a queue, and a worker consuming from that queue picks up the
message and sends the email. We decouple this kind of workflow with a
message queue because sending the email could take a while, and we
don't want to leave the user waiting. When the user clicks the "Send"
button, it doesn't matter whether the email goes out in that exact
moment or twenty seconds later; the user clicks "Send" and can focus on
writing the next email. The number of messages producers can publish to
a queue is not limited. If a high volume of messages is expected, we
add more consumers to the queue. Once a consumer has finished a job,
e.g. sending an email, it marks the message as completed and the
message is removed from the queue. A message can therefore only trigger
one kind of action, on a single consumer.
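The email workflow above can be sketched in a few lines of plain Python using the standard library's queue; the names `handle_send_click` and `send_email` are hypothetical stand-ins for the UI handler and the slow SMTP call, not part of any real API:

```python
from queue import Queue

email_queue = Queue()

def handle_send_click(recipient, body):
    # The UI handler only enqueues the message and returns immediately,
    # so the user never waits on the actual send.
    email_queue.put({"to": recipient, "body": body})

def send_email(message):
    # Stand-in for the slow SMTP call a real worker would make.
    return f"sent to {message['to']}"

def worker():
    # A consumer drains the queue; each message triggers exactly one
    # action and is then gone from the queue.
    results = []
    while not email_queue.empty():
        message = email_queue.get()
        results.append(send_email(message))
        email_queue.task_done()  # mark the job as completed
    return results

handle_send_click("alice@example.com", "Hello!")
handle_send_click("bob@example.com", "Hi!")
results = worker()
```

After `worker()` runs, the queue is empty: unlike a Kafka topic, nothing about the processed messages is retained.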

Message Queue

With Kafka, on the other hand, things look a bit different. At a high
level, Kafka resembles a standard message queue, but a closer look
reveals major differences. Like message queues, Kafka has producers and
consumers. Instead of releasing messages into a queue, however, Kafka
producers publish so-called events to a topic (more on that later). A
topic is persistent; events do not get deleted after they have been
processed. You can configure Kafka to remove old events after a certain
time, but events themselves are immutable: once written, they do not
change. Additionally, each event has an offset. The consumer uses the
offset to remember which events it has already processed. This comes in
handy if the consumer crashes and later wants to pick up work again:
with the offset, it can resume where it left off.
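The offset mechanism can be illustrated in plain Python (this is a conceptual sketch, not the Kafka client API): the topic is just a list that is appended to and never shrinks, and the consumer remembers how far into the list it got.

```python
# A topic as an append-only log: events are appended and never removed
# when they are read.
topic = []

def publish(event):
    topic.append(event)

def consume(from_offset):
    """Process events starting at from_offset; return the new offset."""
    offset = from_offset
    while offset < len(topic):
        event = topic[offset]  # reading does not delete the event
        offset += 1            # remember how far we got
    return offset

publish("payment_processed")
publish("payment_processed")
committed = consume(from_offset=0)  # consumer stores this offset

# Simulate a crash and restart: the consumer resumes at its last
# committed offset instead of re-reading the whole topic.
publish("payment_processed")
committed = consume(from_offset=committed)
```

A real Kafka consumer commits its offset back to the cluster, but the idea is the same: the topic keeps everything, and the offset is the consumer's bookmark.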

Event Broker

How Message Queues and Event Streaming Influence Programming Patterns

Earlier, we used sending an email as an example use case for a message
queue. When we look at it a bit more closely, such a workflow follows an
Imperative Programming Pattern. The user clicks a "Send" button in
the user interface. The code handling this particular event publishes a
message on the queue. On the receiving end, a consumer takes the message
and sends the email to the recipient.

While we have decoupled the click event from sending the email,
overall, the event is still tightly coupled to its action.

Event brokers, on the other hand, encourage a more Reactive
Programming Pattern. An event describes something that has happened in
the system, such as payment_processed in an e-commerce scenario. For
instance, a user has just checked out, and the paymentservice
successfully processed the payment. This kicks off the next steps in
the workflow: the paymentservice publishes the event to a topic and
moves on to the next transaction. On the receiving end, the situation
looks different from a message queue. Instead of a single consumer, in
this scenario we have two: shippingservice and notificationservice.
Both wait for payment_processed events, but perform completely
different actions. The code is loosely coupled, and one event can lead
to effects in different domains (like shipping and notifications).
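The fan-out described above can be sketched in plain Python (again, a conceptual model rather than the Kafka API; the service names are the hypothetical ones from the example):

```python
# One published event reaches every subscriber; the producer does not
# know, or care, who reacts to it.
subscribers = []

def subscribe(handler):
    subscribers.append(handler)

def publish(event):
    return [handler(event) for handler in subscribers]

def shipping_service(event):
    # Reacts to the payment by starting the shipping workflow.
    return f"shipping order {event['order_id']}"

def notification_service(event):
    # Reacts to the same payment by notifying the customer.
    return f"emailing receipt for order {event['order_id']}"

subscribe(shipping_service)
subscribe(notification_service)

effects = publish({"type": "payment_processed", "order_id": 42})
```

One event, two loosely coupled effects: adding a third service later means adding a subscriber, not touching the paymentservice.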

What is a Topic?

Kafka organizes different types of events in topics. It is a named
logical channel between producers and consumers. In the case of an
online store, a topic might be payment_events, containing a trail of
(un-)successfully processed payments. On a more technical level, a topic
can be considered an append-only log, where one or more producers
publish events to. On the receiving end, we have consumers, reading from
it and processing events further. Once a consumer has processed an
event, it moves on to the next event, without deleting anything.

Topics and Partitions

Topics have a limitation: only one consumer at a time can read from
them. If paymentservice processes a lot of payments, and therefore
publishes many events, it would eventually overwhelm a single consumer.
So how can we still keep up with the workload?

Kafka allows more consumers to process payment_processed events by
grouping them into a so-called consumer group. All consumers in a
consumer group read from a specific topic, but each event is processed
by only a single consumer within that group. How do we make sure all
consumers have enough work? Kafka lets us split a topic into so-called
partitions. Whenever a producer publishes an event to a topic, it gets
stored in one of the topic's partitions. Each topic has at least one
partition, and a consumer reads from a specific partition.
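One common way to assign events to partitions is to hash the event's key, so that all events with the same key (say, the same customer) land in the same partition and stay ordered. The sketch below models this in plain Python; Kafka's default partitioner actually uses a murmur2 hash, and crc32 is used here only as a deterministic stand-in:

```python
from zlib import crc32

NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

def publish(key, event):
    # Same key -> same partition, so events for one customer keep
    # their relative order within that partition.
    index = crc32(key.encode()) % NUM_PARTITIONS
    partitions[index].append(event)
    return index

p1 = publish("customer-1", "payment_processed")
p2 = publish("customer-1", "payment_processed")
p3 = publish("customer-2", "payment_processed")

assert p1 == p2  # both customer-1 events landed in the same partition
```

Consumers in a group then divide the partitions among themselves, which is how adding consumers (up to the partition count) scales out the workload.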

Topic Partitioning

Splitting a topic into partitions has another advantage: we can store
individual partitions on different Kafka instances within the cluster.
This distributes the workload and also increases fault tolerance. Even
though a single Kafka broker might be responsible for a specific
partition, that partition still gets replicated across the cluster. If
the broker goes offline, consumers can read from the replicas. A
partition is therefore both a scaling mechanism and a way to increase
fault tolerance.

With this in mind, how can we implement a workflow where
shippingservice and notificationservice both read from the
payment_events topic? While Kafka only allows a single consumer per
partition within a single consumer group, we can add a separate
consumer group that reads from the same topic and partitions.
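The key point is that each consumer group keeps its own offsets, so the two groups read the same topic independently. A minimal sketch of that idea in plain Python (not the Kafka API; the group names are the hypothetical services from the example):

```python
# Two consumer groups reading the same topic: each group tracks its
# own offset, so both see every event.
topic = ["payment_processed:1", "payment_processed:2"]
group_offsets = {"shippingservice": 0, "notificationservice": 0}

def poll(group):
    """Return the events this group has not seen yet, advance its offset."""
    offset = group_offsets[group]
    events = topic[offset:]
    group_offsets[group] = len(topic)
    return events

shipped = poll("shippingservice")
notified = poll("notificationservice")
# Both groups received both events; a second poll returns nothing new.
```

Because nothing is deleted on read, adding a new consumer group later (say, an analytics service) costs the existing groups nothing.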

Why should I consider Kafka?

There are many reasons to consider Kafka. For instance, it can make a
big difference in monolith-to-microservices migrations: every direct
call between components within a monolith could be a candidate for
microservices communicating via Kafka.

Or, if you are already following a microservices-based approach, your
services might be communicating with each other via direct HTTP calls.
While microservices have many benefits, direct (and perhaps even
hard-coded) HTTP calls pose a potential bottleneck. What if, instead of
Service A calling Service B directly, both communicated via a Kafka
topic instead?

What are your use cases? Share them in the comments below.
