Truong Phung
Posted on November 1, 2024
1. Quick Setup
We can quickly start Kafka using Docker Compose (see the Quick Setup Guide). To test the Kafka setup with the CLI (Command Line Interface), follow these steps:
- Enter the Kafka Container: Run this from a new terminal window (this terminal will act as the producer):

```
docker exec -it dev-kafka bash
```
- Produce Messages: Send a message to a topic (replace `my_topic` with your topic name):

```
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic my_topic
```

Type your message and press Enter to send it.
- Consume Messages: Read messages from the topic. Execute this in another terminal window (the consumer terminal) after running the same `docker exec` command from step 1:

```
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my_topic --from-beginning
```
- List Topics: Verify that your topics were created successfully:

```
kafka-topics.sh --bootstrap-server localhost:9092 --list
```
- Describe Topic Details: Check topic configuration and partitioning:

```
kafka-topics.sh --bootstrap-server localhost:9092 --topic my_topic --describe
```
These commands allow you to interact with Kafka topics, produce and consume messages, and verify your setup.
2. Common Kafka Commands
- Create a Topic with Replication and Partitions:

```
kafka-topics.sh --create \
  --topic my_topic \
  --partitions 3 \
  --replication-factor 2 \
  --bootstrap-server localhost:9092
```
- List Topics:

```
kafka-topics.sh --list --bootstrap-server localhost:9092
```
- Describe a Topic (view partition, replication, and ISR details):

```
kafka-topics.sh --describe \
  --topic my_topic \
  --bootstrap-server localhost:9092
```
- Produce Messages to a Topic (send messages from the CLI):

```
kafka-console-producer.sh --topic my_topic --bootstrap-server localhost:9092
```

Type messages in the terminal to send them to Kafka (e.g., `{"message": "Hello, Kafka KRaft!"}`).
- Consume Messages from a Topic (reads messages with a specific consumer group; you can run several consumer terminals to test this):

```
kafka-console-consumer.sh --topic my_topic \
  --from-beginning \
  --group my_consumer_group \
  --bootstrap-server localhost:9092
```
- Check Consumer Group Offsets (view offsets and lag for each consumer group):

```
kafka-consumer-groups.sh --describe \
  --group my_consumer_group \
  --bootstrap-server localhost:9092
```
- Reset Consumer Group Offsets (useful to replay or skip messages; the group must have no active members):

```
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group my_consumer_group \
  --topic my_topic \
  --reset-offsets --to-earliest --execute
```
- Increase Partitions for a Topic (add partitions for scalability; note that the partition count can only be increased, never decreased):

```
kafka-topics.sh --alter \
  --topic my_topic \
  --partitions 5 \
  --bootstrap-server localhost:9092
```
- Delete a Topic (remove a topic and its data):

```
kafka-topics.sh --delete \
  --topic my_topic \
  --bootstrap-server localhost:9092
```
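These topic-lifecycle commands can also be run programmatically. Below is a minimal sketch using the IBM Sarama library (the same library used in section 7), assuming a broker at `localhost:9092`; the version pin is an assumption you should match to your broker. The sizing mirrors the create-topic command above:

```go
package main

import (
	"log"

	"github.com/IBM/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Version = sarama.V3_6_0_0 // assumption: match this to your broker version

	// Connect an admin client to the cluster.
	admin, err := sarama.NewClusterAdmin([]string{"localhost:9092"}, cfg)
	if err != nil {
		log.Fatalf("failed to create cluster admin: %v", err)
	}
	defer admin.Close()

	// Equivalent of the create-topic command above.
	err = admin.CreateTopic("my_topic", &sarama.TopicDetail{
		NumPartitions:     3,
		ReplicationFactor: 2,
	}, false)
	if err != nil {
		log.Fatalf("failed to create topic: %v", err)
	}
	log.Println("topic my_topic created")
}
```

`ClusterAdmin` also exposes `DeleteTopic` and `CreatePartitions` as counterparts to the delete and alter commands above.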
3. More Advanced Commands
- List All Consumer Groups:

```
kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
```
- View All Configurations of a Topic:

```
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics \
  --entity-name my_topic \
  --describe
```
- Update Topic Configurations (e.g., setting the retention period to 7 days = 604800000 ms):

```
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics \
  --entity-name my_topic \
  --alter --add-config retention.ms=604800000
```
- View Broker Configurations:

```
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers \
  --entity-name 1 \
  --describe
```
- Update Broker Configurations (e.g., setting the log retention size to 1 GiB):

```
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers \
  --entity-name 1 \
  --alter --add-config log.retention.bytes=1073741824
```
- Delete a Consumer Group (removes a stale or inactive consumer group and its committed offsets; the group must have no active members):

```
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --delete --group my_consumer_group
```
- View Cluster Metadata (get details about brokers, topics, and partitions): note that `kafka-metadata-shell.sh` does not connect to a broker; it reads a KRaft metadata log from disk, e.g.:

```
kafka-metadata-shell.sh --snapshot /path/to/__cluster_metadata-0/00000000000000000000.log
```
- Monitor Kafka Lag for a Specific Consumer Group (useful for tracking how far behind consumers are):

```
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my_consumer_group
```
- Enable Log Compaction on a Topic (compaction runs in the background once the policy is set; recent Kafka versions require `kafka-configs.sh` for this, as `kafka-topics.sh --alter` no longer accepts `--config`):

```
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics \
  --entity-name my_topic \
  --alter --add-config cleanup.policy=compact
```
- Change the Default Partition Assignment Strategy: `partition.assignment.strategy` is a consumer-side setting, not a broker setting. To change the strategy (e.g., to range, round-robin, or sticky), set it in your consumer configuration:

```
partition.assignment.strategy=org.apache.kafka.clients.consumer.StickyAssignor
```
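For Go services using the IBM Sarama library (as in section 7), the equivalent knob lives on the consumer config rather than in a properties file. A minimal sketch; the field names follow recent IBM Sarama releases (older releases used the single-valued `Consumer.Group.Rebalance.Strategy` field instead):

```go
package main

import (
	"fmt"

	"github.com/IBM/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	// Consumer-side equivalent of partition.assignment.strategy.
	// Alternatives: NewBalanceStrategyRange(), NewBalanceStrategyRoundRobin().
	cfg.Consumer.Group.Rebalance.GroupStrategies = []sarama.BalanceStrategy{
		sarama.NewBalanceStrategySticky(),
	}
	fmt.Println("assignment strategy:", cfg.Consumer.Group.Rebalance.GroupStrategies[0].Name())
}
```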
4. Kafka Core Concepts
Apache Kafka is a distributed event streaming platform used for building real-time data pipelines and streaming applications. Here are some key aspects of Kafka:
- Core Concepts (a minimal Go producer example appears at the end of this section):
  - Producer: Sends messages (records) to Kafka topics.
  - Consumer: Reads messages from Kafka topics.
  - Topic: A category or feed name to which records are sent; it's partitioned for scalability.
  - Partition: A single topic can be divided into multiple partitions, allowing parallel processing and distribution across Kafka brokers.
  - Broker: A server that stores topic partitions and handles requests from clients (producers and consumers).
- Message Durability:
  - Kafka ensures data durability by replicating partitions across multiple brokers, reducing the risk of data loss in case of broker failure.
  - Each message in Kafka is stored on disk, providing fault tolerance.
- Scalability:
  - Kafka's design allows horizontal scaling by adding more brokers to a cluster and increasing partitions within a topic, distributing the load across brokers.
- High Throughput and Low Latency:
  - Kafka is optimized for high throughput, handling millions of messages per second with low latency, making it suitable for real-time data processing.
- Offset Management:
  - Consumers track their position in a topic using offsets, which indicate the last message read. This allows a consumer to resume from a specific point after a failure.
- Data Retention:
  - Kafka retains messages for a configurable period (e.g., days or weeks), even after they are consumed, which allows for replaying or reprocessing past messages.
- Stream Processing:
  - Kafka can be integrated with stream processing frameworks like Apache Flink, Kafka Streams, and Apache Spark to process real-time data streams directly from topics.
- Use Cases:
  - Kafka is widely used for logging, monitoring, real-time analytics, event sourcing, and building data pipelines between systems.
By offering these capabilities, Kafka provides a robust, scalable, and fault-tolerant solution for managing real-time data streams in distributed environments.
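To make the producer/topic/partition vocabulary concrete, here is a minimal Go producer sketch using the IBM Sarama library from section 7, assuming the local broker and `my_topic` from the earlier sections:

```go
package main

import (
	"log"

	"github.com/IBM/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Producer.Return.Successes = true // required by the SyncProducer

	// The producer connects to a broker and sends records to a topic.
	producer, err := sarama.NewSyncProducer([]string{"localhost:9092"}, cfg)
	if err != nil {
		log.Fatalf("failed to start producer: %v", err)
	}
	defer producer.Close()

	// The partitioner decides which partition of the topic receives the record.
	partition, offset, err := producer.SendMessage(&sarama.ProducerMessage{
		Topic: "my_topic",
		Value: sarama.StringEncoder(`{"message": "Hello, Kafka KRaft!"}`),
	})
	if err != nil {
		log.Fatalf("failed to send message: %v", err)
	}
	log.Printf("message stored in partition %d at offset %d", partition, offset)
}
```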
5. Topic Replication and Consumer Groups
1. Topic Replication:
- Replication ensures high availability and fault tolerance for Kafka topics. Each topic is divided into partitions, and these partitions are replicated across multiple brokers.
- Replication Factor: This defines the number of copies of a partition. For example, a replication factor of 3 means that there are three copies of each partition on different brokers.
- Leader and Followers: For each partition, one replica is designated as the leader, and the others are followers. Producers and consumers interact with the leader, while followers replicate the data from the leader. If a broker holding the leader replica fails, one of the follower replicas is automatically promoted to be the new leader, ensuring that the data remains accessible.
2. Consumer Group:
- A consumer group is a group of consumers that work together to read messages from a Kafka topic in parallel, allowing horizontal scaling.
- Each consumer in a group reads from a subset of partitions, ensuring that each partition is consumed by only one consumer at a time within the group.
- Load Balancing: Kafka distributes partitions among consumers in the group, balancing the workload. If a consumer fails or leaves the group, Kafka will rebalance the partitions across the remaining consumers.
- Offset Management: Each consumer group maintains its own offsets, allowing different groups to consume the same topic independently, each keeping track of where they left off.
Together, replication ensures data durability and availability, while consumer groups provide scalability and fault-tolerant message processing.
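To see leaders, replicas, and ISR from code rather than from `kafka-topics.sh --describe`, here is a small sketch using Sarama's `ClusterAdmin` under the same local-broker assumption (the version pin is, again, an assumption to match to your broker):

```go
package main

import (
	"log"

	"github.com/IBM/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Version = sarama.V3_6_0_0 // assumption: match this to your broker version

	admin, err := sarama.NewClusterAdmin([]string{"localhost:9092"}, cfg)
	if err != nil {
		log.Fatalf("failed to create cluster admin: %v", err)
	}
	defer admin.Close()

	// Print the leader, replica set, and in-sync replicas of each partition.
	topics, err := admin.DescribeTopics([]string{"my_topic"})
	if err != nil {
		log.Fatalf("failed to describe topic: %v", err)
	}
	for _, t := range topics {
		for _, p := range t.Partitions {
			log.Printf("partition %d: leader=%d replicas=%v isr=%v",
				p.ID, p.Leader, p.Replicas, p.Isr)
		}
	}
}
```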
6. The way Kafka distributes messages
Kafka distributes messages between consumers in a consumer group based on partition assignment. Here's a brief explanation of how this works:
- Partitions and Consumers:
  - Each Kafka topic is divided into multiple partitions. When consumers are part of a consumer group, Kafka assigns each partition to a specific consumer within the group.
  - Each partition is consumed by only one consumer in a given consumer group, ensuring that messages from a partition are processed sequentially by that consumer.
- Load Balancing:
  - Kafka uses a partition assignment strategy (like range, round-robin, or sticky) to distribute partitions evenly among the consumers in a group.
  - If the number of consumers equals the number of partitions, each consumer will be assigned exactly one partition.
  - If there are more partitions than consumers, some consumers will be responsible for multiple partitions.
  - If there are more consumers than partitions, some consumers will remain idle, since each partition can only be assigned to one consumer in the group.
- Rebalancing:
  - When a consumer joins or leaves a consumer group, or when partitions change (e.g., a new topic partition is added), Kafka triggers a rebalance.
  - During rebalancing, partitions are redistributed among the consumers, ensuring that each partition remains assigned to only one consumer.
By distributing partitions this way, Kafka allows for parallel processing of messages within a consumer group, improving scalability and load distribution.
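A consumer-group sketch in Go makes this assignment behavior visible: Sarama re-runs `Setup` after every rebalance with the partitions the member has just been assigned, and `ConsumeClaim` runs once per assigned partition. A minimal sketch under the same local-broker assumptions (run several copies to watch partitions redistribute):

```go
package main

import (
	"context"
	"log"

	"github.com/IBM/sarama"
)

// handler implements sarama.ConsumerGroupHandler. Setup and Cleanup run
// around each rebalance; ConsumeClaim runs once per assigned partition.
type handler struct{}

func (handler) Setup(s sarama.ConsumerGroupSession) error {
	log.Printf("assigned partitions: %v", s.Claims()) // result of the assignment strategy
	return nil
}

func (handler) Cleanup(sarama.ConsumerGroupSession) error { return nil }

func (handler) ConsumeClaim(s sarama.ConsumerGroupSession, c sarama.ConsumerGroupClaim) error {
	for msg := range c.Messages() {
		log.Printf("partition %d offset %d: %s", msg.Partition, msg.Offset, msg.Value)
		s.MarkMessage(msg, "") // record progress after processing
	}
	return nil
}

func main() {
	cfg := sarama.NewConfig()
	group, err := sarama.NewConsumerGroup([]string{"localhost:9092"}, "my_consumer_group", cfg)
	if err != nil {
		log.Fatalf("failed to create consumer group: %v", err)
	}
	defer group.Close()

	// Consume blocks until a rebalance; looping re-joins with the new assignment.
	for {
		if err := group.Consume(context.Background(), []string{"my_topic"}, handler{}); err != nil {
			log.Fatalf("consume error: %v", err)
		}
	}
}
```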
7. Minimize the risk of message loss
For example, when building Go services with the IBM Sarama library for Kafka, it's crucial to implement strategies that minimize the risk of message loss. Here are some key practices to follow:
- Enable Acknowledgments (acks)
  - Configure the producer's acknowledgment setting to ensure message durability. Setting `acks=all` ensures that all in-sync replicas acknowledge the message before it is considered sent, reducing the risk of loss in case of broker failure.

```go
config.Producer.RequiredAcks = sarama.WaitForAll
```
- Use Idempotent Producer
  - Enable idempotence in the producer configuration so that duplicate messages are not produced during retries. This provides exactly-once semantics per partition for a producer session, letting you retry aggressively without creating duplicates.

```go
config.Producer.Idempotent = true
```
- Configure Message Retries
  - Set up the producer's retry mechanism to resend messages in case of transient errors. Use a reasonable retry count and a backoff strategy to handle temporary failures without losing messages.

```go
config.Producer.Retry.Max = 5
config.Producer.Retry.Backoff = 100 * time.Millisecond
```
- Enable Message Durability (Replication)
  - Ensure that the Kafka topic is configured with sufficient replication across brokers. A higher replication factor reduces the risk of data loss if a broker fails.

```
# Example: create a topic with a replication factor of 3
kafka-topics.sh --create --topic my-topic --replication-factor 3 --partitions 3 --bootstrap-server localhost:9092
```
- Use Reliable Consumer Offsets Management
  - For consumers, commit offsets manually after successfully processing a message to avoid skipping unprocessed messages. Utilize Sarama's `OffsetManager` for managing committed offsets; in consumer-group code, mark a message as processed on the `ConsumerGroupSession`:

```go
session.MarkMessage(msg, "")
```
- Handle Consumer Failures Gracefully
  - Implement error handling in the consumer logic to manage failures (e.g., retries) while processing messages. This prevents premature offset commits and message loss due to processing errors.
- Monitor and Handle Kafka Cluster Failures
  - Set up monitoring and alerting for Kafka broker failures or performance issues. Handle failovers gracefully to prevent message loss when brokers are slow or unresponsive.
By following these practices, you can significantly mitigate the risk of losing messages while using the Sarama library with Kafka in Golang.
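Putting the producer-side settings together: a sketch of a loss-averse Sarama config. Note two extra requirements that Sarama enforces when idempotence is enabled, which the snippets above don't show: a protocol version of at least 0.11 and `Net.MaxOpenRequests = 1`:

```go
package main

import (
	"log"
	"time"

	"github.com/IBM/sarama"
)

// newReliableConfig bundles the loss-minimizing producer settings above.
func newReliableConfig() *sarama.Config {
	cfg := sarama.NewConfig()
	cfg.Version = sarama.V3_6_0_0                       // idempotence needs >= 0.11; match your broker
	cfg.Producer.RequiredAcks = sarama.WaitForAll       // acks=all: wait for all in-sync replicas
	cfg.Producer.Idempotent = true                      // no duplicates on retry
	cfg.Net.MaxOpenRequests = 1                         // required by Sarama when Idempotent is true
	cfg.Producer.Retry.Max = 5                          // retry transient errors...
	cfg.Producer.Retry.Backoff = 100 * time.Millisecond // ...with a short backoff
	cfg.Producer.Return.Successes = true                // required by the SyncProducer
	return cfg
}

func main() {
	producer, err := sarama.NewSyncProducer([]string{"localhost:9092"}, newReliableConfig())
	if err != nil {
		log.Fatalf("failed to start producer: %v", err)
	}
	defer producer.Close()
	log.Println("reliable producer ready")
}
```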
If you found this helpful, let me know by leaving a comment! Or if you think this post could help someone, feel free to share it! Thank you very much!