- Apache Kafka is an open-source framework for distributed data streaming.
- Kafka was developed at LinkedIn.
- Kafka was originally conceived as a messaging queue; at its core it is an abstraction of a distributed commit log.
- Kafka has four core APIs: Producer, Consumer, Streams, and Connect.
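The commit-log idea at Kafka's core can be illustrated with a short sketch: an append-only list whose entries are addressed by ever-increasing integer offsets. This is a simplified Python illustration, not Kafka's actual implementation:

```python
# Simplified sketch of an append-only commit log: records are appended
# in order and addressed by an ever-increasing integer offset.
class CommitLog:
    def __init__(self):
        self.records = []

    def append(self, record):
        """Append a record and return its offset."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset):
        """Read all records starting from a given offset."""
        return self.records[offset:]

log = CommitLog()
log.append("event-0")
log.append("event-1")
log.append("event-2")
print(log.read(1))  # → ['event-1', 'event-2']
```

Producers only ever append; consumers track their own read position (offset), which is why many consumers can read the same log independently.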
- Throughout this cheat sheet you will notice frequent use of ZooKeeper, which ships bundled with the Kafka download.
- ZooKeeper keeps track of the status of the Kafka cluster nodes, as well as Kafka topics, partitions, etc.
- If you work on your local computer you will only have one broker, so you can only use a replication factor of 1, but you can still have many partitions.
- The creators of Kafka founded Confluent, which offers a commercial platform based on Apache Kafka.
- Confluent is a full-scale data streaming platform built on Kafka, capable not only of publish-and-subscribe, but also of storing and processing data within the stream at very large scale.
- In this cheat sheet we are working with Kafka 2.0.0, because it is simpler for beginners and easy to configure. Apart from the installation steps, the commands are the same across versions.
- Before you install Kafka, make sure you already have Java installed:
# Check the Java version
> java -version
If you get an error, run these commands on macOS.
# Make sure you have an updated brew
> git -C /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core fetch --unshallow
> brew update
# Install Java 8 because it's compatible with Kafka 2.0.0
> brew install --cask adoptopenjdk8
Download Kafka 2.0.0
Install Kafka on Mac
- You can do these steps manually or with your terminal.
- The .tgz file name may differ.
# Move the Kafka file to your user directory
> mv Downloads/kafka_2.12-2.0.0.tgz .
> tar -xvf kafka_2.12-2.0.0.tgz
- For simplicity: open your user directory and rename the folder "kafka_2.12-2.0.0" to "kafka".
# Read the kafka folder
> ls kafka
- You should see these folders inside the kafka folder.
Check if Kafka works
# Restart your terminal, then run Kafka without arguments
> kafka/bin/kafka-topics.sh
- If you see the documentation (the list of options) in your terminal, that means Kafka is working.
- If not, check the installation steps again.
Add Kafka to the Mac terminal
> nano ~/.bash_profile
- Add this line to your PATH:
# Kafka
export PATH="$PATH:$HOME/kafka/bin"
- Test Kafka again
It should work from any directory in your terminal.
Start Zookeeper and Kafka Servers
- Open two new terminals.
- The first one to run the Zookeeper server
# Start the Zookeeper server
> zookeeper-server-start.sh config/zookeeper.properties
or
> cd kafka
> bin/zookeeper-server-start.sh config/zookeeper.properties
- The second one to run the Kafka server
# Start the Kafka server
> kafka-server-start.sh config/server.properties
or
> cd kafka
> bin/kafka-server-start.sh config/server.properties
- Leave them running and open a new terminal for the following commands.
- Data in Kafka are organized and stored in topics. A topic is similar to a folder in a filesystem, and the data are the files in that folder.
- Topics are partitioned, meaning a topic is spread over a number of “buckets” located on different Kafka brokers.
- Every topic can be replicated, in order to make your data fault-tolerant and highly available.
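How a message lands in one of those partitions can be sketched as hashing the message key modulo the partition count, so that all messages with the same key end up in the same partition. Kafka's default partitioner uses a murmur2 hash; the MD5 stand-in below is purely for illustration:

```python
import hashlib

# Sketch of key-based partitioning: messages with the same key always
# land in the same partition. Kafka's default partitioner uses murmur2;
# MD5 is substituted here only for illustration.
def partition_for(key: str, num_partitions: int) -> int:
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same key -> same partition, every time.
p1 = partition_for("user-42", 3)
p2 = partition_for("user-42", 3)
print(p1 == p2)  # → True
```

This is also why ordering in Kafka is guaranteed per partition, not per topic: all events for one key stay in one partition and are read in order.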
- Create a topic called "first_topic" with 3 partitions and a replication factor of 1.
- NOTE: if you work on your local computer you can only use a replication factor of 1; otherwise you will get an error.
# Create a topic called “first_topic”
> kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic first_topic --create --partitions 3 --replication-factor 1
# Describe the first topic
> kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic first_topic --describe
- Create a topic called "second_topic" with 6 partitions and a replication factor of 1.
# Create a topic called “second_topic”
> kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic second_topic --create --partitions 6 --replication-factor 1
# Describe the second topic
> kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic second_topic --describe
List all the topics
- We should see the two topics we created.
# List topics
> kafka-topics.sh --zookeeper 127.0.0.1:2181 --list
Delete a topic
- Delete the second topic.
# Delete a topic
> kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic second_topic --delete
Kafka Consumer and Producer
- Consumer: reads one or more topics and processes the stream of events produced to them.
- Producer: writes a stream of events to one or more Kafka topics.
- Create a Kafka Consumer to read the data stream.
# Create a Kafka Consumer
> kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic first_topic
- Nothing will happen, because we don’t have a producer yet.
- Open a new terminal and create a Kafka Producer, in order to write the data streams.
# Create a Kafka Producer
> kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic first_topic
- Write some messages in the producer terminal; each line is sent as an event.
- Switch to the consumer terminal and you will see the messages you have written.
- The data is stored in the Kafka topic "first_topic".
Retrieve the data stored in topics
- Read the stored data in a Kafka topic from the beginning
# Retrieve the data stored in topics
> kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic first_topic --from-beginning
Kafka Consumer Groups
- Consumers can join a group by using the same group id.
- The maximum parallelism of a group is reached when the number of consumers in the group equals the number of partitions.
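That cap on parallelism follows from how partitions are divided among the consumers of a group: each partition goes to exactly one consumer. The round-robin sketch below is a simplification (Kafka's real assignors are range, round-robin, and sticky), but it makes the limit visible:

```python
# Sketch of how a group coordinator might spread partitions across the
# consumers in a group (round-robin style). Each partition is owned by
# exactly one consumer; extra consumers sit idle.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

print(assign([0, 1, 2], ["c1", "c2"]))
# → {'c1': [0, 2], 'c2': [1]}
print(assign([0, 1, 2], ["c1", "c2", "c3", "c4"]))
# → {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}  -- c4 is idle
```

With 3 partitions, a fourth consumer gets nothing to read, which is exactly why parallelism tops out at the partition count.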
Create a consumer group
- Create a consumer group called my-first-application
# Create a consumer group
> kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic first_topic --group my-first-application
List all the consumer groups
# List all the consumer groups
> kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
- Notice that when we create a consumer without specifying a consumer group, Kafka generates a random group id for it.
Check consumer offset and describe a consumer group
- Check the lagging messages, i.e. the messages that have not been consumed yet.
- To check the lag you must specify the consumer group.
# Describe a consumer group
> kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-first-application
- TOPIC contains the topic name
- PARTITION contains the partition number
- CURRENT-OFFSET contains the last offset the consumer group has committed
- LOG-END-OFFSET contains the latest offset in the partition
- LAG is the difference between the two; a LAG of 0 means the consumer has read all the data
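The LAG column is just arithmetic over the offset columns of the describe output, which a few lines of Python make explicit (illustration only):

```python
# LAG is the distance between the end of the log and the group's
# committed position, per partition.
def lag(log_end_offset: int, current_offset: int) -> int:
    return log_end_offset - current_offset

print(lag(10, 10))  # → 0  (the consumer is caught up)
print(lag(10, 7))   # → 3  (three messages not yet consumed)
```

A steadily growing lag across partitions is the usual sign that producers are outpacing the consumer group.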