Apache Kafka is an open source, distributed, high-throughput publish-subscribe messaging system.
If you are approaching Kafka for the first time, this post will help you get a distributed Kafka cluster running on your system with minimal steps. In this guide, we will walk through setting up Kafka on Ubuntu 16.04.
The basic architecture of Kafka is organized around a few key terms:
ZooKeeper: a coordinator that manages broker and cluster state.
Topic: a category to which messages are published by the message producers.
Brokers: server instances that handle reading and writing messages.
Producers: clients that publish data to the cluster.
Consumers: clients that read data from the cluster.
Step 1: Install Java
Kafka needs a Java runtime environment:
$sudo apt-get update
$sudo apt-get install default-jre
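To confirm the runtime is available, you can check the Java version:
$java -version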
Step 2: Install Zookeeper
ZooKeeper is a key-value store used to maintain server state, and it is mandatory for running Kafka. It provides a centralized service for maintaining configuration, and it also handles leader election among the brokers.
$sudo apt-get install zookeeperd
Let’s check whether it is alive:
$telnet localhost 2181
At the prompt, enter:
ruok
If everything is okay, the telnet session will reply with:
imok
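If telnet is not installed, you can send the same four-letter health check with netcat instead (assuming the nc utility is available on your machine):
$echo ruok | nc localhost 2181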
Step 3: Create a service user for Kafka
Kafka is a network application; creating a dedicated non-sudo user minimizes the risk to the machine if the service is compromised. Let’s create a user named “kafka”:
$sudo adduser --system --no-create-home --disabled-password --disabled-login kafka
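You can verify the user exists with:
$id kafka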
Step 4: Install Kafka
Download Kafka and extract it to a convenient location, typically /opt:
$cd ~
$wget http://www-eu.apache.org/dist/kafka/1.1.0/kafka_2.11-1.1.0.tgz
$sudo mkdir /opt/kafka
$sudo tar -xvzf kafka_2.11-1.1.0.tgz --directory /opt/kafka --strip-components 1
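If the extraction went well, listing the directory should show the usual Kafka layout (bin, config, libs, and so on):
$ls /opt/kafka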
Step 5: Configure the Kafka servers
As Kafka stores its data on disk, we will create a directory for it.
$sudo mkdir /var/lib/kafka
$sudo mkdir /var/lib/kafka/data
Since we are setting up a distributed deployment, let’s configure 3 brokers.
If you open /opt/kafka/config/server.properties you will see many properties, but we will be dealing with only 3 of them. These three properties must be unique for each instance.
broker.id=0
listeners=PLAINTEXT://:9092
log.dirs=/tmp/kafka-logs
As we have 3 brokers, we will create a properties file for each one. Let’s copy the /opt/kafka/config/server.properties file into 3 files, one per instance.
$sudo cp /opt/kafka/config/server.properties /opt/kafka/config/server-1.properties
$sudo cp /opt/kafka/config/server.properties /opt/kafka/config/server-2.properties
$sudo cp /opt/kafka/config/server.properties /opt/kafka/config/server-3.properties
Create the log directories for each server.
$sudo mkdir /var/lib/kafka/data/server-1
$sudo mkdir /var/lib/kafka/data/server-2
$sudo mkdir /var/lib/kafka/data/server-3
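If you prefer a single command, the same three directories can be created with shell brace expansion:
$sudo mkdir -p /var/lib/kafka/data/server-{1,2,3}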
We will reference these directories in the configuration.
Now, make the configuration changes for each Kafka server. Open each file in a text editor; I am using nano.
server-1.properties
$sudo nano /opt/kafka/config/server-1.properties
broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/var/lib/kafka/data/server-1
Save the changes and move on to the next server.
server-2.properties
$sudo nano /opt/kafka/config/server-2.properties
broker.id=2
listeners=PLAINTEXT://:9094
log.dirs=/var/lib/kafka/data/server-2
server-3.properties
$sudo nano /opt/kafka/config/server-3.properties
broker.id=3
listeners=PLAINTEXT://:9095
log.dirs=/var/lib/kafka/data/server-3
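If you would rather script these edits than make them by hand in nano, a sed one-liner per file can apply all three changes. This is only a sketch, and it assumes the stock server.properties file, where the listeners line ships commented out:
$sudo sed -i -e 's|^broker.id=.*|broker.id=1|' -e 's|^#\?listeners=.*|listeners=PLAINTEXT://:9093|' -e 's|^log.dirs=.*|log.dirs=/var/lib/kafka/data/server-1|' /opt/kafka/config/server-1.properties
Repeat with the matching broker id, port, and data directory for server-2 and server-3.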
If you would like to be able to delete topics, you also need to change the delete.topic.enable setting. By default, Kafka does not allow topic deletion; it must be enabled in the configuration. Add (or uncomment) this line in each server file:
delete.topic.enable = true
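You can confirm the setting is present in all three files with a quick grep:
$grep delete.topic.enable /opt/kafka/config/server-*.properties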
Step 6: Confirm permissions on the Kafka directories
We will assign ownership of the Kafka directories to the kafka user (created in step 3).
$sudo chown -R kafka:nogroup /opt/kafka
$sudo chown -R kafka:nogroup /var/lib/kafka
Step 7: Start the brokers
Now, we can start our brokers. Run these three commands in separate terminal sessions.
$cd /opt/kafka
$bin/kafka-server-start.sh config/server-1.properties
$bin/kafka-server-start.sh config/server-2.properties
$bin/kafka-server-start.sh config/server-3.properties
You should see a startup message when the brokers start successfully.
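If you would rather not keep three terminals open, kafka-server-start.sh also accepts a -daemon flag that starts a broker in the background; for example:
$bin/kafka-server-start.sh -daemon config/server-1.properties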
Test the installation
Create a topic
We need to create a topic first.
$bin/kafka-topics.sh --create --topic topic-1 --zookeeper localhost:2181 --partitions 3 --replication-factor 3
You should see a confirmation message after you create a topic.
The --partitions option controls how many partitions the topic’s data is split across. As we have 3 brokers, we can set this to 3.
The --replication-factor option controls how many copies of the data are kept. This helps with fault tolerance: if any broker goes down, the other brokers can take over its work.
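To verify the topic was created and see how its partitions and replicas were assigned across the 3 brokers, you can describe it:
$bin/kafka-topics.sh --describe --topic topic-1 --zookeeper localhost:2181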
The Producer instance
A producer feeds data into the Kafka cluster. This command starts a console producer that pushes data into the cluster:
$bin/kafka-console-producer.sh --broker-list localhost:9093,localhost:9094,localhost:9095 --topic topic-1
The --broker-list option takes the list of brokers we have configured.
The --topic option specifies which topic to push the data to. In our case, we push the data to topic-1.
Once you execute this command, you will see a prompt where you can enter a message. Press Enter after each message to send it.
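For example, a short producer session might look like this; the > prompt comes from the tool, and the two lines are just sample messages:
>hello kafka
>this is my second message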
Consumers
We’ve produced some messages. Now let’s consume them. Run this command:
$bin/kafka-console-consumer.sh --bootstrap-server localhost:9093 --topic topic-1 --from-beginning
The --bootstrap-server option points at one of the brokers we created; it could be any of our 3 brokers.
The --from-beginning option tells the consumer to read messages from the beginning of the topic.
This command shows all the messages produced so far. If the producer is still running in a separate terminal, any new message typed there will also appear here.
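Because topic-1 is replicated across all three brokers, the consumer is not tied to the broker it bootstrapped from. Pointing it at another broker should return the same messages:
$bin/kafka-console-consumer.sh --bootstrap-server localhost:9094 --topic topic-1 --from-beginning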
I hope this helps you set up and configure Kafka on Ubuntu 16.04. Please give it a try and experiment.