Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds, which makes it a valuable part of enterprise infrastructures that process streaming data. Kafka is used for two broad classes of applications:

  • Building real-time streaming data pipelines that reliably get data between systems or applications.
  • Building real-time streaming applications that transform or react to the streams of data.

You can read more about Apache Kafka at https://kafka.apache.org.

In this guide, we will explain how to install Apache Kafka on a Debian 9 server.

Deploying your cloud server
If you have not already registered with Cloudwafer, you should begin by getting signed up. Take a moment to create an account after which you can quickly deploy your cloud servers.

Once you have signed up, log into your Cloudwafer Client Area and deploy your Cloudwafer cloud server.

You can also read our guide on Installing Apache Kafka on CentOS 7 for more information on the CentOS 7 Installation.

Updating System Packages
It is recommended that you update the system to the latest packages before beginning any major installations. Issue the command below:

sudo apt-get update && sudo apt-get upgrade
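
Kafka runs on the JVM, so a Java runtime is required before installation. If your server does not already have one, you can install Debian's default OpenJDK packages (on Debian 9 this provides OpenJDK 8, which Kafka 2.1.1 runs on) and then confirm the version:

sudo apt-get install default-jdk

java -version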

Step 1: Create a User for Kafka
Since Kafka handles requests over a network, it is advisable to create a dedicated user for it. To create the user, type the command below while logged in as a non-root sudo user:

sudo useradd kafka -m

Note: The -m flag makes sure that a home directory is created for the newly created user. This home directory, /home/kafka, will serve as our workspace directory for executing commands.

Add a password to the Kafka user using passwd:

sudo passwd kafka

Next, we need to add the kafka user to the sudo group so that it has the sudo privileges required to install Kafka's dependencies:

sudo adduser kafka sudo

You can now log in to the account, as shown below:

su -l kafka
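
You can confirm that you are now operating as the kafka user by typing:

whoami

The command should print kafka.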

Step 2: Download and Extract the Kafka Binaries

Before proceeding, it is advisable to create a directory in /home/kafka called Downloads to store our downloads:

mkdir ~/Downloads

Next, download the Kafka binaries using curl as shown below:

curl "https://www.apache.org/dist/kafka/2.1.1/kafka_2.11-2.1.1.tgz" -o ~/Downloads/kafka.tgz

Next, we create a directory called kafka, which will serve as the base directory of the Kafka installation:

mkdir ~/kafka

Next, change into this directory:

cd ~/kafka

Extract the archive we downloaded using the tar command:

tar -xvzf ~/Downloads/kafka.tgz --strip 1

Note: The --strip 1 flag makes sure that the archive's contents are extracted in ~/kafka/ itself and not in another directory inside of it.
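
To confirm that the extraction worked, list the contents of ~/kafka. For this release, the output should look roughly like the following (exact contents may vary between Kafka versions):

ls ~/kafka

bin  config  libs  LICENSE  NOTICE  site-docs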

At this point, we have successfully downloaded and extracted the Kafka binaries.

Step 3: Configure the Kafka Server
Kafka's default configuration does not allow us to delete a topic (the category, group, or feed name to which messages can be published). To change this, we will edit the configuration file.

Kafka's configuration options are specified in the server.properties file. Open this file with nano or your favourite editor:

nano ~/kafka/config/server.properties

Add the following line to the bottom of the file to allow us to delete Kafka topics:

delete.topic.enable = true

Save and close the file.

Step 4: Create Systemd Unit Files and Start the Kafka Server
To perform everyday service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services, we will create systemd unit files for the Kafka service.

Kafka uses ZooKeeper, so you first need to start a ZooKeeper server if you do not already have one running. You can read more about ZooKeeper at https://zookeeper.apache.org.

First, we need to create the unit file for zookeeper:

sudo nano /etc/systemd/system/zookeeper.service

Enter the following unit definition into the file:

[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save and close the file.

The [Unit] section specifies that ZooKeeper requires networking and the filesystem to be ready before it can start.

The [Service] section specifies that systemd should use the zookeeper-server-start.sh and zookeeper-server-stop.sh shell scripts for starting and stopping the service. It also specifies that ZooKeeper should be restarted automatically if it exits abnormally.
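
Although the kafka unit defined below pulls in zookeeper.service automatically through a Requires= dependency, you can start and inspect ZooKeeper on its own to confirm that the unit file works:

sudo systemctl start zookeeper

sudo systemctl status zookeeper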

Next, create the systemd service file for kafka:

sudo nano /etc/systemd/system/kafka.service

Enter the following unit definition into the file:

[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save and close the file.

The [Unit] section specifies that this unit file depends on zookeeper.service. This ensures that zookeeper gets started automatically when the kafka service starts.

The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell scripts for starting and stopping the service. It also specifies that Kafka should be restarted automatically if it exits abnormally.
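
Note that the ExecStart line above redirects the broker's output to /home/kafka/kafka/kafka.log, so the server's own log messages are written there rather than to the journal. Once Kafka is running, you can follow them with:

tail -f /home/kafka/kafka/kafka.log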

Next, start Kafka with the command below:

sudo systemctl start kafka

You can check the journal logs for the kafka unit to ensure that the server has started successfully by issuing the command below:

journalctl -u kafka

The Kafka server listens on port 9092.
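
If the unit started cleanly, the journal should contain an entry similar to the one below (timestamps and hostnames will differ):

systemd[1]: Started kafka.service.

You can also confirm that the broker is listening on port 9092 using ss from the iproute2 package:

sudo ss -plnt | grep 9092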

To enable Kafka on server boot, issue the command below:

sudo systemctl enable kafka
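
Because kafka.service declares Requires=zookeeper.service, starting Kafka at boot also starts ZooKeeper, so enabling the zookeeper unit explicitly is optional; it does no harm, however:

sudo systemctl enable zookeeper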

Step 5: Testing Our Installation
To test our Kafka installation, we will publish and consume a sample message to make sure the Kafka server is behaving correctly. Publishing messages in Kafka requires:

  • A producer, which enables the publication of records and data to topics.
  • A consumer, which reads messages and data from topics.

First, create a topic named SampleTopic by typing:

~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic SampleTopic
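
If the topic was created, the script prints a confirmation similar to:

Created topic "SampleTopic".

You can also list all existing topics with the same script:

~/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181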

Next, create a producer from the command line using the kafka-console-producer.sh script. It expects the Kafka server's hostname, port, and a topic name as arguments.

Publish the string "Hello World. Apache Kafka by Cloudwaferlabs " to the SampleTopic topic by typing:

echo "Hello World. Apache Kafka by Cloudwaferlabs " | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic SampleTopic > /dev/null

Next, create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server's hostname and port, along with a topic name, as arguments.

The following command consumes messages from SampleTopic. Note the use of the --from-beginning flag, which allows the consumption of messages that were published before the consumer was started:

~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic SampleTopic --from-beginning

If there are no configuration issues, you should see our chosen string in the terminal, as shown below:
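
Hello World. Apache Kafka by Cloudwaferlabs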

The script will continue to run, waiting for more messages to be published to the topic. Press CTRL+C to stop the consumer script.
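
Finally, because we set delete.topic.enable to true in Step 3, you can remove the test topic once you are done, using the same kafka-topics.sh script:

~/kafka/bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic SampleTopic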