Apache Kafka is a popular distributed message broker designed to handle large volumes of real-time data efficiently. A Kafka cluster is not only highly scalable and fault-tolerant, but also has a much higher throughput compared with other message brokers such as ActiveMQ and RabbitMQ. Though it is generally used as a publish/subscribe messaging system, many organizations also use it for log aggregation because it offers persistent storage for published messages.
A publish/subscribe messaging system allows one or more producers to publish messages without regard to the number of consumers or how they will process the messages. Subscribed clients are notified automatically about updates and the creation of new messages. This is more efficient and scalable than a system where clients periodically poll to determine whether new messages are available.
In this tutorial, you will install and use Apache Kafka 1.1.0 on Ubuntu 18.04.
To continue, you will need:
- One Ubuntu 18.04 server with a non-root user with sudo privileges.
- At least 4GB of RAM on the server; installations with less RAM may cause the Kafka service to fail.
- OpenJDK 8 installed on the server, since Kafka is written in Java and requires a JVM.
Since Kafka can handle requests over a network, you should create a dedicated user for it. This minimizes damage to your Ubuntu machine should the Kafka server be compromised. We will create a dedicated kafka user in this step, but you should create a different non-root user to perform other tasks on this server once you have finished setting up Kafka.
Logged in as your non-root sudo user, create a user called kafka with the useradd command:
sudo useradd kafka -m
The -m flag ensures that a home directory will be created for the user. This home directory, /home/kafka, will act as our workspace directory for executing the commands in the sections below.
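As an optional sanity check (not part of the original steps), you can confirm that the account and its home directory were created correctly:
# Show the kafka user's passwd entry and the permissions of its home directory
getent passwd kafka
ls -ld /home/kafka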
Use passwd to set a password for this user:
sudo passwd kafka
Add the kafka user to the sudo group with the adduser command, so that it has the privileges required to install Kafka's dependencies:
sudo adduser kafka sudo
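If you would like to verify the group membership (an optional check), the groups command lists every group the user belongs to:
groups kafka
You should see sudo in the output.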
Your kafka user is now ready. Log in to this account using su:
su -l kafka
Now that we've created the Kafka-specific user, we can move on to downloading and extracting the Kafka binaries.
Let's download and extract the Kafka binaries into dedicated folders in our kafka user's home directory.
To start, create a directory in /home/kafka called Downloads to store your downloads:
mkdir ~/Downloads
Use curl to download the Kafka binaries:
curl "http://www-eu.apache.org/dist/kafka/1.1.0/kafka_2.12-1.1.0.tgz"-o ~/Downloads/kafka.tgz
Create a directory called kafka and change to this directory. This will be the base directory of the Kafka installation:
mkdir ~/kafka && cd ~/kafka
Extract the archive you downloaded using the tar command:
tar -xvzf ~/Downloads/kafka.tgz --strip 1
We specify the --strip 1 flag to ensure that the archive's contents are extracted in ~/kafka/ itself and not in another directory (such as ~/kafka/kafka_2.12-1.1.0/) inside of it.
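If you want to double-check the result (an optional step), listing the base directory should show the standard top-level folders of a Kafka distribution, including bin, config, and libs:
ls ~/kafka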
Now that we've downloaded and extracted the binaries successfully, we can move on to configuring Kafka to allow topic deletion.
Kafka's default behavior will not allow us to delete a topic, the category, group, or feed name to which messages can be published. To modify this, let's edit the configuration file.
Kafka's configuration options are specified in server.properties. Open this file with nano or another editor of your choice:
nano ~/kafka/config/server.properties
Let's add a setting that allows us to delete Kafka topics. Add the following to the bottom of the file:
delete.topic.enable = true
Save the file, and exit nano. Now that we have configured Kafka, we can move on to creating the systemd unit files needed to run and enable it on startup.
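Back at the shell, you can optionally confirm that the setting was saved (a quick check, not a step from the original procedure):
grep "delete.topic.enable" ~/kafka/config/server.properties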
In this section, we will create [systemd unit files](https://www.digitalocean.com/community/tutorials/understanding-systemd-units-and-unit-files) for the Kafka service. This will help us perform common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.
Zookeeper is a service that Kafka uses to manage its cluster state and configurations. It is commonly used in many distributed systems as an integral component. If you would like to know more about it, visit the official Zookeeper docs.
First, create the unit file for zookeeper:
sudo nano /etc/systemd/system/zookeeper.service
Enter the following unit definition in the file:
[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
The [Unit] section specifies that Zookeeper requires networking and the filesystem to be ready before it can start.
The [Service] section specifies that systemd should use the zookeeper-server-start.sh and zookeeper-server-stop.sh shell scripts for starting and stopping the service. It also specifies that Zookeeper should be restarted automatically if it exits abnormally.
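If you would like to test this unit on its own (an optional step; the kafka service defined next starts ZooKeeper automatically through its Requires= directive), you can start it now and inspect its state:
sudo systemctl start zookeeper
sudo systemctl status zookeeper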
Next, create the systemd service file for kafka:
sudo nano /etc/systemd/system/kafka.service
Enter the following unit definition in the file:
[Unit]
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
The [Unit] section specifies that this unit file depends on zookeeper.service. This will ensure that zookeeper gets started automatically when the kafka service starts.
The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell scripts for starting and stopping the service. It also specifies that Kafka should be restarted automatically if it exits abnormally.
Now that the units have been defined, start Kafka with the following command:
sudo systemctl start kafka
To make sure that the server has started successfully, check the journal logs for the kafka unit:
journalctl -u kafka
You should see output similar to the following:
Jul 17 18:38:59 kafka-ubuntu systemd[1]: Started kafka.service.
You now have a Kafka server listening on port 9092.
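You can verify the listener directly with ss if you like (an optional check; output formatting varies between versions):
sudo ss -plnt | grep 9092
A java process in the LISTEN state on port 9092 confirms the broker is up.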
While we have started the kafka service, if we were to reboot our server, it would not be started automatically. To enable kafka on server boot, run:
sudo systemctl enable kafka
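To double-check the result (optional), systemctl can report the unit's enablement state:
systemctl is-enabled kafka
This should print enabled.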
Now that we have started and enabled the service, let's check the installation.
Let's publish and consume a "Hello World" message to make sure the Kafka server is behaving correctly. Publishing messages in Kafka requires:
- A producer, which enables the publication of records and data to topics.
- A consumer, which reads messages and data from topics.
First, create a topic named TutorialTopic by typing:
~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic TutorialTopic
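You can verify that the topic was created (an optional check) by listing all topics registered in ZooKeeper; TutorialTopic should appear in the output:
~/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181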
You can create a producer from the command line using the kafka-console-producer.sh script. It expects the Kafka server's hostname, port, and a topic name as arguments.
Publish the string "Hello, World" to the TutorialTopic topic by typing:
echo "Hello, World"|~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092--topic TutorialTopic >/dev/null
Next, you can create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server's hostname and port, along with a topic name, as arguments.
The following command consumes messages from TutorialTopic. Note the use of the --from-beginning flag, which allows the consumption of messages that were published before the consumer was started:
~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning
If there are no configuration issues, you should see Hello, World in your terminal:
Hello, World
The script will continue to run, waiting for more messages to be published to the topic. Feel free to open a new terminal and start a producer to publish a few more messages. You should be able to see them all in the consumer's output.
When you are done testing, press CTRL+C to stop the consumer script. Now that we have tested the installation, let's move on to installing KafkaT.
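As an optional aside before installing KafkaT, you can confirm that the delete.topic.enable setting from earlier works by deleting the test topic. Note that the sample KafkaT output below assumes TutorialTopic still exists, so if you try this, recreate the topic afterwards with the earlier --create command:
~/kafka/bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic TutorialTopic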
KafkaT is a tool from Airbnb that makes it easier to view details about your Kafka cluster and perform certain administrative tasks from the command line. Because it is a Ruby gem, you will need Ruby to use it. You will also need the build-essential package to be able to build the other gems it depends on. Install them using apt:
sudo apt install ruby ruby-dev build-essential
You can now install KafkaT using the gem command:
sudo gem install kafkat
KafkaT uses .kafkatcfg as its configuration file to determine the installation and log directories of your Kafka server. It should also have an entry pointing KafkaT to your ZooKeeper instance.
Create a new file called .kafkatcfg:
nano ~/.kafkatcfg
Add the following lines to specify the required information about the Kafka server and Zookeeper instance:
{" kafka_path":"~/kafka","log_path":"/tmp/kafka-logs","zk_path":"localhost:2181"}
You can now use KafkaT. First, you can use it to view detailed information about all Kafka partitions:
kafkat partitions
You will see the following output:
Topic                Partition   Leader   Replicas   ISRs
TutorialTopic        0           0        [0]        [0]
__consumer_offsets   0           0        [0]        [0]
...
You will see TutorialTopic, as well as __consumer_offsets, an internal topic used by Kafka for storing client-related information. You can safely ignore the lines starting with __consumer_offsets.
To learn more about KafkaT, please refer to its GitHub Repository.
If you want to create a multi-broker cluster using more Ubuntu 18.04 machines, you should repeat Step 1, Step 4, and Step 5 on each of the new machines. Additionally, you should make the following changes in the server.properties file for each:
- The value of the broker.id property should be changed such that it is unique throughout the cluster. This property uniquely identifies each server in the cluster and can have any string as its value, such as "server1", "server2" etc.
- The value of the zookeeper.connect property should be changed such that all nodes point to the same ZooKeeper instance. This property specifies the ZooKeeper instance's address and follows the <HOSTNAME/IP_ADDRESS>:<PORT> format. For example, "203.0.113.0:2181", "203.0.113.1:2181" etc.
If you want to have multiple ZooKeeper instances for your cluster, the value of the zookeeper.connect property on each node should be an identical, comma-separated string listing the IP addresses and port numbers of all the ZooKeeper instances.
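As an illustration, here is a minimal sketch of the overrides a hypothetical second broker might carry in its server.properties (the IP addresses are documentation examples like the ones above, and the broker id is an integer, which is what Kafka's stock configuration ships with):
# server.properties overrides on a hypothetical second broker
broker.id=1
# Every broker in the cluster points at the same ZooKeeper ensemble
zookeeper.connect=203.0.113.0:2181,203.0.113.1:2181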
Now that all of the installations are done, you can remove the kafka user's admin privileges. Before you do so, log out and log back in as any other non-root sudo user. If you are still running the same shell session you started this tutorial with, simply type exit.
Delete the kafka user from the sudo group:
sudo deluser kafka sudo
To further improve your Kafka server's security, lock the kafka user's password using the passwd command. This makes sure that nobody can directly log into the server using this account:
sudo passwd kafka -l
At this point, only root or a sudo user can log in as kafka by typing in the following command:
sudo su - kafka
In the future, if you want to unlock it, use passwd with the -u option:
sudo passwd kafka -u
You have now successfully restricted the admin rights of the kafka user.
You can now safely run Apache Kafka on your Ubuntu server. You can make use of it in your projects by creating Kafka producers and consumers using Kafka clients, which are available for most programming languages. To learn more about Kafka, you can also consult its documentation.
For more Ubuntu tutorials, visit [Tencent Cloud + Community](https://cloud.tencent.com/developer?from=10680).
Reference: "How To Install Apache Kafka on Ubuntu 18.04"