How to install Apache Kafka on CentOS 7

Introduction

Apache Kafka is a popular distributed message broker designed to process large volumes of real-time data efficiently. A Kafka cluster is not only highly scalable and fault-tolerant, but also offers higher throughput than other message brokers such as ActiveMQ and RabbitMQ. Although it is usually used as a publish/subscribe messaging system, many organizations also use it for log aggregation because it provides persistent storage for published messages.

A publish/subscribe messaging system allows one or more producers to publish messages regardless of the number of consumers or how those consumers will process the messages. Subscribed clients are notified automatically about updates and the creation of new messages. This is more efficient and scalable than a system in which clients periodically poll to determine whether new messages are available.
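As a toy illustration of this model (this is not Kafka itself, just a sketch of the idea), you can picture a topic as an append-only log that producers write to without knowing who reads, while each consumer reads independently from its own position:

```shell
# Model a topic as an append-only log file in a temporary location.
topic=$(mktemp)

# Producers append messages without coordinating with any consumer.
echo 'msg-1' >> "$topic"
echo 'msg-2' >> "$topic"

# One consumer reads everything from the beginning of the log...
tail -n +1 "$topic"
# ...while another independently reads only the most recent message.
tail -n 1 "$topic"
```

Because the log persists, a consumer that starts late can still replay every message from the beginning, which is the property Kafka's `--from-beginning` flag exposes later in this tutorial.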

In this tutorial, you will install and use Apache Kafka 1.1.0 on CentOS 7.

Preparation

To follow along, you will need:

- A CentOS 7 server with a non-root user that has sudo privileges.
- At least 4GB of RAM on the server; with less memory, the Kafka service may fail to start.
- OpenJDK 8 installed on the server, since Kafka runs on the Java Virtual Machine (JVM).

Step 1 - Create a user for Kafka

Since Kafka handles requests over the network, you should create a dedicated user for it. This minimizes damage to your CentOS machine should the Kafka server be compromised. We will create a dedicated kafka user in this step, but you should create a different non-root user to perform other tasks on this server once you have finished setting up Kafka.

Log in as a non-root user with sudo privileges and use the useradd command to create a user named kafka:

sudo useradd kafka -m

The -m flag ensures that a home directory will be created for the user. This home directory /home/kafka will serve as our workspace directory for executing the commands in the following sections.

Use passwd to set a password:

sudo passwd kafka

Use the usermod command to add the kafka user to the wheel group so that it has the privileges required to install Kafka's dependencies:

sudo usermod -aG wheel kafka

Your kafka user is now ready. Log in to this account using su:

su -l kafka

Now that we have created the kafka user, we can move on to downloading and extracting the Kafka binaries.

Step 2 - Download and extract the Kafka binaries

Let's download and extract the Kafka binaries into a dedicated folder in our kafka user's home directory.

First, create a directory Downloads in /home/kafka to store your downloads:

mkdir ~/Downloads

Use curl to download Kafka binary files:

curl "http://www-eu.apache.org/dist/kafka/1.1.0/kafka_2.12-1.1.0.tgz" -o ~/Downloads/kafka.tgz

Create a directory called kafka and change to this directory. This will be the base directory for Kafka installation:

mkdir ~/kafka && cd ~/kafka

Use the tar command to extract the downloaded archive:

tar -xvzf ~/Downloads/kafka.tgz --strip 1

We specify the --strip 1 flag to ensure that the archive's contents are extracted into ~/kafka/ itself and not into another directory (such as ~/kafka/kafka_2.12-1.1.0/) inside of it.
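If you want to see what --strip 1 does before running it on the real download, here is a safe sketch using a throwaway archive in a temporary directory (the file names below are made up for the demo and are not your actual Kafka download):

```shell
# Build a tiny archive whose contents live under a versioned top-level folder,
# mimicking the layout of kafka_2.12-1.1.0.tgz.
workdir=$(mktemp -d)
mkdir -p "$workdir/kafka_2.12-1.1.0/bin"
echo 'demo' > "$workdir/kafka_2.12-1.1.0/bin/kafka-topics.sh"
tar -czf "$workdir/kafka.tgz" -C "$workdir" kafka_2.12-1.1.0

# Extracting with --strip 1 drops the leading path component, so the
# archive's contents land directly in the target directory.
mkdir "$workdir/kafka"
tar -xzf "$workdir/kafka.tgz" -C "$workdir/kafka" --strip 1
ls "$workdir/kafka"   # bin/ sits at the top level, with no nested kafka_2.12-1.1.0/
```

Without --strip 1, the same extraction would leave everything under a kafka_2.12-1.1.0/ subdirectory.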

Now that we have downloaded and extracted the binaries successfully, we can move on to configuring Kafka to allow topic deletion.

Step 3 - Configure the Kafka server

Kafka's default behavior will not allow us to delete a topic, the category, group, or feed name to which messages can be published. To modify this, let's edit the configuration file.

The configuration options of Kafka are specified in server.properties. Open this file with vi or your favorite editor:

vi ~/kafka/config/server.properties

Let's add a setting that allows us to delete Kafka topics. Press i to insert text, and add the following to the bottom of the file:

delete.topic.enable = true

When you are finished, press ESC to exit insert mode, then type :wq to write the changes to the file and quit. Now that we have configured Kafka, we can move on to creating systemd unit files for running it, and enabling it on server boot.
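If you prefer to make this change non-interactively (for example, from a provisioning script), the following sketch appends the setting only when it is not already present. It runs against a scratch file here so it is safe to try anywhere; on a real install you would point conf at ~/kafka/config/server.properties instead:

```shell
# Scratch stand-in for server.properties so this demo touches nothing real.
conf=$(mktemp)
echo 'broker.id=0' > "$conf"

# Append the setting only if no delete.topic.enable line exists yet,
# so re-running the script never duplicates the entry.
grep -q '^delete.topic.enable' "$conf" || echo 'delete.topic.enable = true' >> "$conf"
grep -q '^delete.topic.enable' "$conf" || echo 'delete.topic.enable = true' >> "$conf"  # no-op on the second run

grep -c '^delete.topic.enable' "$conf"   # prints 1, confirming a single entry
```

The grep guard is what makes the edit idempotent; a plain `echo ... >> server.properties` would add a duplicate line every time the script ran.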

Step 4 - Create systemd unit files and start the Kafka server

In this section, we will create systemd unit files for the Kafka service. This will help us perform common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.

Zookeeper is a service that Kafka uses to manage its cluster state and configurations. It is commonly used as an integral component in many distributed systems. If you would like to know more about it, visit the official Zookeeper docs.

Create a unit file for zookeeper:

sudo vi /etc/systemd/system/zookeeper.service

Enter the following unit definition in the file:

[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

The [Unit] section specifies that Zookeeper requires networking and the filesystem to be ready before it can start.

The [Service] section specifies that systemd should use the zookeeper-server-start.sh and zookeeper-server-stop.sh shell scripts to start and stop the service. It also specifies that Zookeeper should be restarted automatically if it exits abnormally.

Next, create a systemd service file for kafka:

sudo vi /etc/systemd/system/kafka.service

Enter the following unit definition in the file:

[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

The [Unit] section specifies that this unit file depends on zookeeper.service. This ensures that zookeeper gets started automatically when the kafka service starts.

The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell scripts to start and stop the service. It also specifies that Kafka should be restarted automatically if it exits abnormally.

Now that the unit has been defined, use the following command to start Kafka:

sudo systemctl start kafka

To ensure that the server has started successfully, check the logs of the kafka unit:

journalctl -u kafka

You should see output similar to the following:

Jul 17 18:38:59 kafka-centos systemd[1]: Started kafka.service.

You now have a Kafka server listening on port 9092.

While we have started the kafka service, if we were to reboot our server, it would not be started automatically. To enable kafka on server boot, run:

sudo systemctl enable kafka

Now that we have started and enabled the service, let's check the installation.

Step 5 - Test the installation

Let's publish and consume a "Hello World" message to make sure the Kafka server is behaving correctly. Publishing messages in Kafka requires:

- A producer, which enables the publication of records and data to topics.
- A consumer, which reads messages and data from topics.

First, create a topic named TutorialTopic by typing:

~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic TutorialTopic

You can create a producer from the command line using the kafka-console-producer.sh script. It expects the Kafka server's hostname, port, and a topic name as arguments.

Type the following to publish the string "Hello, World" to the TutorialTopic topic:

echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null

Next, you can create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server's hostname and port, along with a topic name, as arguments.

The following command consumes messages from TutorialTopic. Note the use of the --from-beginning flag, which allows the consumption of messages that were published before the consumer was started:

~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning

If there are no configuration issues, you should see Hello, World in your terminal:

Hello, World

The script will continue to run, waiting for more messages to be published to the topic. Feel free to open a new terminal and start a producer to publish a few more messages. You should be able to see them all in the consumer's output.

When you are done testing, press CTRL+C to stop the consumer script. Now that we have tested the installation, let's move on to installing KafkaT.

Step 6 - Install KafkaT (optional)

KafkaT is a tool from Airbnb that makes it easier for you to view details about your Kafka cluster and perform certain administrative tasks from the command line. Because it is a Ruby gem, you will need Ruby to use it. You will also need ruby-devel and build-related packages such as make and gcc to build the other gems it depends on. Install them using yum:

sudo yum install ruby ruby-devel make gcc patch

You can now install KafkaT using the gem command:

sudo gem install kafkat

KafkaT uses .kafkatcfg as the configuration file to determine the installation and log directories of your Kafka server. It should also have an entry pointing KafkaT to your ZooKeeper instance.

Create a new file named .kafkatcfg:

vi ~/.kafkatcfg

Add the following lines to specify the required information about the Kafka server and Zookeeper instance:

{
  "kafka_path": "~/kafka",
  "log_path": "/tmp/kafka-logs",
  "zk_path": "localhost:2181"
}
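Since KafkaT cannot use a malformed .kafkatcfg, it can be worth sanity-checking the JSON after editing. This sketch writes the same content to a scratch file and validates it with Python's json module; on a real install you would substitute ~/.kafkatcfg for the scratch path:

```shell
# Write the configuration to a scratch file for a safe demonstration.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
  "kafka_path": "~/kafka",
  "log_path": "/tmp/kafka-logs",
  "zk_path": "localhost:2181"
}
EOF

# A malformed file would make this command exit non-zero with a JSONDecodeError.
python3 -c "import json, sys; json.load(open(sys.argv[1])); print('valid JSON')" "$cfg"
```

A typo such as a missing comma or an unquoted key would cause the check to fail before KafkaT ever reads the file.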

You can now use KafkaT. First, you can use it to view detailed information about all Kafka partitions:

kafkat partitions

You will see the following output:

Topic                 Partition   Leader      Replicas        ISRs
TutorialTopic         0           0           [0]             [0]
__consumer_offsets    0           0           [0]             [0]
...

You will see TutorialTopic, as well as __consumer_offsets, an internal topic used by Kafka for storing client-related information. You can safely ignore the lines starting with __consumer_offsets.

Step 7 - Set up a multi-node cluster (optional)

If you want to create a multi-broker cluster using more CentOS 7 machines, you should repeat Step 1, Step 4, and Step 5 on each of the new machines. Additionally, you should make the following changes in the server.properties file on each:

- The value of the broker.id property should be changed such that it is unique throughout the cluster.
- The value of the zookeeper.connect property should be changed such that all nodes point to the same ZooKeeper instance.

If you want to have multiple ZooKeeper instances for your cluster, the value of the zookeeper.connect property on each node should be an identical, comma-separated string listing the IP addresses and port numbers of all the ZooKeeper instances.
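As a concrete sketch, a three-node setup might carry entries like the following in each broker's server.properties. The IP addresses below are placeholders, and the broker.id value must differ on every machine:

```
# Unique per broker (0 on the first machine, 1 on the second, and so on):
broker.id=0

# Identical on every broker, listing all ZooKeeper instances:
zookeeper.connect=203.0.113.10:2181,203.0.113.20:2181,203.0.113.30:2181
```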

Step 8 - Restrict the kafka user

Now that all of the installations are done, you can remove the kafka user's admin privileges. Before you do so, log out and log back in as any other non-root sudo user. If you are still running the same shell session you started this tutorial with, simply type exit.

Remove the kafka user from the wheel group:

sudo gpasswd -d kafka wheel

To further improve your Kafka server's security, lock the kafka user's password using the passwd command. This makes sure that nobody can log into the server directly using this account:

sudo passwd kafka -l

At this point, only root or a sudo user can log in as kafka by typing the following command:

sudo su - kafka

In the future, if you want to unlock it, use passwd with the -u option:

sudo passwd kafka -u

You have now successfully restricted the kafka user's admin privileges.

Conclusion

You now have Apache Kafka running securely on your CentOS server. You can make use of it in your projects by creating Kafka producers and consumers using Kafka clients, which are available for most programming languages.



Reference: "How To Install Apache Kafka on CentOS 7"
