Apache Kafka is a popular distributed message broker designed to handle large volumes of real-time data efficiently. A Kafka cluster is not only highly scalable and fault-tolerant, but also has a much higher throughput compared with other message brokers such as ActiveMQ and RabbitMQ. Though it is generally used as a publish/subscribe messaging system, many organizations also use it for log aggregation because it offers persistent storage for published messages.
A publish/subscribe messaging system allows one or more producers to publish messages without regard to the number of consumers or how they will process the messages. Subscribed clients are notified automatically about updates and the creation of new messages. This is more efficient and scalable than a system where clients periodically poll to determine whether new messages are available.
In this tutorial, you will install and use Apache Kafka 1.1.0 on Ubuntu 18.04.
To continue, you will need:
- One Ubuntu 18.04 server with a non-root user with sudo privileges.
- At least 4GB of RAM on the server; installations with less RAM may cause the Kafka service to fail.
- OpenJDK 8 installed on the server, since Kafka is written in Java and requires a JVM.
Since Kafka can handle requests over a network, you should create a dedicated user for it. This minimizes damage to your Ubuntu machine should the Kafka server be compromised. We will create a dedicated kafka user in this step, but you should create a different non-root user to perform other tasks on this server once you have finished setting up Kafka.
Logged in as your non-root sudo user, create a user called kafka with the useradd command:
sudo useradd kafka -m
The -m flag ensures that a home directory will be created for the user. This home directory, /home/kafka, will act as our workspace directory for executing the commands in the sections below.
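As an optional sanity check (not part of the original steps), you can confirm that the account and its home directory were created correctly:
# Show the kafka user's passwd entry and the permissions of its home directory
getent passwd kafka
ls -ld /home/kafka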
Use passwd to set a password for this user:
sudo passwd kafka
Add the kafka user to the sudo group with the adduser command, so that it has the privileges required to install Kafka's dependencies:
sudo adduser kafka sudo
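If you would like to verify the group membership (an optional check), the groups command lists every group the user belongs to:
groups kafka
You should see sudo in the output.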
Your kafka user is now ready. Log in to this account using su:
su -l kafka
Now that we've created the Kafka-specific user, we can move on to downloading and extracting the Kafka binaries.
Let's download and extract the Kafka binaries into dedicated folders in our kafka user's home directory.
To start, create a directory in /home/kafka called Downloads to store your downloads:
mkdir ~/Downloads
Use curl to download the Kafka binaries:
curl "http://www-eu.apache.org/dist/kafka/1.1.0/kafka_2.12-1.1.0.tgz"-o ~/Downloads/kafka.tgz
Create a directory called kafka and change to this directory. This will be the base directory of the Kafka installation:
mkdir ~/kafka && cd ~/kafka
Extract the archive you downloaded using the tar command:
tar -xvzf ~/Downloads/kafka.tgz --strip 1
We specify the --strip 1 flag to ensure that the archive's contents are extracted in ~/kafka/ itself and not in another directory (such as ~/kafka/kafka_2.12-1.1.0/) inside of it.
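If you want to double-check the result (an optional step), listing the base directory should show the standard top-level folders of a Kafka distribution, including bin, config, and libs:
ls ~/kafka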
Now that we've downloaded and extracted the binaries successfully, we can move on to configuring Kafka to allow topic deletion.
Kafka's default behavior will not allow us to delete a topic, the category, group, or feed name to which messages can be published. To modify this, let's edit the configuration file.
Kafka's configuration options are specified in server.properties. Open this file with nano or another editor of your choice:
nano ~/kafka/config/server.properties
Let's add a setting that allows us to delete Kafka topics. Add the following to the bottom of the file:
delete.topic.enable = true
Save the file, and exit nano. Now that we have configured Kafka, we can move on to creating the systemd unit files needed to run and enable it on startup.
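Back at the shell, you can optionally confirm that the setting was saved (a quick check, not a step from the original procedure):
grep "delete.topic.enable" ~/kafka/config/server.properties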
In this section, we will create [systemd unit files](https://www.digitalocean.com/community/tutorials/understanding-systemd-units-and-unit-files) for the Kafka service. This will help us perform common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.
Zookeeper is a service that Kafka uses to manage its cluster state and configurations. It is commonly used in many distributed systems as an integral component. If you would like to know more about it, visit the official Zookeeper docs.
First, create the unit file for zookeeper:
sudo nano /etc/systemd/system/zookeeper.service
Enter the following unit definition in the file:
[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
The [Unit] section specifies that Zookeeper requires networking and the filesystem to be ready before it can start.
The [Service] section specifies that systemd should use the zookeeper-server-start.sh and zookeeper-server-stop.sh shell scripts for starting and stopping the service. It also specifies that Zookeeper should be restarted automatically if it exits abnormally.
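If you would like to test this unit on its own (an optional step; the kafka service defined next starts ZooKeeper automatically through its Requires= directive), you can start it now and inspect its state:
sudo systemctl start zookeeper
sudo systemctl status zookeeper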
Next, create the systemd service file for kafka:
sudo nano /etc/systemd/system/kafka.service
Enter the following unit definition in the file:
[Unit]
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
The [Unit] section specifies that this unit file depends on zookeeper.service. This will ensure that zookeeper gets started automatically when the kafka service starts.
The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell scripts for starting and stopping the service. It also specifies that Kafka should be restarted automatically if it exits abnormally.
Now that the units have been defined, start Kafka with the following command:
sudo systemctl start kafka
To make sure that the server has started successfully, check the journal logs for the kafka unit:
journalctl -u kafka
You should see output similar to the following:
Jul 17 18:38:59 kafka-ubuntu systemd[1]: Started kafka.service.
You now have a Kafka server listening on port 9092.
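You can verify the listener directly with ss if you like (an optional check; output formatting varies between versions):
sudo ss -plnt | grep 9092
A java process in the LISTEN state on port 9092 confirms the broker is up.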
While we have started the kafka service, if we were to reboot our server, it would not be started automatically. To enable kafka on server boot, run:
sudo systemctl enable kafka
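To double-check the result (optional), systemctl can report the unit's enablement state:
systemctl is-enabled kafka
This should print enabled.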
Now that we have started and enabled the service, let's check the installation.
Let's publish and consume a "Hello World" message to make sure the Kafka server is behaving correctly. Publishing messages in Kafka requires:
- A producer, which enables the publication of records and data to topics.
- A consumer, which reads messages and data from topics.
First, create a topic named TutorialTopic by typing:
~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic TutorialTopic
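You can verify that the topic was created (an optional check) by listing all topics registered in ZooKeeper; TutorialTopic should appear in the output:
~/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181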
You can create a producer from the command line using the kafka-console-producer.sh script. It expects the Kafka server's hostname, port, and a topic name as arguments.
Publish the string "Hello, World" to the TutorialTopic topic by typing:
echo "Hello, World"|~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092--topic TutorialTopic >/dev/null
Next, you can create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server's hostname and port, along with a topic name, as arguments.
The following command consumes messages from TutorialTopic. Note the use of the --from-beginning flag, which allows the consumption of messages that were published before the consumer was started:
~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning
If there are no configuration issues, you should see Hello, World in your terminal:
Hello, World
The script will continue to run, waiting for more messages to be published to the topic. Feel free to open a new terminal and start a producer to publish a few more messages. You should be able to see them all in the consumer's output.
When you are done testing, press CTRL+C to stop the consumer script. Now that we have tested the installation, let's move on to installing KafkaT.
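As an optional aside before installing KafkaT, you can confirm that the delete.topic.enable setting from earlier works by deleting the test topic. Note that the sample KafkaT output below assumes TutorialTopic still exists, so if you try this, recreate the topic afterwards with the earlier --create command:
~/kafka/bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic TutorialTopic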
KafkaT is a tool from Airbnb that makes it easier to view details about your Kafka cluster and perform certain administrative tasks from the command line. Because it is a Ruby gem, you will need Ruby to use it. You will also need the build-essential package to be able to build the other gems it depends on. Install them using apt:
sudo apt install ruby ruby-dev build-essential
You can now install KafkaT using the gem command:
sudo gem install kafkat
KafkaT uses .kafkatcfg as its configuration file to determine the installation and log directories of your Kafka server. It should also have an entry pointing KafkaT to your ZooKeeper instance.
Create a new file called .kafkatcfg:
nano ~/.kafkatcfg
Add the following lines to specify the required information about the Kafka server and Zookeeper instance:
{" kafka_path":"~/kafka","log_path":"/tmp/kafka-logs","zk_path":"localhost:2181"}
You can now use KafkaT. First, you can use it to view detailed information about all Kafka partitions:
kafkat partitions
You will see the following output:
Topic                Partition   Leader   Replicas   ISRs
TutorialTopic        0           0        [0]        [0]
__consumer_offsets   0           0        [0]        [0]
...
You will see TutorialTopic, as well as __consumer_offsets, an internal topic used by Kafka for storing client-related information. You can safely ignore the lines starting with __consumer_offsets.
To learn more about KafkaT, please refer to its GitHub Repository.
If you want to create a multi-broker cluster using more Ubuntu 18.04 machines, you should repeat Step 1, Step 4, and Step 5 on each of the new machines. Additionally, you should make the following changes in the server.properties file for each:
- The value of the broker.id property should be changed such that it is unique throughout the cluster. This property uniquely identifies each server in the cluster and can have any string as its value, such as "server1", "server2" etc.
- The value of the zookeeper.connect property should be changed such that all nodes point to the same ZooKeeper instance. This property specifies the ZooKeeper instance's address and follows the <HOSTNAME/IP_ADDRESS>:<PORT> format. For example, "203.0.113.0:2181", "203.0.113.1:2181" etc.
If you want to have multiple ZooKeeper instances for your cluster, the value of the zookeeper.connect property on each node should be an identical, comma-separated string listing the IP addresses and port numbers of all the ZooKeeper instances.
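As an illustration, here is a minimal sketch of the overrides a hypothetical second broker might carry in its server.properties (the IP addresses are documentation examples like the ones above, and the broker id is an integer, which is what Kafka's stock configuration ships with):
# server.properties overrides on a hypothetical second broker
broker.id=1
# Every broker in the cluster points at the same ZooKeeper ensemble
zookeeper.connect=203.0.113.0:2181,203.0.113.1:2181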
Now that all of the installations are done, you can remove the kafka user's admin privileges. Before you do so, log out and log back in as any other non-root sudo user. If you are still running the same shell session you started this tutorial with, simply type exit.
Delete the kafka user from the sudo group:
sudo deluser kafka sudo
To further improve your Kafka server's security, lock the kafka user's password using the passwd command. This makes sure that nobody can directly log into the server using this account:
sudo passwd kafka -l
At this point, only root or a sudo user can log in as kafka by typing in the following command:
sudo su - kafka
In the future, if you want to unlock it, use passwd with the -u option:
sudo passwd kafka -u
You have now successfully restricted the admin rights of the kafka user.
You can now safely run Apache Kafka on your Ubuntu server. You can make use of it in your projects by creating Kafka producers and consumers using Kafka clients, which are available for most programming languages. To learn more about Kafka, you can also consult its documentation.
For more Ubuntu tutorials, visit [Tencent Cloud + Community](https://cloud.tencent.com/developer?from=10680).
Reference: "How To Install Apache Kafka on Ubuntu 18.04"