How to install and configure Elasticsearch on CentOS 7

Introduction

Elasticsearch is a real-time distributed search and data analysis platform. Its popularity is due to its ease of use, powerful features and scalability.

Elasticsearch supports RESTful operations. This means you can use HTTP methods (GET, POST, PUT, DELETE, etc.) in combination with HTTP URIs (/collection/entry) to manipulate data. The intuitive RESTful approach is both developer-friendly and user-friendly, which is one of the reasons Elasticsearch is popular.

Elasticsearch is free and open source software with a solid company behind it: Elastic. This combination makes it suitable for everything from personal testing to enterprise integration.

This article will introduce you to Elasticsearch and show you how to install, configure and start using it.

Prerequisites

Before following this tutorial, make sure to complete the following prerequisites:

A CentOS 7 Tencent CVM. If you don't have a server yet, I recommend experimenting in the free Tencent Cloud Developer Lab first, and then [purchasing a server](https://cloud.tencent.com/product/cvm?from=10680) once you have learned how to install Elasticsearch.
A non-root user with sudo privileges.

Step 1-Install Java

First, you need a Java Runtime Environment (JRE) on your Tencent CVM, because Elasticsearch is written in the Java programming language. You can use the native CentOS OpenJDK package for the JRE. This JRE is free, well supported, and managed automatically by the CentOS Yum package manager.

You can install OpenJDK 8 using the following command:

sudo yum install java-1.8.0-openjdk.x86_64

To verify that JRE is installed and available, run the following command:

java -version

The result should look like this:

openjdk version "1.8.0_65"
OpenJDK Runtime Environment (build 1.8.0_65-b17)
OpenJDK 64-Bit Server VM (build 25.65-b01, mixed mode)
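If you script this check, for example to fail early in a provisioning script, you can pull the major version out of the `java -version` output. This is a minimal sketch. The function name `java_major_version` is hypothetical. It assumes that the version is the first double-quoted string on the line containing `version`, as in the output above.

```shell
# Hypothetical helper: read `java -version` output on stdin and print the
# Java major version (e.g. 8 for "1.8.0_65", 11 for "11.0.2").
java_major_version() {
  local v
  # The version is the first double-quoted field on the "version" line.
  v=$(awk -F '"' '/version/ {print $2; exit}')
  case "$v" in
    1.*)                     # Legacy scheme: 1.8.0_65 -> 8
      v=${v#1.}
      echo "${v%%.*}" ;;
    *)                       # Newer scheme: 11.0.2 -> 11, 17 -> 17
      echo "${v%%[.-]*}" ;;
  esac
}

# Usage on the server (java -version writes to stderr, hence 2>&1):
#   java -version 2>&1 | java_major_version
```

If it prints 8, the OpenJDK 8 package installed above is the Java on your PATH.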

As you work with Elasticsearch and start looking for better Java performance and compatibility, you can choose to install Oracle's proprietary Java (Oracle JDK 8) instead.

Step 2-Download and install Elasticsearch

Elasticsearch can be downloaded as a zip, tar.gz, deb or rpm package directly from elastic.co. For CentOS, it is best to use the native rpm package, which installs everything needed to run Elasticsearch.

At the time of writing, the latest Elasticsearch version is 1.7.3. Use the following command to download it to a directory of your choice:

wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.3.noarch.rpm

Then use the following rpm command to install it in the usual CentOS way:

sudo rpm -ivh elasticsearch-1.7.3.noarch.rpm

This installs Elasticsearch in /usr/share/elasticsearch/, places its configuration files in /etc/elasticsearch, and adds its init script at /etc/init.d/elasticsearch.
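A small check like the following can confirm that the package put everything in the locations listed above. This is a minimal sketch with a hypothetical function name. It only tests that each path exists, not that the files are correct. If you want the full list of installed files, `rpm -ql elasticsearch` prints every file the package installed.

```shell
# Hypothetical helper: report which of the given paths are missing.
# Returns 0 if all of them exist, 1 otherwise.
check_paths() {
  local missing=0 p
  for p in "$@"; do
    if [ -e "$p" ]; then
      echo "ok:      $p"
    else
      echo "missing: $p"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the server, with the locations the RPM is described as creating:
#   check_paths /usr/share/elasticsearch /etc/elasticsearch /etc/init.d/elasticsearch
```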

To make Elasticsearch start and stop automatically with your Tencent CVM, add it to the default runlevel with the following command:

sudo systemctl enable elasticsearch.service

Step 3-Configure Elasticsearch

Now that Elasticsearch and its Java dependencies are installed, it is time to configure Elasticsearch.

Elasticsearch configuration files are located in the /etc/elasticsearch directory. There are two files:

elasticsearch.yml: configures the Elasticsearch server settings. All options except logging are stored here, which is why this is the file we are most interested in.
logging.yml: provides the logging configuration. You don't need to edit this file to begin with; you can keep all the default logging options. By default, the generated logs are written to /var/log/elasticsearch.

The first variables you can customize on any Elasticsearch server are node.name and cluster.name in elasticsearch.yml. As their names imply, node.name specifies the name of the server (node), and cluster.name specifies the name of the cluster the node belongs to.

If you do not customize these variables, node.name is assigned automatically, and cluster.name is set to the default cluster name, elasticsearch.

Elasticsearch's auto-discovery feature uses the cluster.name value to find Elasticsearch nodes and join them to the cluster. Therefore, if you do not change the default value, unwanted nodes on the same network may end up in your cluster.

Open the main elasticsearch.yml configuration file for editing:

sudo nano /etc/elasticsearch/elasticsearch.yml

Uncomment the node.name and cluster.name lines by removing the # character at the beginning of each, and then change their values. After your first configuration change, the relevant part of /etc/elasticsearch/elasticsearch.yml should look like this:

...
node.name: "My First Node"
cluster.name: mycluster1
...
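You may prefer to make these edits non-interactively, for example from a provisioning script. A sed-based helper can uncomment a key and set its value. This is a minimal sketch, and `set_es_setting` is a hypothetical name, not an Elasticsearch tool. It assumes GNU sed and a file with at most one line per key, whether commented out or active. It also assumes that values contain no `|`, `&` or backslash characters.

```shell
# Hypothetical helper: set KEY to VALUE in a YAML config file such as
# elasticsearch.yml. A commented-out line ("# key: ..." or "#key: ...") is
# uncommented and replaced; if the key is absent, it is appended.
# Assumes GNU sed, and that VALUE contains no "|", "&" or backslash characters.
set_es_setting() {
  local file=$1 key=$2 value=$3 re
  # Escape the dots in the key so grep/sed treat them literally.
  re=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^[#[:space:]]*${re}:" "$file"; then
    sed -i "s|^[#[:space:]]*${re}:.*|${key}: ${value}|" "$file"
  else
    printf '%s: %s\n' "$key" "$value" >> "$file"
  fi
}

# Usage (usually as root, since the file lives under /etc):
#   set_es_setting /etc/elasticsearch/elasticsearch.yml node.name '"My First Node"'
#   set_es_setting /etc/elasticsearch/elasticsearch.yml cluster.name mycluster1
```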

Another important setting is the server's role, which can be either "master" or "slave". A "master" is responsible for the health and stability of the cluster. In large deployments with many cluster nodes, multiple dedicated masters are recommended. A dedicated master usually does not store data or create indexes. It is therefore unlikely to become overloaded, which could otherwise endanger the health of the cluster.

A "slave" node is the "workhorse" that can be loaded with data tasks. Even if a slave node is overloaded, this should not seriously affect cluster health, provided that other nodes can take on the additional load.

The setting that determines the server's role is node.master. If you have only one Elasticsearch node, leave this option commented out so that it keeps its default value, true. The only node must also be the master. Alternatively, to configure a node as a slave, remove the # character at the beginning of the node.master line and change the value to false:

...
node.master: false
...

Another important configuration option is node.data, which determines whether a node stores data. In most cases this option should be left at its default value (true), but there are two situations in which you may not want a node to store data. The first is when the node is a dedicated master, as mentioned above. The second is when the node is only used to fetch data from other nodes and aggregate the results. In the latter case, the node acts as a "search load balancer".

Similarly, if you have only one Elasticsearch node, leave this setting commented out so that it keeps its default value, true. Otherwise, to disable local data storage, uncomment the following line and change its value to false:

...
node.data: false
...
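The roles described above correspond to combinations of node.master and node.data. A single node and a dedicated master follow directly from the text. For the "search load balancer", disabling both settings is the usual choice for such a node in Elasticsearch 1.x. The function name and role labels in this sketch are hypothetical and only illustrate the mapping.

```shell
# Hypothetical helper: print the node.master/node.data lines for one of the
# roles described above. Role names are illustrative, not Elasticsearch terms.
role_settings() {
  case "$1" in
    single)        printf 'node.master: true\nnode.data: true\n' ;;   # one-node setup (the defaults)
    master)        printf 'node.master: true\nnode.data: false\n' ;;  # dedicated master
    slave)         printf 'node.master: false\nnode.data: true\n' ;;  # data "workhorse"
    load-balancer) printf 'node.master: false\nnode.data: false\n' ;; # search load balancer
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   role_settings master
```

You can copy the printed lines into elasticsearch.yml in place of the commented-out defaults.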

Two other important options are index.number_of_shards and index.number_of_replicas. The first determines how many pieces (shards) each index is divided into. The second defines how many replicas (copies) of each shard are distributed across the cluster. More shards can improve indexing performance, and more replicas can improve search performance.

Assuming you are still exploring and testing Elasticsearch on a single node, it is best to start with one shard and no replicas. Set their values as follows (make sure to remove the # at the beginning of each line):

...
index.number_of_shards: 1
index.number_of_replicas: 0
...
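The total number of shard copies an index occupies is the number of primary shards multiplied by one plus the number of replicas. Elasticsearch never places a replica on the same node as its primary. On a single node, any replicas therefore stay unassigned and the cluster health stays yellow, which is another reason to use 0 replicas here. The helper names in this sketch of the arithmetic are hypothetical.

```shell
# Hypothetical helper: total shard copies = primaries * (1 + replicas).
total_shard_copies() {
  local shards=$1 replicas=$2
  echo $(( shards * (1 + replicas) ))
}

# Hypothetical helper: minimum number of nodes needed for every replica to be
# assigned, since each copy of a given shard must sit on a different node.
min_nodes_for_replicas() {
  echo $(( $1 + 1 ))
}

# Usage:
#   total_shard_copies 5 1       # 10
#   min_nodes_for_replicas 1     # 2
```

For example, the Elasticsearch 1.x defaults of 5 shards and 1 replica give 10 shard copies and need at least 2 nodes. The single-node settings above give exactly 1.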

The last setting you may be interested in is path.data, which determines where data is stored. The default path is /var/lib/elasticsearch. In a production environment, it is recommended that you use a dedicated partition and mount point for Elasticsearch data. Ideally, this dedicated partition sits on a separate storage medium, which provides better performance and data isolation. You can specify a different data path by uncommenting the path.data line and changing its value:

...
path.data: /media/different_media
...
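Elasticsearch must be able to write to the new path.data directory. The following minimal sketch prepares it. It assumes that the service runs as an `elasticsearch` user and group created by the RPM, which you can verify with `id elasticsearch` on your server. The function name is hypothetical, and the mount point is the example value above.

```shell
# Hypothetical helper: create a data directory for path.data and hand it to
# the account Elasticsearch runs as. OWNER is "user" or "user:group".
prepare_data_dir() {
  local dir=$1 owner=$2
  mkdir -p "$dir" || return 1
  chown -R "$owner" "$dir" || return 1
  # Owner gets full access, the group read/traverse access, others none.
  chmod 750 "$dir"
}

# Usage on the server, as root (assumes the service user is "elasticsearch"):
#   prepare_data_dir /media/different_media elasticsearch:elasticsearch
```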

After making all changes, save and exit the file. Now, you can start Elasticsearch for the first time with the following command:

sudo service elasticsearch start

Give Elasticsearch at least 10 seconds to start fully before you use it. Otherwise, you may get a connection error.
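In a script, polling the HTTP port until it answers is more reliable than a fixed wait. This is a minimal sketch with a hypothetical function name. It treats any HTTP response as "started", and it uses only standard curl options (`-s`, `-o`, `--max-time`).

```shell
# Hypothetical helper: poll an HTTP endpoint until it answers or a timeout
# (in seconds) expires. Returns 0 when reachable, 1 on timeout.
wait_for_es() {
  local url=${1:-http://localhost:9200} timeout=${2:-30} waited=0
  # Without -f, curl succeeds on any HTTP response, which is enough here.
  until curl -s -o /dev/null --max-time 2 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Elasticsearch did not respond at $url within ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage on the server:
#   sudo service elasticsearch start && wait_for_es http://localhost:9200 60
```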

Step 4-Secure Elasticsearch

Elasticsearch has no built-in security and can be controlled by anyone with access to the HTTP API.

The first security adjustment is to prevent public access. To remove public access, edit the elasticsearch.yml file:

sudo nano /etc/elasticsearch/elasticsearch.yml

Find the line containing network.bind_host, uncomment it by removing the # character at the beginning of the line, and change its value to localhost so that it looks like this:

...
network.bind_host: localhost
...

Warning: Because Elasticsearch has no built-in security, it is very important not to set this to an IP address that is reachable by any server you do not control or trust. Do not bind Elasticsearch to a public or shared private network IP address!

In addition, to increase security, you can disable dynamic scripts that are used to evaluate custom expressions. By crafting custom malicious expressions, an attacker may damage your environment.

To disable custom expressions, add the following line at the end of the /etc/elasticsearch/elasticsearch.yml file:

...
script.disable_dynamic: true
...

For the above changes to take effect, you must restart Elasticsearch with the following command:

sudo service elasticsearch restart
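After the restart, you can confirm that Elasticsearch is no longer listening on public interfaces. Inspect the listening TCP sockets with `ss -ltn`, which comes from the iproute package on CentOS 7. The following sketch reads that output and flags any listener on the given port whose address is not a loopback address. The function name is hypothetical, and the sketch assumes that the local address is the fourth column of `ss -ltn` output. network.bind_host covers both the HTTP port (9200) and the node-to-node transport port (9300 by default), so check both. An empty result is only meaningful while Elasticsearch is running.

```shell
# Hypothetical helper: read `ss -ltn` output on stdin and print any listener
# on port $1 that is bound to a non-loopback address. Returns 1 if any found.
non_local_listeners() {
  awk -v port="$1" '
    NR > 1 {
      local_addr = $4
      suffix = ":" port
      # Keep only sockets listening on the requested port.
      if (substr(local_addr, length(local_addr) - length(suffix) + 1) != suffix) next
      addr = substr(local_addr, 1, length(local_addr) - length(suffix))
      gsub(/\[|\]/, "", addr)            # strip IPv6 brackets, e.g. [::1]
      if (addr !~ /^(127\.|::1$|::ffff:127\.)/) { print local_addr; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage on the server:
#   ss -ltn | non_local_listeners 9200 && ss -ltn | non_local_listeners 9300
```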

Step 5-Testing

By now, Elasticsearch should be running on port 9200. You can test it with curl, the command-line client-side URL transfer tool, and a simple GET request, as shown below:

curl -X GET 'http://localhost:9200'

You should see the following response:

{"status":200,"name":"CentOS Node","cluster_name":"mysqluster","version":{"number":"1.7.3","build_hash":"05d4530971ef0ea46d0f4fa6ee64dbc8df659682","build_timestamp":"2015-10-15T09:14:17Z","build_snapshot":false,"lucene_version":"4.10.4"},"tagline":"You Know, for Search"}

If you see a response similar to the above (the name and cluster_name values will reflect your own configuration), Elasticsearch is working properly. If not, make sure that you have followed the installation instructions correctly and that you have given Elasticsearch enough time to start fully.
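A script can extract the version number from the response instead of relying on a visual check, for instance to confirm that the server runs the release you installed. This is a minimal sed-based sketch with a hypothetical function name. It assumes that the response contains a single "number" field, as the root endpoint's response above does. For anything more involved, a real JSON parser such as jq is a better choice.

```shell
# Hypothetical helper: print the "number" field (the Elasticsearch version)
# from the JSON returned by GET /. Works on compact or ?pretty output.
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage on the server:
#   curl -s http://localhost:9200 | es_version
```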

Step 6-Use Elasticsearch

To start using Elasticsearch, let's first add some data. As mentioned earlier, Elasticsearch uses a RESTful API, which responds to the usual CRUD commands: Create, Read, Update, and Delete. To work with it, we will use curl again.

You can add the first entry with the following command:

curl -X POST 'http://localhost:9200/tutorial/helloworld/1' -d '{ "message": "Hello World!" }'

You should see the following response:

{"_index":"tutorial","_type":"helloworld","_id":"1","_version":1,"created":true}

Using curl, we sent an HTTP POST request to the Elasticsearch server. The request URI is /tutorial/helloworld/1. It is important to understand the parts of this URI:

tutorial is the index of data in Elasticsearch.
helloworld is the type.
1 is the ID of our entry under the above index and type.
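When scripting requests, a small helper can assemble the /index/type/id path and catch a common mistake early: Elasticsearch index names must be lowercase. This sketch uses a hypothetical function name and checks only the index name. It does not URL-encode anything, so it assumes simple alphanumeric types and IDs.

```shell
# Hypothetical helper: build the /index/type/id path used in the requests above.
# Elasticsearch index names must be lowercase, so reject anything else early.
doc_uri() {
  local index=$1 type=$2 id=$3
  case "$index" in
    *[[:upper:]]*)
      echo "index name must be lowercase: $index" >&2
      return 1 ;;
  esac
  printf '/%s/%s/%s\n' "$index" "$type" "$id"
}

# Usage on the server:
#   curl -X GET "http://localhost:9200$(doc_uri tutorial helloworld 1)"
```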

You can retrieve this first entry using an HTTP GET request as follows:

curl -X GET 'http://localhost:9200/tutorial/helloworld/1'

The result should look like this:

{"_index":"tutorial","_type":"helloworld","_id":"1","_version":1,"found":true,"_source":{"message":"Hello World!"}}

To modify an existing entry, you can use an HTTP PUT request as follows:

curl -X PUT 'localhost:9200/tutorial/helloworld/1?pretty' -d '
{"message":"Hello People!"}'

Elasticsearch should acknowledge the successful modification with a response like this:

{"_index":"tutorial","_type":"helloworld","_id":"1","_version":2,"created":false}

In the above example, we changed the message of the first entry to "Hello People!". Note that the version number was automatically incremented to 2.
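Hand-writing JSON bodies inside shell quotes becomes error-prone once a message contains double quotes or backslashes. The following minimal sketch escapes those two characters before building the body. The function name is hypothetical, and it does not escape control characters such as newlines, which JSON also requires.

```shell
# Hypothetical helper: build {"message": ...} with " and \ escaped for JSON.
# Control characters (e.g. newlines) are NOT escaped by this sketch.
json_message() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"message":"%s"}\n' "$escaped"
}

# Usage on the server:
#   curl -X PUT 'http://localhost:9200/tutorial/helloworld/1?pretty' -d "$(json_message 'Hello People!')"
```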

You may have noticed the extra parameter pretty in the above request. It tells Elasticsearch to return the response in a human-readable format, with each data field on its own line. You can also add it when retrieving data to get more readable output, as shown below:

curl -X GET 'http://localhost:9200/tutorial/helloworld/1?pretty'

The response is now indented and easier to read, similar to this:

{
  "_index" : "tutorial",
  "_type" : "helloworld",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {"message":"Hello People!"}
}

So far, we have added, modified and queried data in Elasticsearch. To learn about the other operations, see the Elasticsearch API documentation.

Conclusion

That is how easy it is to install, configure and start using Elasticsearch. Once you have experimented enough with manual queries, your next step is to start using it from your applications.

For more CentOS tutorials, please visit [Tencent Cloud + Community](https://cloud.tencent.com/developer?from=10680).

Reference: "How To Install and Configure Elasticsearch on CentOS 7"