Elasticsearch is a platform for real-time distributed search and data analysis. It owes its popularity to its ease of use, powerful features and scalability.
Elasticsearch supports RESTful operations. This means you can use HTTP methods (GET, POST, PUT, DELETE, etc.) in combination with HTTP URIs (/collection/entry) to manipulate data. The intuitive RESTful approach is both developer-friendly and user-friendly, which is one of the reasons Elasticsearch is popular.
Elasticsearch is free and open source software backed by a solid company, Elastic. This combination makes it suitable for everything from personal testing to enterprise integration.
This article will introduce you to Elasticsearch and show you how to install, configure and start using it.
Before following this tutorial, make sure to complete the following prerequisites:
First of all, you need a Java Runtime Environment (JRE) on your Tencent CVM instance, because Elasticsearch is written in the Java programming language. You can use the native CentOS OpenJDK package for the JRE. This JRE is free, well supported, and managed automatically by the CentOS yum package manager.
You can install OpenJDK 8 using the following command:
sudo yum install java-1.8.0-openjdk.x86_64
To verify that JRE is installed and available, run the following command:
java -version
The result should look like this:
openjdk version "1.8.0_65"
OpenJDK Runtime Environment (build 1.8.0_65-b17)
OpenJDK 64-Bit Server VM (build 25.65-b01, mixed mode)
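If you want to check the Java version from a script, keep in mind that java -version writes its output to standard error rather than standard output. The following is a minimal sketch of one way to extract the version string; parse_java_version is a hypothetical helper name chosen for this example, and it assumes the output has the same shape as the sample above.

```shell
# parse_java_version: hypothetical helper (not part of Java or CentOS).
# Reads `java -version` output on stdin and prints the quoted version
# string from the first line that contains one.
parse_java_version() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (java -version prints to stderr, hence the 2>&1):
#   java -version 2>&1 | parse_java_version
```

For the sample output shown above, this prints 1.8.0_65.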
Once you are using Elasticsearch in earnest and want better Java performance and compatibility, you can opt to install Oracle's proprietary Java (Oracle JDK 8) instead.
Elasticsearch can be downloaded directly from elastic.co as a zip, tar.gz, deb or rpm package. For CentOS, it is best to use the native rpm package, which installs everything needed to run Elasticsearch.
At the time of writing, the latest Elasticsearch version is 1.7.3. Use the following command to download it to a directory of your choice:
wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.3.noarch.rpm
Then install it in the usual CentOS way, using the rpm command:
sudo rpm -ivh elasticsearch-1.7.3.noarch.rpm
This installs Elasticsearch in /usr/share/elasticsearch/, places its configuration files in /etc/elasticsearch, and adds its init script at /etc/init.d/elasticsearch.
To make sure Elasticsearch starts and stops automatically along with your Tencent CVM instance, enable its service with the following command:
sudo systemctl enable elasticsearch.service
Now that Elasticsearch and its Java dependencies are installed, it is time to configure Elasticsearch.
Elasticsearch configuration files are located in the /etc/elasticsearch directory. There are two files:
elasticsearch.yml: Configures the Elasticsearch server settings. This is where all options except those for logging are stored, which is why we are mostly interested in this file.
logging.yml: Provides the logging configuration. At the beginning, you don't need to edit this file; you can keep all the default logging options. By default, you can find the resulting logs in /var/log/elasticsearch.
The first variables to customize on any Elasticsearch server are node.name and cluster.name in elasticsearch.yml. As their names suggest, node.name specifies the name of the server (node), and cluster.name specifies the name of the cluster that the node belongs to.
If you do not customize these variables, node.name will be assigned automatically based on the Tencent CVM host name, and cluster.name will be set automatically to the default cluster name.
Elasticsearch's auto-discovery feature uses the cluster.name value to discover Elasticsearch nodes automatically and associate them with a cluster. Therefore, if you do not change the default value, unwanted nodes found on the same network may end up in your cluster.
To start editing the main elasticsearch.yml configuration file, run:
sudo nano /etc/elasticsearch/elasticsearch.yml
Remove the # character at the beginning of the node.name and cluster.name lines to uncomment them, and then change their values. Your first configuration change in the /etc/elasticsearch/elasticsearch.yml file should look like this:
...
node.name: "My First Node"
cluster.name: mycluster1
...
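If you prefer to script this kind of edit instead of making it in nano, one possible approach is a small helper that uncomments a setting and assigns its value. The following is only a sketch: set_option is a hypothetical function name, not something shipped with Elasticsearch, and it assumes each setting sits on a single top-level line like the node.name and cluster.name lines above. It rewrites only the first matching line and appends the setting if no such line exists.

```shell
# set_option FILE KEY VALUE
# Hypothetical helper: uncomments the first "KEY:" line in FILE and sets it
# to VALUE, or appends "KEY: VALUE" if FILE has no such line.
# Assumption: VALUE contains no backslashes (awk -v would interpret them).
set_option() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1

    awk -v key="$key" -v value="$value" '
        {
            line = $0
            # Drop indentation and one leading "#" before comparing.
            sub(/^[ \t]*#?[ \t]*/, "", line)
            if (!done && index(line, key ":") == 1) {
                print key ": " value   # note the space after the colon
                done = 1
            } else {
                print
            }
        }
        END { if (!done) print key ": " value }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps owner and mode
    status=$?

    rm -f "$tmp"
    return "$status"
}

# Usage, from a shell that is allowed to write to the file:
#   set_option /etc/elasticsearch/elasticsearch.yml node.name '"My First Node"'
#   set_option /etc/elasticsearch/elasticsearch.yml cluster.name mycluster1
```

Editing by hand remains the safer choice while you are still getting to know the file; a helper like this is mainly useful when you configure several servers the same way.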
Another important setting is the role of the server, which can be either "master" or "slave". A "master" is responsible for the health and stability of the cluster. In large deployments with many cluster nodes, it is recommended to have more than one dedicated "master". Typically, a dedicated "master" neither stores data nor creates indexes, so it should never become overloaded in a way that endangers the health of the cluster.
A "slave" acts as a "workhorse" that can be loaded with data tasks. Even if a "slave" node is overloaded, cluster health should not be seriously affected, provided that other nodes are available to take on the additional load.
The setting that determines the role of the server is called node.master. If you only have one Elasticsearch node, you should leave this option commented out so that it keeps its default value of true; that is, the only node should also be the master node. Alternatively, if you want to configure a node as a slave node, delete the # character at the beginning of the node.master line and change the value to false:
...
node.master: false
...
Another important configuration option is node.data, which determines whether the node stores data. In most cases, this option should be left at its default value (true), but there are two situations in which you may not want to store data on a node. One is when the node is a dedicated "master", as already mentioned. The other is when the node is used only to fetch data from other nodes and aggregate the results. In the latter case, the node acts as a "search load balancer".
Similarly, if you only have one Elasticsearch node, you should leave this setting commented out so that it keeps its default value of true. Otherwise, to disable local storage of data, uncomment the node.data line and change the value to false:
...
node.data: false
...
Two other important options are index.number_of_shards and index.number_of_replicas. The first determines how many pieces (shards) an index is split into. The second defines the number of replicas that are distributed across the cluster. Having more shards can improve indexing performance, while having more replicas can make searching faster.
Assuming you are still exploring and testing Elasticsearch on a single node, it is best to start with just one shard and no replicas. Their values should therefore be set as follows (make sure to remove the # at the beginning of each line):
...
index.number_of_shards: 1
index.number_of_replicas: 0
...
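The values in elasticsearch.yml serve as defaults for newly created indexes. The same two settings can also be supplied for an individual index when it is created through the HTTP API. The sketch below illustrates that idea under a few assumptions: ES_URL, index_settings_json and create_index are names invented for this example, and the server is assumed to be reachable at http://localhost:9200.

```shell
# Base URL of the Elasticsearch HTTP API (assumption: local default port).
ES_URL=${ES_URL:-http://localhost:9200}

# index_settings_json SHARDS REPLICAS
# Hypothetical helper: prints the JSON body for an index-creation request.
index_settings_json() {
    printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' "$1" "$2"
}

# create_index NAME SHARDS REPLICAS
# Hypothetical helper: creates index NAME with the given settings by
# sending a PUT request to /NAME.
create_index() {
    curl -sf -X PUT "$ES_URL/$1" -d "$(index_settings_json "$2" "$3")"
}

# Usage, once Elasticsearch is running (index name is just an example):
#   create_index tutorial2 1 0
```

Keeping the JSON construction in its own function makes it easy to inspect the request body before anything is sent to the server.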
The last setting you may be interested in is path.data, which determines where data is stored. The default path is /var/lib/elasticsearch. In a production environment, it is recommended that you use a dedicated partition and mount point for Elasticsearch data. Ideally, this dedicated partition is a separate storage medium, which provides better performance and data isolation. You can specify a different path.data path by uncommenting the path.data line and changing its value:
...
path.data: /media/different_media
...
After making all changes, save and exit the file. Now, you can start Elasticsearch for the first time with the following command:
sudo service elasticsearch start
Before you use Elasticsearch, give it at least 10 seconds to start fully. Otherwise, you may receive a connection failure error.
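In a script, a fixed pause can be replaced by polling the HTTP port until the server answers. The following sketch shows one way to do that with curl; wait_for_es is a hypothetical function name, and the default URL, number of attempts and delay are assumptions that you should adapt to your setup.

```shell
# wait_for_es [URL] [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: returns 0 as soon as URL answers an HTTP request
# successfully, or 1 if it still does not answer after ATTEMPTS tries.
wait_for_es() {
    url=${1:-http://localhost:9200}
    attempts=${2:-15}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -f: treat HTTP errors as failure,
        # -o /dev/null: discard the body, --max-time: cap each try.
        if curl -sf --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage:
#   sudo service elasticsearch start
#   wait_for_es && echo "Elasticsearch is up"
```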
Elasticsearch has no built-in security and can be controlled by anyone with access to the HTTP API.
The first security adjustment is to prevent public access. To remove public access, edit the elasticsearch.yml file:
sudo nano /etc/elasticsearch/elasticsearch.yml
Find the line containing network.bind_host, uncomment it by deleting the # character at the beginning of the line, and change the value to localhost so that it looks like this:
...
network.bind_host: localhost
...
**Warning:** Because Elasticsearch does not have any built-in security, it is very important not to set this to an IP address that is reachable by servers you do not control or trust. Do not bind Elasticsearch to a public or shared private network IP address!
In addition, to increase security, you can disable dynamic scripting, which is used to evaluate custom expressions. By crafting a malicious custom expression, an attacker may be able to compromise your environment.
To disable custom expressions, add the following line at the end of the /etc/elasticsearch/elasticsearch.yml file:
...
script.disable_dynamic: true
...
For the above changes to take effect, you must restart Elasticsearch with the following command:
sudo service elasticsearch restart
By now, Elasticsearch should be running on port 9200. You can test it with curl, a command-line tool for client-side URL transfers, and a simple GET request, as shown below:
curl -X GET 'http://localhost:9200'
You should see the following response:
{"status":200,"name":"CentOS Node","cluster_name":"mycluster1","version":{"number":"1.7.3","build_hash":"05d4530971ef0ea46d0f4fa6ee64dbc8df659682","build_timestamp":"2015-10-15T09:14:17Z","build_snapshot":false,"lucene_version":"4.10.4"},"tagline":"You Know, for Search"}
If you see a response similar to the above, Elasticsearch is working properly. If not, please make sure you have followed the installation instructions correctly and you have had enough time for Elasticsearch to fully start.
To start using Elasticsearch, let's first add some data. As mentioned earlier, Elasticsearch uses a RESTful API, which responds to the usual CRUD commands: Create, Read, Update, and Delete. To work with it, we will use curl again.
You can add the first entry with the following command:
curl -X POST 'http://localhost:9200/tutorial/helloworld/1' -d '{ "message": "Hello World!" }'
You should see the following response:
{"_index":"tutorial","_type":"helloworld","_id":"1","_version":1,"created":true}
Using curl, we sent an HTTP POST request to the Elasticsearch server. The URI of the request is /tutorial/helloworld/1. It is important to understand the parameters here:
tutorial is the index of the data in Elasticsearch.
helloworld is the type.
1 is the id of our entry under the above index and type.
You can retrieve this first entry using an HTTP GET request as follows:
curl -X GET 'http://localhost:9200/tutorial/helloworld/1'
The result should look like this:
{"_index":"tutorial","_type":"helloworld","_id":"1","_version":1,"found":true,"_source":{"message":"Hello World!"}}
To modify an existing entry, you can use an HTTP PUT request as follows:
curl -X PUT 'localhost:9200/tutorial/helloworld/1?pretty' -d '
{"message":"Hello People!"}'
Elasticsearch should acknowledge the successful modification with a response like this:
{"_index":"tutorial","_type":"helloworld","_id":"1","_version":2,"created":false}
In the above example, we changed the message of the first entry to "Hello People!". As a result, the version number was automatically increased to 2.
You may have noticed the extra parameter pretty in the above request. It enables a human-readable format in which each data field is written on its own line. You can also "prettify" your results when retrieving data to get nicer output, as shown below:
curl -X GET 'http://localhost:9200/tutorial/helloworld/1?pretty'
The response will now be in a better format:
{
  "_index" : "tutorial",
  "_type" : "helloworld",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "message" : "Hello People!"
  }
}
So far, we have added, queried and modified data in Elasticsearch. To learn about other operations, please check the API documentation.
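Of the four CRUD operations, Delete is the one that has not been demonstrated yet. It uses the HTTP DELETE method on the same URI as the other requests. The sketch below wraps the call in a small function; ES_URL and delete_entry are names invented for this example, and the index, type and id are the ones used earlier in this tutorial.

```shell
# Base URL of the Elasticsearch HTTP API (assumption: local default port).
ES_URL=${ES_URL:-http://localhost:9200}

# delete_entry INDEX TYPE ID
# Hypothetical helper: sends an HTTP DELETE request for one entry.
# -s: silent, -f: return a non-zero exit status on HTTP errors
# (for example when the entry does not exist).
delete_entry() {
    curl -sf -X DELETE "$ES_URL/$1/$2/$3?pretty"
}

# Usage:
#   delete_entry tutorial helloworld 1
```

After a successful delete, a GET request for the same URI should report that the entry is no longer found.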
That is how easy it is to install, configure and start using Elasticsearch. Once you have played enough with manual queries, your next task is to start using it from your applications.
For more CentOS tutorials, please go to [Tencent Cloud + Community](https://cloud.tencent.com/developer?from=10680).
Reference: "How To Install and Configure Elasticsearch on CentOS 7"