Introduction

Elasticsearch is a real-time distributed search and data analysis platform. Its popularity is due to its ease of use, powerful features and scalability.

Elasticsearch supports RESTful operations. This means you can use HTTP methods (GET, POST, PUT, DELETE, etc.) in combination with HTTP URIs (/collection/entry) to manipulate data. The intuitive RESTful approach is both developer and user-friendly, which is one of the reasons Elasticsearch is popular.

Elasticsearch is free and open source software with a solid company behind it: Elastic. This combination makes it suitable for everything from personal testing to enterprise integration.

This article will introduce you to Elasticsearch and show you how to install, configure, secure, and start using it.

Preparation

Before following this tutorial, you need to:

An Ubuntu 16.04 Tencent Cloud CVM, set up by following the initial server setup for Ubuntu 16.04, including a non-root user with sudo privileges.
Oracle JDK 8 installed.

If you do not have a server, you can buy one from Tencent Cloud, but I personally recommend experimenting in the free Tencent Cloud Developer Lab first and buying a server afterwards.

Unless otherwise noted, all commands in this tutorial that require root privileges should be run as a non-root user with sudo privileges.

Step 1-Download and install Elasticsearch

Elasticsearch can be downloaded directly from elastic.co as a zip, tar.gz, deb, or rpm package. For Ubuntu, it is best to use the deb (Debian) package, which will install everything needed to run Elasticsearch.

First, update your package index.

sudo apt-get update

Download the Elasticsearch deb package. This tutorial uses version 2.3.1.

wget https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/deb/elasticsearch/2.3.1/elasticsearch-2.3.1.deb

Then use dpkg to install it in the usual Ubuntu way.

sudo dpkg -i elasticsearch-2.3.1.deb

This installs Elasticsearch in /usr/share/elasticsearch/, with its configuration files in /etc/elasticsearch and its init script at /etc/init.d/elasticsearch.
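To confirm that the package landed where expected, the three paths can be checked in one pass. The following is a minimal sketch; the function name check_es_layout and its optional root-prefix argument are hypothetical helpers introduced here, not part of Elasticsearch.

```shell
# Report whether each install path named in the text exists.
# $1 (optional, hypothetical): a root prefix, handy for checking a chroot
# or a test directory. It defaults to the real filesystem root.
check_es_layout() {
  root=${1:-}
  missing=0
  for path in /usr/share/elasticsearch /etc/elasticsearch /etc/init.d/elasticsearch; do
    if [ -e "$root$path" ]; then
      echo "found:   $path"
    else
      echo "missing: $path"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when nothing is missing.
  [ "$missing" -eq 0 ]
}
```

Calling check_es_layout with no arguments after the dpkg step should print three "found" lines if the installation went as described.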

To make sure Elasticsearch starts and stops automatically with the server, add its init script to the default runlevels.

sudo systemctl enable elasticsearch.service

Before starting Elasticsearch for the first time, please review the next step for the recommended minimum configuration information.

Step 2-Configure Elasticsearch

Now that Elasticsearch and its Java dependencies are installed, it is time to configure Elasticsearch. Elasticsearch configuration files are located in the /etc/elasticsearch directory. There are two files:

elasticsearch.yml configures the Elasticsearch server settings. This is where all options except those for logging are stored, which is why we are mostly interested in this file.
logging.yml provides the logging configuration. At the beginning, you don't need to edit this file. You can keep all the default logging options. By default, the resulting logs are written to /var/log/elasticsearch.

The first variables to customize on any Elasticsearch server are node.name and cluster.name in elasticsearch.yml. As their names suggest, node.name specifies the name of the server (node), and cluster.name specifies the name of the cluster the node belongs to.

If you do not customize these variables, a node.name will be automatically assigned based on the Tencent Cloud CVM host name. cluster.name will be automatically set as the default cluster name.

Elasticsearch's auto-discovery feature uses the value of cluster.name to find Elasticsearch nodes and associate them with a cluster. Therefore, if you do not change the default value, unwanted nodes found on the same network may end up in your cluster.

Use nano or your favorite text editor to start editing the main configuration file elasticsearch.yml.

sudo nano /etc/elasticsearch/elasticsearch.yml

Remove the # character at the beginning of the lines for cluster.name and node.name to uncomment them, and then update their values. Your first configuration change in the /etc/elasticsearch/elasticsearch.yml file should look like this:

...
cluster.name: mycluster1
node.name: "My First Node"
...
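After editing, it can be handy to read a setting back without opening the editor again. The sketch below assumes simple one-line "key: value" entries; es_setting is a hypothetical helper name, and it is not a general YAML parser.

```shell
# Print the active (uncommented) value of one setting.
# $1: path to the configuration file, $2: setting name such as cluster.name
es_setting() {
  # Escape the dots in the setting name so sed treats them literally.
  key=$(printf '%s' "$2" | sed 's/\./\\./g')
  # Keep only lines that start with the key, strip the key and any
  # surrounding double quotes, and report the last occurrence.
  sed -n "s/^[[:space:]]*$key:[[:space:]]*//p" "$1" \
    | sed 's/^"\(.*\)"$/\1/' \
    | tail -n 1
}
```

For example, `es_setting /etc/elasticsearch/elasticsearch.yml cluster.name` prints the configured cluster name, and prints nothing if the line is still commented out. Reading the file may require sudo depending on its permissions.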

These are the minimum settings you need to get started with Elasticsearch. However, it is recommended to continue reading the configuration section for a more thorough understanding and fine-tuning of Elasticsearch.

A particularly important setting of Elasticsearch is the role of the server, which is either master or slave. The master servers are responsible for the health and stability of the cluster. In large deployments with many cluster nodes, it is recommended to have more than one dedicated master. Typically, a dedicated master does not store data or create indexes, so it should never be at risk of overload, which could endanger the health of the cluster.

The slave servers are the workhorses that can be loaded with data tasks. Even if a slave node is overloaded, cluster health should not be seriously affected, provided there are other nodes to take on the additional load.

The setting that determines the role of the server is node.master. By default, a node is a master. If you have only one Elasticsearch node, leave this option at its default value of true, because at least one master is always needed. Alternatively, to configure the node as a slave, set node.master to false, as shown below:

...
node.master: false
...

Another important configuration option is node.data, which determines whether a node stores data. In most cases, this option should be left at its default value (true), but there are two cases in which you may not want to store data on a node. One is when the node is a dedicated master, as mentioned earlier. The other is when the node is used only for fetching data from other nodes and aggregating results. In the latter case, the node acts as a search load balancer.

Again, if you have only one Elasticsearch node, you should not change this value. Otherwise, to disable storing data locally, set node.data to false like this:

...
node.data: false
...
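The two settings combine into four possible roles. As a quick reference, the sketch below maps each combination to the role described in the text; node_role is a hypothetical helper written for illustration and does not query Elasticsearch.

```shell
# Describe the role a node takes for a given pair of settings.
# $1: value of node.master (true/false), $2: value of node.data (true/false)
node_role() {
  case "$1,$2" in
    true,true)   echo "master-eligible node that also stores data (the default)" ;;
    true,false)  echo "dedicated master: no data stored" ;;
    false,true)  echo "slave node: stores data, never acts as master" ;;
    false,false) echo "search load balancer: fetches and aggregates results only" ;;
    *) echo "expected true or false for both settings" >&2; return 1 ;;
  esac
}
```

For a single-node setup, `node_role true true` describes the configuration you should keep.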

In larger Elasticsearch deployments with many nodes, two other important options are index.number_of_shards and index.number_of_replicas. The first determines how many pieces, or shards, an index is split into. The second defines the number of replicas that are distributed across the cluster. Having more shards improves indexing performance, while having more replicas makes searching faster.

By default, the number of shards is 5 and the number of replicas is 1. Assuming you are still exploring and testing Elasticsearch on a single node, you can start with just one shard and no replicas. Their values should therefore be set as follows:

...
index.number_of_shards: 1
index.number_of_replicas: 0
...
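The same two values can also be supplied for a single index when it is created, by sending a settings body with the index-creation request. The sketch below only builds that JSON body; index_settings_json is a hypothetical helper name, and INDEX_NAME in the usage note is a placeholder.

```shell
# Print a JSON settings body for creating an index.
# $1: number of shards (default 1), $2: number of replicas (default 0)
index_settings_json() {
  shards=${1:-1}
  replicas=${2:-0}
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' \
    "$shards" "$replicas"
}
```

With Elasticsearch running, the body could be used like this: `curl -X PUT 'http://localhost:9200/INDEX_NAME' -d "$(index_settings_json 1 0)"`.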

The last setting you might be interested in is path.data, which determines where data is stored. The default path is /var/lib/elasticsearch. In a production environment, it is recommended that you use a dedicated partition and mount point for Elasticsearch data. In the best case, this dedicated partition is on a separate storage medium, which provides better performance and data isolation. You can specify a different data path by setting path.data:

...
path.data: /media/different_media
...

After making all changes, save and exit the file. Now you can start Elasticsearch for the first time.

sudo systemctl start elasticsearch

Before trying to use Elasticsearch, give it a few seconds to start completely. Otherwise, you may receive a connection failure error.
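Rather than guessing how long startup takes, you can poll the HTTP port until it answers. This is a minimal sketch, assuming the default address http://localhost:9200 and one-second pauses; wait_for_es is a hypothetical helper name, and the default of 30 attempts is an arbitrary choice.

```shell
# Poll the HTTP API until it responds or the attempts run out.
# $1: base URL (default http://localhost:9200), $2: number of attempts (default 30)
wait_for_es() {
  url=${1:-http://localhost:9200}
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # curl exits with 0 as soon as the server accepts the request.
    if curl -s -o /dev/null --max-time 2 "$url"; then
      echo "Elasticsearch is answering at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "No response from $url after $tries attempts" >&2
  return 1
}
```

Running `wait_for_es` right after `sudo systemctl start elasticsearch` returns as soon as the node accepts connections.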

Step 3-Securing Elasticsearch

By default, Elasticsearch has no built-in security and can be controlled by anyone who can access the HTTP API. This is not always a security risk, because Elasticsearch listens only on the loopback interface (i.e., 127.0.0.1), which can only be accessed locally. Public access is therefore not possible, and your Elasticsearch is secure enough as long as all server users are trusted or this is a dedicated Elasticsearch server.

However, if you want to harden security, the first thing to do is to enable authentication. Authentication is provided by the commercial [Shield plug-in](https://www.elastic.co/downloads/shield). This plug-in is not free, but a free 30-day trial is available for testing it. Its official page has excellent installation and configuration instructions. The only additional thing you may need to know is that the path to the Elasticsearch plug-in installation manager is /usr/share/elasticsearch/bin/plugin.

If you don't want to use the commercial plug-in but still need to allow remote access to the HTTP API, you can at least limit the network exposure with Ubuntu's default firewall, UFW (Uncomplicated Firewall). By default, UFW is installed but not enabled. If you decide to use it, follow these steps:

First, create a rule that allows any required services. You need at least SSH to log in to the server. To allow global access to SSH, please whitelist port 22.

sudo ufw allow 22

Then allow access to the default Elasticsearch HTTP API port (TCP 9200) from the trusted remote host, for example TRUSTED_IP, as shown below:

sudo ufw allow from TRUSTED_IP to any port 9200

Only after that, enable UFW with the command:

sudo ufw enable

Finally, use the following command to check the status of UFW:

sudo ufw status

If you have specified the rules correctly, the output should look like this:

Status: active

To                         Action      From
--                         ------      ----
9200                       ALLOW       TRUSTED_IP
22                         ALLOW       Anywhere
22 (v6)                    ALLOW       Anywhere (v6)

Once you confirm that UFW is enabled and protects Elasticsearch port 9200, you can allow Elasticsearch to listen for external connections. To do this, open the configuration file elasticsearch.yml again.

sudo nano /etc/elasticsearch/elasticsearch.yml

Find the line containing network.host, uncomment it by deleting the # character at the beginning of the line, and then change the value to 0.0.0.0 as shown below:

...
network.host: 0.0.0.0
...

We have specified 0.0.0.0 so that Elasticsearch listens on all interfaces and bound IPs. If you want it to listen only on a specific interface, you can specify its IP instead of 0.0.0.0.

To make the above settings take effect, restart Elasticsearch with the following command:

sudo systemctl restart elasticsearch

Then try to connect to Elasticsearch from a trusted host. If you cannot connect, make sure that UFW is running and the variable network.host is correctly specified.

Step 4-Testing Elasticsearch

By now, Elasticsearch should be running on port 9200. You can test it with curl, the command-line tool for client-side URL transfers, and a simple GET request.

curl -X GET 'http://localhost:9200'

You should see the following response:

{"name":"My First Node","cluster_name":"mycluster1","version":{"number":"2.3.1","build_hash":"bd980929010aef404e7cb0843e61d0665269fc39","build_timestamp":"2016-04-04T12:25:05Z","build_snapshot":false,"lucene_version":"5.5.0"},"tagline":"You Know, for Search"}

If you see a response similar to the above, Elasticsearch is working properly. If not, make sure that you have followed the installation instructions correctly and that you have allowed enough time for Elasticsearch to start fully.
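If you only want to confirm which version answered, the version number can be pulled out of that response. The sketch below is a text-based extraction that assumes the response contains a single "number" field, as in the example above; es_version is a hypothetical helper name, and a real JSON parser would be more robust.

```shell
# Read the root endpoint's JSON response on standard input and print
# the value of its "number" field.
es_version() {
  # Join the lines first so pretty-printed responses work too.
  tr -d '\n' \
    | sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Used as `curl -s 'http://localhost:9200' | es_version`, this prints just the version string, or nothing if the field is absent.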

To perform a more thorough check on Elasticsearch, execute the following command:

curl -XGET 'http://localhost:9200/_nodes?pretty'

In the output of the above command, you can view and verify all current settings of nodes, clusters, application paths, modules, etc.

Step 5-Use Elasticsearch

To start using Elasticsearch, we first add some data. As mentioned earlier, Elasticsearch uses a RESTful API, which responds to the usual CRUD commands: create, read, update, and delete. In order to use it, we will use curl again.

You can add the first entry with the following command:

curl -X POST 'http://localhost:9200/tutorial/helloworld/1' -d '{ "message": "Hello World!" }'

You should see the following response:

{"_index":"tutorial","_type":"helloworld","_id":"1","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"created":true}

With curl, we have sent an HTTP POST request to the Elasticsearch server. The URI of the request, /tutorial/helloworld/1, has several parts:

tutorial is the index of the data in Elasticsearch.
helloworld is the type.
1 is the id of our entry under the above index and type.
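The three parts listed above can be separated mechanically. The sketch below splits a path of the form /index/type/id using only shell parameter expansion; split_doc_path is a hypothetical helper for illustration, and it assumes the path has exactly three segments.

```shell
# Split a document URI path of the form /index/type/id into its parts.
split_doc_path() {
  path=${1#/}          # drop the leading slash
  path=${path%%\?*}    # drop a query string such as ?pretty
  index=${path%%/*}    # everything before the first slash
  rest=${path#*/}      # everything after the first slash
  doc_type=${rest%%/*}
  id=${rest#*/}
  printf 'index=%s type=%s id=%s\n' "$index" "$doc_type" "$id"
}
```

For the request used in this step, `split_doc_path /tutorial/helloworld/1` reports the index, type, and id in that order.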

You can retrieve this first entry using an HTTP GET request.

curl -X GET 'http://localhost:9200/tutorial/helloworld/1'

The result should look like this:

{"_index":"tutorial","_type":"helloworld","_id":"1","_version":1,"found":true,"_source":{"message":"Hello World!"}}

To modify an existing entry, you can use an HTTP PUT request.

curl -X PUT 'localhost:9200/tutorial/helloworld/1?pretty' -d '
{"message":"Hello People!"}'

Elasticsearch should acknowledge the successful modification like this:

{
  "_index" : "tutorial",
  "_type" : "helloworld",
  "_id" : "1",
  "_version" : 2,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : false
}

In the above example, we changed the message of the first entry to "Hello People!". As a result, the version number was automatically increased to 2.

You may have noticed the extra parameter pretty in the above request. It enables a human-readable format in which each data field is written on its own line. You can also prettify your results when retrieving data to get nicer output, as shown below:

curl -X GET 'http://localhost:9200/tutorial/helloworld/1?pretty'

The response will now be in a better format:

{
  "_index" : "tutorial",
  "_type" : "helloworld",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "message" : "Hello People!"
  }
}

So far, we have added and queried data in Elasticsearch.
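Delete is listed among the CRUD operations but has not been demonstrated yet. The sketch below maps each CRUD action to the HTTP method this tutorial uses for it and wraps curl accordingly; crud_method, es_request, and the ES_URL variable are hypothetical names introduced here, with http://localhost:9200 assumed as the default address.

```shell
# Map a CRUD action to the HTTP method used in this tutorial.
crud_method() {
  case $1 in
    create) echo POST ;;
    read)   echo GET ;;
    update) echo PUT ;;
    delete) echo DELETE ;;
    *) echo "unknown action: $1" >&2; return 1 ;;
  esac
}

# Send a request for the given action.
# $1: action, $2: path such as tutorial/helloworld/1, $3: optional JSON body
es_request() {
  action=$1
  path=$2
  body=${3:-}
  method=$(crud_method "$action") || return 1
  if [ -n "$body" ]; then
    curl -s -X "$method" "${ES_URL:-http://localhost:9200}/$path" -d "$body"
  else
    curl -s -X "$method" "${ES_URL:-http://localhost:9200}/$path"
  fi
}
```

With Elasticsearch running, `es_request delete tutorial/helloworld/1` removes the entry created earlier, and a following `es_request read tutorial/helloworld/1` should then report that it is no longer found.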

Conclusion

That is all it takes to install, configure, and start using Elasticsearch. Once you have played enough with manual queries, your next task is to start using it from your applications.

To find more tutorials on installing and configuring Elasticsearch, visit [Tencent Cloud + Community](https://cloud.tencent.com/developer?from=10680).

Reference: "How To Install and Configure Elasticsearch on Ubuntu 16.04"
