How to set up a production Elasticsearch cluster on Ubuntu 14.04

Introduction

Elasticsearch is a popular open source search server for real-time distributed search and data analysis. When used for any tasks other than development, Elasticsearch should be deployed as a cluster across multiple servers for optimal performance, stability, and scalability.

This tutorial will show you how to install and configure a production Elasticsearch cluster on Ubuntu 14.04 in a cloud server environment.

Although manually setting up an Elasticsearch cluster is useful for learning, it is strongly recommended to use configuration management tools in any cluster setup.

Prerequisites

You must have at least three Ubuntu 14.04 servers to complete this tutorial, because an Elasticsearch cluster should have a minimum of three master-eligible nodes. If you want dedicated master nodes and data nodes, you need at least three servers for the master nodes plus additional servers for the data nodes. Students who don't have a server can buy one here, but I personally recommend experimenting with the free Tencent Cloud Developer Lab first and buying a server afterwards.

If you prefer to use CentOS, please check this tutorial: How to set up a production Elasticsearch cluster on CentOS 7

Assumptions

This tutorial assumes that your servers are using a VPN. No matter what kind of physical network your servers are on, this provides them with a private network.

If you are using a shared private network, you must use a VPN to protect Elasticsearch from unauthorized access, because Elasticsearch has no built-in security in its HTTP interface. Each server must be on the same private network, and you should not share that private network with any computers you do not trust.

We will refer to a server's VPN IP address as vpn_ip. We also assume that all servers have a VPN interface named "tun0", as described in the tutorial linked above.

Install Java 8

Elasticsearch requires Java, so we will install it now. We will install the latest version of Oracle Java 8, as that is what Elasticsearch recommends. However, it should also work fine with OpenJDK, if you decide to go that route.

Complete this step on all Elasticsearch servers. Add the Oracle Java PPA to apt:

sudo add-apt-repository -y ppa:webupd8team/java

Update your apt package database:

sudo apt-get update

Use this command to install the latest stable version of Oracle Java 8 (and accept the pop-up license agreement):

sudo apt-get -y install oracle-java8-installer

Be sure to repeat this step on all Elasticsearch servers.
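
You can confirm which Java version is active with:

java -version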

Now that Java 8 is installed, let's install Elasticsearch.

Install Elasticsearch

Elasticsearch can be installed with the package manager by adding Elastic's package source list. Complete this step on all Elasticsearch servers.

Run the following command to import the Elasticsearch public GPG key into apt:

wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

If your prompt seems to hang, it is probably waiting for your user password (to authorize the sudo command). If so, enter your password.

echo "deb http://packages.elastic.co/elasticsearch/2.x/debian stable main"| sudo tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list

Update your apt package database:

sudo apt-get update

Install Elasticsearch using the following command:

sudo apt-get -y install elasticsearch

Be sure to repeat these steps on all Elasticsearch servers.

Elasticsearch is now installed, but you need to configure it before you can use it.

Configure Elasticsearch cluster

Now it's time to edit the Elasticsearch configuration. Complete these steps on all Elasticsearch servers.

Open the Elasticsearch configuration file for editing:

sudo vi /etc/elasticsearch/elasticsearch.yml

The subsequent sections will explain how to modify the configuration.

Bind to VPN IP address or interface

You need to restrict external access to your Elasticsearch instances so that outsiders cannot read your data or shut down your Elasticsearch cluster through the HTTP API. In other words, you must configure Elasticsearch so that it only allows access from servers on your private network (the VPN). To do this, we will configure each node to bind to its VPN IP address, vpn_ip, or the interface "tun0".

Find the line that specifies network.host, uncomment it, and replace its value with the corresponding server's VPN IP address (for example, 10.0.0.1 for node01) or interface name. Because our VPN interface is named "tun0" on all servers, we can use the same line on every server:

network.host: [_tun0_, _local_]

Note that adding "_local_" configures Elasticsearch to also listen on all loopback devices, which allows you to use the Elasticsearch HTTP API locally on each server by sending requests to localhost. If you do not include it, Elasticsearch will only respond to requests sent to the VPN IP address.
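
If you prefer to bind to the VPN IP address explicitly rather than the interface name, the line on node01 might instead look like this (a sketch using this tutorial's example address 10.0.0.1):

network.host: [10.0.0.1, _local_]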

**Warning:** Because Elasticsearch has no built-in security, it is very important that you do not set this to any IP address that is accessible to servers you do not control or trust. Do not bind Elasticsearch to a public or shared private network IP address!

Set cluster name

Next, set the name of the cluster, which will allow your Elasticsearch nodes to join and form a cluster. You will need to use a unique and descriptive name (in your network).

Find the line that specifies cluster.name, uncomment it, and replace its value with the desired cluster name. In this tutorial, we will name our cluster "production":

cluster.name: production

Set node name

Next, we will set the name of each node. This should be a descriptive name that is unique in the cluster.

Find the line that specifies node.name, uncomment it, and replace its value with the desired node name. In this tutorial, we will use the ${HOSTNAME} environment variable to set each node name to the server's hostname:

node.name: ${HOSTNAME}

If you want, you can name the node manually, but make sure to specify a unique name. If you don't mind naming your nodes randomly, you can also comment out node.name.

Set discovery host

Next, you need to configure an initial list of nodes, which will be contacted to discover and form a cluster. This is necessary in unicast networks.

Find the line that specifies discovery.zen.ping.unicast.hosts and uncomment it. Replace its value with a string array of the VPN IP addresses or hostnames (that resolve to the VPN IP addresses) of all the other nodes.

For example, if you have three servers node01, node02, and node03 with respective VPN IP addresses 10.0.0.1, 10.0.0.2, and 10.0.0.3, you could use this line:

discovery.zen.ping.unicast.hosts:["10.0.0.1","10.0.0.2","10.0.0.3"]

Alternatively, if all of your servers can resolve the other servers' names to their VPN IP addresses (via DNS or /etc/hosts), you can use this line:

discovery.zen.ping.unicast.hosts:["node01","node02","node03"]

Note: The Ansible Playbook from the VPN tutorial linked above automatically creates /etc/hosts entries on each server that resolve each VPN server's inventory hostname (as specified in the Ansible hosts file) to its VPN IP address.
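
If you manage name resolution by hand instead, a minimal /etc/hosts sketch on each server, using this tutorial's example hostnames and addresses, might look like this:

10.0.0.1    node01
10.0.0.2    node02
10.0.0.3    node03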

Save and exit

Your servers are now configured to form a basic Elasticsearch cluster. There are more settings you will want to update, but we will get to those after verifying that the cluster is working properly.

Save and exit elasticsearch.yml.

Start Elasticsearch

Now start Elasticsearch:

sudo service elasticsearch restart

Then run this command to start Elasticsearch on boot:

sudo update-rc.d elasticsearch defaults 95 10
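
You can check that the service came up with:

sudo service elasticsearch status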

Be sure to repeat these steps (Configure Elasticsearch Cluster) on all Elasticsearch servers.

Check cluster status

If everything is configured correctly, your Elasticsearch cluster should be up and running. Before proceeding, let us verify that it is working properly. You can do this by querying Elasticsearch from any Elasticsearch node.

From any Elasticsearch server, run this command to print the status of the cluster:

curl -XGET 'http://localhost:9200/_cluster/state?pretty'

You should see output indicating that a cluster named "production" is running. It should also indicate that all of the nodes you configured are members:

{" cluster_name":"production","version":36,"state_uuid":"MIkS5sk7TQCl31beb45kfQ","master_node":"k6k2UObVQ0S-IFoRLmDcvA","blocks":{},"nodes":{"Jx_YC2sTQY6ayACU43_i3Q":{"name":"node02","transport_address":"10.0.0.2:9300","attributes":{}},"k6k2UObVQ0S-IFoRLmDcvA":{"name":"node01","transport_address":"10.0.0.1:9300","attributes":{}},"kQgZZUXATkSpduZxNwHfYQ":{"name":"node03","transport_address":"10.0.0.3:9300","attributes":{}}},...

If you see output similar to this, your Elasticsearch cluster is running! If any nodes are missing, check the configuration of the relevant nodes before continuing.
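
For a quicker summary, you can also query the cluster health API; in a healthy cluster the status field should normally be green:

curl -XGET 'http://localhost:9200/_cluster/health?pretty'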

Next, we will introduce some configuration settings that you should consider for your Elasticsearch cluster.

Enable memory lock

Elastic recommends avoiding swapping of the Elasticsearch process at all costs, because swapping negatively affects performance and stability. One way to avoid excessive swapping is to configure Elasticsearch to lock the memory it needs.

Complete this step on all Elasticsearch servers.

Edit Elasticsearch configuration:

sudo vi /etc/elasticsearch/elasticsearch.yml

Find the line that specifies bootstrap.mlockall, uncomment it, and set its value to true:

bootstrap.mlockall: true

Save and exit.

Next, open the /etc/default/elasticsearch file for editing:

sudo vi /etc/default/elasticsearch

First, look for ES_HEAP_SIZE, uncomment it, and set it to about 50% of the server's available memory. For example, if the server has about 4 GB of RAM, you should set the heap to 2 GB (2g):

ES_HEAP_SIZE=2g
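
If you are not sure how much memory the server has, you can check (values are in megabytes):

free -m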

Next, find and uncomment MAX_LOCKED_MEMORY=unlimited. It should look like this when you are done:

MAX_LOCKED_MEMORY=unlimited

Save and exit.

Now restart Elasticsearch to put the changes in place:

sudo service elasticsearch restart

Be sure to repeat this step on all Elasticsearch servers.

Verify Mlockall status

To verify that mlockall is enabled on all Elasticsearch nodes, run this command from any node:

curl http://localhost:9200/_nodes/process?pretty

Each node should have a line stating "mlockall": true, indicating that memory locking is enabled and working properly:

..." nodes":{"kQgZZUXATkSpduZxNwHfYQ":{"name":"es03","transport_address":"10.0.0.3:9300","host":"10.0.0.3","ip":"10.0.0.3","version":"2.2.0","build":"8ff36d1","http_address":"10.0.0.3:9200","process":{"refresh_interval_in_millis":1000,"id":1650,"mlockall":true}...

If mlockall is false on any node, check that node's settings and restart Elasticsearch. A common reason for Elasticsearch failing to start is that ES_HEAP_SIZE is set too high.
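
If a node refuses to start, its log is usually the quickest way to find out why. Assuming the default log location and the cluster name used in this tutorial ("production"), you can inspect the most recent entries with:

sudo tail -n 50 /var/log/elasticsearch/production.log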

Configure open file descriptor limit (optional)

By default, your Elasticsearch node should have an "open file descriptor limit" of 64k. This section will show you how to verify this and how to increase it if you want.

How to verify the open file limit

First, find the process ID (PID) of the Elasticsearch process. A simple way is to use the ps command to list all processes belonging to the elasticsearch user:

ps -u elasticsearch

You should see output that looks like this. The number in the first column is the PID of the Elasticsearch (java) process:

 PID TTY          TIME CMD
11708 ?        00:00:10 java

Then run this command to display the open file limit of the Elasticsearch process (replace the highlighted number with your own PID from the previous step):

cat /proc/11708/limits | grep 'Max open files'
Max open files            65535                65535                files

The numbers in the second and third columns are the soft limit and the hard limit, respectively; here both are 64k (65535). This is fine for many setups, but you may want to increase this setting.
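
As a shortcut, the two steps above can be combined into a single check. This sketch assumes a single Elasticsearch java process owned by the elasticsearch user:

cat /proc/$(pgrep -u elasticsearch java)/limits | grep 'Max open files'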

How to increase the maximum file descriptor limit

To increase the maximum number of open file descriptors in Elasticsearch, you only need to change a single setting.

Open the /etc/default/elasticsearch file for editing:

sudo vi /etc/default/elasticsearch

Find MAX_OPEN_FILES, uncomment it, and set it to the limit you want. For example, to raise the limit to roughly 128k descriptors, change it to 131070:

MAX_OPEN_FILES=131070

Save and exit.

Now restart Elasticsearch to put the changes in place:

sudo service elasticsearch restart

Then follow the instructions in the previous section to verify that the limit has been increased.

Be sure to repeat this step on any Elasticsearch server that requires a higher file descriptor limit.

Configure dedicated master node and data node (optional)

There are two common types of Elasticsearch nodes: master and data. Master nodes perform cluster-wide operations, such as managing indices and deciding which data nodes should store particular shards. Data nodes hold the shards of your indexed documents and handle CRUD, search, and aggregation operations. As a general rule, data nodes consume a lot of CPU, memory, and I/O.

By default, every Elasticsearch node is configured as a master-eligible data node, which means it stores data (and performs resource-intensive operations) and can be elected as the master node. For a small cluster this is usually fine; a large Elasticsearch cluster, however, should be configured with dedicated master nodes so that the master's stability is not affected by intensive data-node work.

How to configure a dedicated master node

Before configuring dedicated master nodes, make sure that your cluster will have at least three master-eligible nodes. This is important to avoid a split-brain situation, which can cause data inconsistency when the network fails.

To configure a dedicated master node, edit the Elasticsearch configuration of the node:

sudo vi /etc/elasticsearch/elasticsearch.yml

Add the following two lines:

node.master: true
node.data: false

The first line, node.master: true, specifies that the node is master-eligible, which is actually the default setting. The second line, node.data: false, prevents the node from acting as a data node.

Save and exit.

Now restart the Elasticsearch node for the changes to take effect:

sudo service elasticsearch restart

Be sure to repeat this step on other dedicated master nodes.

You can query the cluster to see which nodes are configured as dedicated master nodes with: curl -XGET 'http://localhost:9200/_cluster/state?pretty'. Any node with data: false and master: true is a dedicated master node.
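
Alternatively, the cat nodes API gives a more compact view; depending on your Elasticsearch version, the node.role and master columns show each node's role:

curl -XGET 'http://localhost:9200/_cat/nodes?v'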

How to configure dedicated data nodes

To configure a dedicated data node, that is, a node that is not master-eligible, edit that node's Elasticsearch configuration:

sudo vi /etc/elasticsearch/elasticsearch.yml

Add the following two lines:

node.master: false
node.data: true

The first line, node.master: false, specifies that the node is not master-eligible. The second line, node.data: true, is the default setting and allows the node to store data.

Save and exit.

Now restart the Elasticsearch node for the changes to take effect:

sudo service elasticsearch restart

Be sure to repeat this step on other dedicated data nodes.

You can query the cluster to see which nodes are configured as dedicated data nodes with: curl -XGET 'http://localhost:9200/_cluster/state?pretty'. Any node that lists master: false and does not list data: false is a dedicated data node.

Configure minimum master nodes

When running an Elasticsearch cluster, you must set the minimum number of master-eligible nodes that need to be running for the cluster to function normally, which is sometimes referred to as quorum. This ensures data consistency in the event that one or more nodes lose connectivity to the rest of the cluster, preventing the so-called "split-brain" situation.

To calculate the number of minimum master nodes your cluster should have, compute n / 2 + 1, where n is the total number of master-eligible nodes in your healthy cluster, then round the result down to the nearest integer. For example, for a 3-node cluster, the quorum is 2.

**Note:** Be sure to include all master-eligible nodes in your quorum calculation, including any data nodes that are master-eligible (the default setting).

The minimum master nodes setting can be set dynamically through the Elasticsearch HTTP API. To do so, run this command on any node (replace the highlighted number with your quorum):

curl -XPUT localhost:9200/_cluster/settings?pretty -d '{"persistent": {"discovery.zen.minimum_master_nodes": 2}}'

Output:
{"acknowledged":true,"persistent":{"discovery":{"zen":{"minimum_master_nodes":"2"}}},"transient":{}}

**Note:** This command applies a "persistent" setting, which means the minimum master nodes setting will survive full cluster restarts and will override the Elasticsearch configuration file. Alternatively, you can specify this setting as discovery.zen.minimum_master_nodes: 2 in /etc/elasticsearch/elasticsearch.yml if you have not already set it dynamically.
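
For reference, a sketch of the equivalent static line in /etc/elasticsearch/elasticsearch.yml, using the example quorum of 2:

discovery.zen.minimum_master_nodes: 2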

If you want to check this setting later, you can run the following command:

curl -XGET localhost:9200/_cluster/settings?pretty

How to access Elasticsearch

You can access the Elasticsearch HTTP API by sending requests to the VPN IP address of any node or, as shown throughout this tutorial, by sending requests to localhost from one of the nodes.

Client servers can access your Elasticsearch cluster through the VPN IP address of any node, which means those client servers must also be part of the VPN.

If you have other software that needs to connect to the cluster (such as Kibana or Logstash), you can usually configure the connection by providing the application with the VPN IP addresses of one or more Elasticsearch nodes.
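
For example, a quick connectivity check from a client server on the VPN might look like this (using this tutorial's example VPN IP 10.0.0.1):

curl -XGET 'http://10.0.0.1:9200/?pretty'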

Conclusion

Your Elasticsearch cluster should be running in a healthy state and configured with some basic optimizations!

Elasticsearch has many other configuration options not covered here, such as index, shard, and replication settings. It is recommended that you revisit your configuration later, together with the official documentation, to make sure your cluster is configured to meet your needs.

For more Ubuntu tutorials, please go to [Tencent Cloud + Community](https://cloud.tencent.com/developer?from=10680) to learn more.


Reference: "How To Set Up a Production Elasticsearch Cluster on Ubuntu 14.04"
