Install a Hadoop cluster on Linux (CentOS 7 + hadoop-2.8.0)

1 Download hadoop

The Hadoop version used in this blog post is 2.8.0.

Open the download address selection page:

http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz

The address I use is:

http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
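
If you prefer the command line, a wget one-liner also works (a sketch assuming this mirror still serves the file; any mirror from the selection page will do):

wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz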

2 Install 3 virtual machines and set up passwordless SSH login

2.1 Install 3 machines

The Linux system used here is CentOS 7 (Ubuntu is also a fine choice, but CentOS 7 is what this demo uses). The installation itself is not covered here; if necessary, refer to this blog post:

http://blog.csdn.net/pucao_cug/article/details/71229416

Install 3 machines, named hserver1, hserver2, and hserver3 (it does not matter if your machines are not named this way yet; you can change the names with the hostname command later).

Note: To avoid a series of authorization hassles later, log in and operate directly with the root account.

Use the ifconfig command to view the IP of each of the 3 machines. The correspondence between my machine names and IPs is:

192.168.119.128    hserver1

192.168.119.129    hserver2

192.168.119.130    hserver3

2.2 Check the machine name

For the convenience of subsequent operations, make sure each machine's hostname is what we want. Taking the machine 192.168.119.128 as an example: log in with the root account, then use the hostname command to view the machine name.

If the machine name is not what we want, it is easy to fix; rename it with the command:

hostname   hserver1
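
Note: a name set with the hostname command alone lasts only until the next reboot. On CentOS 7, hostnamectl makes the change permanent:

hostnamectl set-hostname hserver1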

After the command completes, run hostname again to check that the name has been changed.

Similarly, rename the other two machines to hserver2 and hserver3 respectively.

2.3 Modify the /etc/hosts file

Modify the /etc/hosts file of these 3 machines and add the following content to the file:

192.168.119.128    hserver1
192.168.119.129    hserver2
192.168.119.130    hserver3

Note: Your IP addresses do not need to be the same as mine; this is just a hostname-to-IP mapping, and all that matters is that the mapping is correct. To edit the file you can use vim, or write the hosts content on your local machine and then overwrite the file on each Linux machine.
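
For example, a minimal way to append the mapping on each machine (run as root, substituting your own IPs):

cat >> /etc/hosts <<'EOF'
192.168.119.128    hserver1
192.168.119.129    hserver2
192.168.119.130    hserver3
EOF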

After the configuration is complete, use the ping command to check whether the three machines can reach each other. Taking hserver1 as an example, execute on it the command:

ping  -c 3  hserver2

Execute the command:

ping  -c  3  hserver3

If the ping works, it means that the machines are interconnected and the hosts configuration is correct.

2.4 Generate key files for the 3 machines

Taking hserver1 as an example, execute the command to generate an RSA key pair with an empty passphrase (the public key will be used later). The command is:

ssh-keygen  -t   rsa   -P  ''
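
To also skip the interactive prompt for the file location, the -f option can be given (optional; /root/.ssh/id_rsa is simply the default path made explicit):

ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa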

Because I am operating as root, the key files are saved in the /root/.ssh/ directory. You can list them with the command:

ls    /root/.ssh/

Use the same method to generate key pairs on hserver2 and hserver3 (the command is exactly the same; no modification is needed).

2.5 Create authorized_keys file on hserver1

The next step is to place a file with identical content in the /root/.ssh/ directory of all three machines. The file is named authorized_keys, and its content is the public keys we just generated on the three machines. For convenience, I will create an authorized_keys file on hserver1, append the public keys of all three machines to it, and then copy that file to hserver2 and hserver3.

First use the command to generate a file named authorized_keys in the /root/.ssh/ directory of hserver1. The command is:

touch  /root/.ssh/authorized_keys

You can check that it was created with the command:

ls   /root/.ssh/

Then copy the contents of /root/.ssh/id_rsa.pub from hserver1, hserver2, and hserver3 into this authorized_keys file. There are many ways to do this: you can use the cat and vim commands, or download the /root/.ssh/id_rsa.pub files from the three machines to your local machine, assemble the authorized_keys file locally, and upload it back to the 3 machines.
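
For example, a cat-over-ssh sketch run on hserver1 (it will prompt for the root password of hserver2 and hserver3, since passwordless login is not in effect yet):

cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh root@hserver2 'cat /root/.ssh/id_rsa.pub' >> /root/.ssh/authorized_keys
ssh root@hserver3 'cat /root/.ssh/id_rsa.pub' >> /root/.ssh/authorized_keys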

The content of my /root/.ssh/id_rsa.pub on the hserver1 machine is:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD8fTIVorOxgDo81yCEgcJTstUcyfOBecL+NZ/OLXCEzaBMw5pLV0UNRX6SZnaAgu/erazkz4sw74zfRIMzEeKKCeNcZ6W78cg+ZNxDcj8+FGeYqY5+nc0YPhXFVI7AwFmfr7fH5hoIT14ClKfGklPgpEgUjDth0PeRwnUTvUy9A1x76npjAZrknQsnoLYle7cVJZ/zO3eGxS75YEdTYDMv+UMiwtcJg7UxOqR+9UT3TO+xLk0yOl8GIISXzMhdCZkmyAH+DmW56ejzsd+JWwCMm177DtOZULl7Osq+OGOtpbloj4HCfstpoiG58SM6Nba8WUXWLnbgqZuHPBag/Kqj root@hserver1

The content of my /root/.ssh/id_rsa.pub on the hserver2 machine is:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC29kPkYz4c3bd9Qa1TV8kCR0bUNs4f7/dDcR1NKwrgIiecN7zPEWJpjILtlm3niNNx1j5R49QLTLBKKo8PE8mid47POvNypkVRGDeN2IVCivoAQ1T7S8bTJ4zDECGydFYyKQfS2nOAifAWECdgFFtIp52d+dLIAg1JC37pfER9f32rd7anhTHYKwnLwR/NDVGAw3tMkXOnFuFKUMdOJ3GSoVOZf3QHKykGIC2fz/lsXZHaCcQWvOU/Ecd9e0263Tvqh7zGWpF5WYEGjkLlY8v2sioeZxgzog1LWycUTMTqaO+fSdbvKqVj6W0qdy3Io8bJ29Q3S/6MxLa6xvFcBJEX root@hserver2

The content of my /root/.ssh/id_rsa.pub on the hserver3 machine is:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC1a2o10ttv2570GpuUZy7g9o7lIkkeed7ba25VvFEBcUroQIZ+NIAiVIMGPRiOqm7X4bTLWj5EOz5JXG2l8rwA6CFnWfW3U+ttD1COLOrv2tHTiJ1PhQy1jJR/LpC1iX3sNIDDs+I0txZFGTCTRMLmrbHVTl8j5Yy/CTYLuC7reIZjzpHP7aaS2ev0dlbQzeB08ncjA5Jh4X72qQMOGPUUc2C9oa/CeCvI0SJbt8mkHwqFanZz/IfhLJIKhupjtYsqwQMmzLIjHxbLRwUGoWU6X4e76OkUz/xyyHlzBg1Vu2F9hjoXPW80VmupIRSXFDliDBJ8NlXXQN47wwYBG28b root@hserver3

After merging, the content of the /root/.ssh/authorized_keys file on my hserver1 machine is:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD8fTIVorOxgDo81yCEgcJTstUcyfOBecL+NZ/OLXCEzaBMw5pLV0UNRX6SZnaAgu/erazkz4sw74zfRIMzEeKKCeNcZ6W78cg+ZNxDcj8+FGeYqY5+nc0YPhXFVI7AwFmfr7fH5hoIT14ClKfGklPgpEgUjDth0PeRwnUTvUy9A1x76npjAZrknQsnoLYle7cVJZ/zO3eGxS75YEdTYDMv+UMiwtcJg7UxOqR+9UT3TO+xLk0yOl8GIISXzMhdCZkmyAH+DmW56ejzsd+JWwCMm177DtOZULl7Osq+OGOtpbloj4HCfstpoiG58SM6Nba8WUXWLnbgqZuHPBag/Kqj root@hserver1
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC29kPkYz4c3bd9Qa1TV8kCR0bUNs4f7/dDcR1NKwrgIiecN7zPEWJpjILtlm3niNNx1j5R49QLTLBKKo8PE8mid47POvNypkVRGDeN2IVCivoAQ1T7S8bTJ4zDECGydFYyKQfS2nOAifAWECdgFFtIp52d+dLIAg1JC37pfER9f32rd7anhTHYKwnLwR/NDVGAw3tMkXOnFuFKUMdOJ3GSoVOZf3QHKykGIC2fz/lsXZHaCcQWvOU/Ecd9e0263Tvqh7zGWpF5WYEGjkLlY8v2sioeZxgzog1LWycUTMTqaO+fSdbvKqVj6W0qdy3Io8bJ29Q3S/6MxLa6xvFcBJEX root@hserver2
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC1a2o10ttv2570GpuUZy7g9o7lIkkeed7ba25VvFEBcUroQIZ+NIAiVIMGPRiOqm7X4bTLWj5EOz5JXG2l8rwA6CFnWfW3U+ttD1COLOrv2tHTiJ1PhQy1jJR/LpC1iX3sNIDDs+I0txZFGTCTRMLmrbHVTl8j5Yy/CTYLuC7reIZjzpHP7aaS2ev0dlbQzeB08ncjA5Jh4X72qQMOGPUUc2C9oa/CeCvI0SJbt8mkHwqFanZz/IfhLJIKhupjtYsqwQMmzLIjHxbLRwUGoWU6X4e76OkUz/xyyHlzBg1Vu2F9hjoXPW80VmupIRSXFDliDBJ8NlXXQN47wwYBG28b root@hserver3

2.6 Copy the authorized_keys file to other machines

The authorized_keys file is now in the /root/.ssh/ directory of hserver1, and its content is complete. Next, copy this file to /root/.ssh/ on hserver2 and /root/.ssh/ on hserver3.

There are many ways to copy it; the easiest is to use a visual transfer tool such as SecureFX.
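
If you prefer the command line, scp from hserver1 also works (it will still prompt for the root password of the target machines at this point):

scp /root/.ssh/authorized_keys root@hserver2:/root/.ssh/
scp /root/.ssh/authorized_keys root@hserver3:/root/.ssh/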

After the copy is complete, each of the three machines has the same set of file names in its /root/.ssh directory.

To be precise: /root/.ssh on the three machines holds files with the same names, but only the authorized_keys file has identical content on all three; each machine keeps its own id_rsa and id_rsa.pub.
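
One pitfall worth noting: with sshd's default StrictModes setting, passwordless login silently fails if these files are too permissive, so it is worth enforcing the expected permissions on every machine:

chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys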

2.7 Test using ssh for passwordless login

2.7.1 Test on hserver1

Input the command:

ssh   hserver2

If it logs in without asking for a password, input the command:

exit

and press Enter to return to hserver1. Then input the command:

ssh   hserver3

and again return with:

exit

2.7.2 Test on hserver2

The method is similar to 2.7.1, except that the commands become ssh hserver1 and ssh hserver3. Note that after each ssh login you must execute exit, otherwise your subsequent commands will run on the other machine.

2.7.3 Test on hserver3

The method is similar to 2.7.1, except that the commands become ssh hserver1 and ssh hserver2. Note that after each ssh login you must execute exit, otherwise your subsequent commands will run on the other machine.
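
Once all host keys have been accepted, a quick loop verifies every link at once (run it on each machine in turn; each line should print a hostname without asking for a password):

for h in hserver1 hserver2 hserver3; do ssh $h hostname; done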

3 Install jdk and hadoop

Note: to avoid a series of tedious operations such as obtaining administrator rights and granting permissions, and to keep the tutorial concise, everything here is done logged in as root, with root privileges.

3.1 Install JDK

The installation of the JDK is not detailed here. If necessary, refer to this blog post (although it uses Ubuntu, the JDK installation is the same under CentOS):

http://blog.csdn.net/pucao_cug/article/details/68948639
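
For reference, after unpacking a JDK the essential environment setup is just two lines appended to /etc/profile (a sketch assuming the JDK path used later in this post, /opt/java/jdk1.8.0_121; run source /etc/profile afterwards):

export JAVA_HOME=/opt/java/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH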

3.2 Install hadoop

Note: The following steps need to be repeated on all 3 machines.

3.2.1 Upload the file and unzip

Create a new directory named hadoop under the /opt directory, and upload the downloaded hadoop-2.8.0.tar.gz into this directory.

Enter the directory and execute the command:

cd   /opt/hadoop

Execute the decompression command:

tar  -xvf   hadoop-2.8.0.tar.gz

Note: all three machines must perform the above operations. After decompression, you will have a directory named hadoop-2.8.0.

3.2.2 Create several directories

Create a few directories under /root by copying, pasting, and executing the following commands:

mkdir  /root/hadoop
mkdir  /root/hadoop/tmp
mkdir  /root/hadoop/var
mkdir  /root/hadoop/dfs
mkdir  /root/hadoop/dfs/name
mkdir  /root/hadoop/dfs/data
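
Equivalently, a single command with mkdir -p and brace expansion creates the whole tree at once (and does not fail if a directory already exists):

mkdir -p /root/hadoop/{tmp,var,dfs/name,dfs/data}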

3.2.3 Modify a series of configuration files in etc/hadoop

Modify a series of files in the /opt/hadoop/hadoop-2.8.0/etc/hadoop directory.

3.2.3.1 Modify core-site.xml

Modify the /opt/hadoop/hadoop-2.8.0/etc/hadoop/core-site.xml file and add the following configuration inside the <configuration> node:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
</property>
<property>
    <name>fs.default.name</name>
    <value>hdfs://hserver1:9000</value>
</property>
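
Note: in Hadoop 2.x the fs.default.name key is deprecated in favor of fs.defaultFS; both still work here, but new configurations should prefer the latter.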

3.2.3.2 Modify hadoop-env.sh

Modify the /opt/hadoop/hadoop-2.8.0/etc/hadoop/hadoop-env.sh file: change the line

export  JAVA_HOME=${JAVA_HOME}

to:

export  JAVA_HOME=/opt/java/jdk1.8.0_121

Note: change this to your own JDK path.

3.2.3.3 Modify hdfs-site.xml

Modify the /opt/hadoop/hadoop-2.8.0/etc/hadoop/hdfs-site.xml file and add the following configuration inside the <configuration> node:

<property>
    <name>dfs.name.dir</name>
    <value>/root/hadoop/dfs/name</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/root/hadoop/dfs/data</value>
    <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>need not permissions</description>
</property>

Note: with dfs.permissions set to false, files on HDFS can be created and modified without permission checks, which is convenient; but to guard against accidental deletion, set it to true, or simply delete this property node, since the default is true.

3.2.3.4 Create and modify mapred-site.xml

In this version, there is a file named mapred-site.xml.template. Copy the file and rename it to mapred-site.xml. The command is:

cp   /opt/hadoop/hadoop-2.8.0/etc/hadoop/mapred-site.xml.template     /opt/hadoop/hadoop-2.8.0/etc/hadoop/mapred-site.xml

Modify this newly created mapred-site.xml file and add the following configuration inside the <configuration> node:

<property>
    <name>mapred.job.tracker</name>
    <value>hserver1:49001</value>
</property>
<property>
    <name>mapred.local.dir</name>
    <value>/root/hadoop/var</value>
</property>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

3.2.3.5 Modify slaves file

Modify the /opt/hadoop/hadoop-2.8.0/etc/hadoop/slaves file: delete the localhost line inside it and add the following content:

hserver2
hserver3
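
The hosts listed in slaves are where the cluster start scripts will launch the DataNode and NodeManager daemons over ssh, which is why the passwordless login from section 2 matters.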

3.2.3.6 Modify the yarn-site.xml file

Modify the /opt/hadoop/hadoop-2.8.0/etc/hadoop/yarn-site.xml file and add the following configuration inside the <configuration> node (note: size the memory values according to your machine configuration; I only allot 2GB here because my machines are weak):

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hserver1</value>
</property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
    <description>The address of the applications manager interface in the RM.</description>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
    <description>The address of the scheduler interface.</description>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
    <description>The http address of the RM web application.</description>
</property>
<property>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
    <description>The https address of the RM web application.</description>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
    <description>The address of the RM admin interface.</description>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
    <description>The maximum allocation for every container request at the RM, in MB; 8192MB by default.</description>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

Note: yarn.nodemanager.vmem-check-enabled set to false means the virtual-memory check is skipped. If you are installing on virtual machines this setting is very useful and avoids problems in later steps; on a physical machine with enough memory it can be removed.

4 Start hadoop

4.1 Perform initialization on namenode

Because hserver1 is the namenode and hserver2 and hserver3 are both datanodes, only hserver1 needs to be initialized; that is, HDFS needs to be formatted.

Enter the /opt/hadoop/hadoop-2.8.0/bin directory of the hserver1 machine, that is, execute the command:

cd   /opt/hadoop/hadoop-2.8.0/bin

Execute the initialization script, that is, execute the command:

./hadoop  namenode  -format
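
Note: in Hadoop 2.x this command still works but prints a deprecation warning; ./hdfs namenode -format is the preferred equivalent.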

Wait a few seconds; if no error is reported, the format succeeded.

After the format succeeds, a new current directory appears under /root/hadoop/dfs/name/, containing a series of files.

4.2 Execute start command on namenode

Because hserver1 is the namenode and hserver2 and hserver3 are both datanodes, you only need to execute the startup command on hserver1.

Enter the /opt/hadoop/hadoop-2.8.0/sbin directory of the hserver1 machine, that is, execute the command:

cd    /opt/hadoop/hadoop-2.8.0/sbin

Execute the startup script, that is, execute the command:

./start-all.sh
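
Note: start-all.sh is deprecated in Hadoop 2.x; running ./start-dfs.sh followed by ./start-yarn.sh does the same thing and is the recommended form.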

The first time the startup command runs, it requires interactive confirmation; type yes at the prompt and press Enter.

5 Test hadoop

Hadoop has started; now test whether it is working properly.

Execute the command to turn off the firewall. Under CentOS7, the command is:

systemctl   stop   firewalld.service
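
The firewall will come back at the next boot; to keep it off permanently (and, if the pages below fail to load or the datanodes fail to register, consider doing the same on the other two machines), also run:

systemctl   disable   firewalld.service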

hserver1 is our namenode; its IP is 192.168.119.128. Visit the following address in a browser on your local computer:

http://192.168.119.128:50070/

The browser automatically redirects to the overview page.

Visit the following address in the local browser:

http://192.168.119.128:8088/

The browser automatically redirects to the cluster page.
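
Another quick sanity check is the jps command on each machine; with this layout you should see roughly the following daemons (process IDs will differ):

jps    # on hserver1: NameNode, SecondaryNameNode, ResourceManager
jps    # on hserver2 / hserver3: DataNode, NodeManager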
