Recently I have been learning about big data and needed to install [Hadoop](http://www.linuxidc.com/topicnews.aspx?tid=13). It took me quite a while to get it working. There are many articles online about installing Hadoop, but they always run into some problem or other, so here is the whole process of installing Hadoop 3.0.0 on CentOS 7. If anything is wrong, please leave a message to correct it.
1、 Test whether you can log in without a password:
# ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
2、 Set up password-free login
1)、 Uncomment the following two lines in /etc/ssh/sshd_config (add them if they are missing); this must be done on every server:
RSAAuthentication yes
PubkeyAuthentication yes
2)、 Generate secret key:
# ssh-keygen -t rsa
Remarks: press Enter four times to accept the defaults after running the command
3)、 Copy the public key into authorized_keys:
# cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
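Remarks: if passwordless login still fails later, the usual cause is over-permissive file modes; the .ssh directory and authorized_keys file should be readable only by their owner:
# chmod 700 /root/.ssh
# chmod 600 /root/.ssh/authorized_keys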
4)、 Copy the public key to the target server:
# ssh-copy-id target server IP
5)、 Test (if no error is reported and you are not prompted for the target server user's password, the login to the target server succeeded):
# ssh target server IP
Remarks: after configuring password-free login from hadoop1 to hadoop2, you also need to configure password-free login from hadoop2 to hadoop1; the steps on hadoop2 are the same as above.
hadoop-3.0.0 requires JDK 1.8. The JDK installation itself is omitted here; there are plenty of guides online and the process is straightforward.
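As a minimal sketch of what this guide assumes (using the same /usr/java/jdk1.8.0_152 path that appears in the Hadoop configuration below; adjust it to wherever your JDK is actually installed), verify the JDK and set JAVA_HOME in /etc/profile:
# java -version
# vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_152
export PATH=$PATH:$JAVA_HOME/bin
# source /etc/profile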
1、 Download hadoop:
http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-3.0.0/
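For example, the archive can be fetched directly on the server (assuming the mirror above still carries this release; any Apache mirror will do):
# wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz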
2、 Unzip and install:
1)、 Copy hadoop-3.0.0.tar.gz to the /usr/hadoop directory, and then unpack it:
# tar -xzvf hadoop-3.0.0.tar.gz
The archive unpacks into /usr/hadoop/hadoop-3.0.0, and Hadoop is usable as soon as it is unpacked. Run the following commands to check that Hadoop works; on success the Hadoop version information is printed:
# cd /usr/hadoop/hadoop-3.0.0
# ./bin/hadoop version
2)、 Create a tmp directory under /usr/hadoop/:
# mkdir /usr/hadoop/tmp
3)、 Set environment variables:
# vi /etc/profile
# set hadoop path
export HADOOP_HOME=/usr/hadoop/hadoop-3.0.0
export PATH=$PATH:$HADOOP_HOME/bin
4)、 To make the environment variable take effect, run the following command in the terminal:
# source /etc/profile
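A quick way to confirm the variables are in effect is to echo HADOOP_HOME and run hadoop from any directory:
# echo $HADOOP_HOME
# hadoop version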
5)、 Set up hadoop:
A total of 6 main files need to be configured:
hadoop-3.0.0/etc/hadoop/hadoop-env.sh
hadoop-3.0.0/etc/hadoop/yarn-env.sh
hadoop-3.0.0/etc/hadoop/core-site.xml
hadoop-3.0.0/etc/hadoop/hdfs-site.xml
hadoop-3.0.0/etc/hadoop/mapred-site.xml
hadoop-3.0.0/etc/hadoop/yarn-site.xml
⑴、 Configure hadoop-env.sh:
# The java implementation to use.
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/java/jdk1.8.0_152    # set this to your own JDK installation directory
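Remarks: if you are not sure where the JDK is installed, one way to find out (assuming java is already on the PATH) is to resolve the java binary and strip the trailing /bin/java:
# readlink -f $(which java)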
⑵、 Configure yarn-env.sh:
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_152    # set this to your own JDK installation directory
⑶、 Configure core-site.xml:
<!-- Specify the file system schema (URI) used by Hadoop, i.e. the address of the HDFS master (NameNode) -->
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>HDFS URI: hdfs://namenode-host:port</description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <description>Local Hadoop temporary directory on the namenode</description>
    </property>
</configuration>
⑷、 Configure hdfs-site.xml:
<configuration>
    <!-- hdfs-site.xml -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Number of replicas; the default is 3, and it should not exceed the number of datanode machines</description>
    </property>
</configuration>
⑸、 Configure mapred-site.xml:
<!-- Specify that MapReduce runs on YARN -->
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
⑹、Configure yarn-site.xml:
<!- - Specify the address of YARN's boss (ResourceManager)--><configuration><property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
< /configuration>
Note: the above is the minimal configuration; many more properties can be added as needed.
6)、 Copy the Hadoop installation directory to the other servers:
# scp -r /usr/hadoop root@target server IP:/usr/hadoop
7)、 Format the namenode:
# cd /usr/hadoop/hadoop-3.0.0
# ./bin/hdfs namenode -format
If it succeeds, you will see the prompts "successfully formatted" and "Exitting with status 0"; "Exitting with status 1" indicates an error.
Note: only the namenode needs to be formatted; the datanodes do not (if a datanode was formatted by mistake, delete all the files under its /usr/hadoop/tmp directory). For this reason, copy the installation directory to the other servers first and format afterwards.
Four, Test:
# cd /usr/hadoop/hadoop-3.0.0
# sbin/start-dfs.sh
If running the script reports the following errors:
ERROR: Attempting to launch hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting launch.
Starting datanodes
ERROR: Attempting to launch hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting launch.
Starting secondary namenodes [localhost.localdomain]
ERROR: Attempting to launch hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting launch.
Solution (the errors are caused by missing user definitions): edit the HDFS start and stop scripts separately:
$ vim sbin/start-dfs.sh
$ vim sbin/stop-dfs.sh
Add the following at the top of both scripts:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
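After saving the scripts, run sbin/start-dfs.sh again; if HDFS comes up cleanly, the NameNode web UI should answer on port 9870 (the default in Hadoop 3.x), for example:
# curl http://localhost:9870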
# cd /usr/hadoop/hadoop-3.0.0
# sbin/start-yarn.sh
If the following error is reported during startup:
Starting resourcemanager
ERROR: Attempting to launch yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting launch.
Solution (again caused by missing user definitions): edit the YARN start and stop scripts separately:
$ vim sbin/start-yarn.sh
$ vim sbin/stop-yarn.sh
Add the following at the top of both scripts:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
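After saving the scripts, run sbin/start-yarn.sh again; once YARN is up, the ResourceManager web UI should answer on its default port 8088, for example:
# curl http://localhost:8088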
3)、 Start verification:
Execute the jps command; if the Hadoop daemon processes (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager) are all listed, the setup is basically complete.
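As a simple smoke test of HDFS (the /test path here is just an example), create a directory and list the root of the file system:
# hdfs dfs -mkdir /test
# hdfs dfs -ls /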
Note: You can also use the following command to start HDFS, ResourceManager and NodeManager at the same time:
# cd /usr/hadoop/hadoop-3.0.0
# sbin/start-all.sh
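The matching script stops everything again:
# sbin/stop-all.sh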