Recently I have been learning about big data and needed to install [Hadoop](http://www.linuxidc.com/topicnews.aspx?tid=13). It took me quite a while to get it working. There are many articles online about installing Hadoop, but they always run into some problem or other, so here is the whole process of installing Hadoop 3.0.0 on CentOS 7. If anything is wrong, please leave a message to correct it.
1、 Test whether you can log in without a password:
# ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
2、 Set up password-free login
1)、 Uncomment the following two lines in /etc/ssh/sshd_config (add them if they are missing); this must be done on every server:
RSAAuthentication yes
PubkeyAuthentication yes
2)、 Generate secret key:
# ssh-keygen -t rsa
Remarks: press Enter four times to accept the defaults after running the command
3)、 Copy the public key into authorized_keys:
# cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
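Remarks: if passwordless login still fails later, the usual cause is over-permissive file modes; the .ssh directory and authorized_keys file should be readable only by their owner:
# chmod 700 /root/.ssh
# chmod 600 /root/.ssh/authorized_keys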
4)、 Copy the public key to the target server:
# ssh-copy-id target server IP
5)、 Test (if no error is reported and you are not prompted for the target server user's password, the login to the target server succeeded):
# ssh target server IP
Remarks: after configuring password-free login from hadoop1 to hadoop2, you also need to configure password-free login from hadoop2 to hadoop1; the steps on hadoop2 are the same as above.
hadoop-3.0.0 requires JDK 1.8. The JDK installation itself is omitted here; there are plenty of guides online and the process is straightforward.
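As a minimal sketch of what this guide assumes (using the same /usr/java/jdk1.8.0_152 path that appears in the Hadoop configuration below; adjust it to wherever your JDK is actually installed), verify the JDK and set JAVA_HOME in /etc/profile:
# java -version
# vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_152
export PATH=$PATH:$JAVA_HOME/bin
# source /etc/profile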
1、 Download hadoop:
http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-3.0.0/
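For example, the archive can be fetched directly on the server (assuming the mirror above still carries this release; any Apache mirror will do):
# wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz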
2、 Unzip and install:
1)、 Copy hadoop-3.0.0.tar.gz to the /usr/hadoop directory, and then unpack it:
# tar -xzvf hadoop-3.0.0.tar.gz
The archive unpacks into /usr/hadoop/hadoop-3.0.0, and Hadoop is usable as soon as it is unpacked. Run the following commands to check that Hadoop works; on success the Hadoop version information is printed:
# cd /usr/hadoop/hadoop-3.0.0
# ./bin/hadoop version
2)、 Create a tmp directory under /usr/hadoop/:
# mkdir /usr/hadoop/tmp
3)、 Set environment variables:
# vi /etc/profile
# set hadoop path
export HADOOP_HOME=/usr/hadoop/hadoop-3.0.0
export PATH=$PATH:$HADOOP_HOME/bin
4)、 To make the environment variable take effect, run the following command in the terminal:
# source /etc/profile
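A quick way to confirm the variables are in effect is to echo HADOOP_HOME and run hadoop from any directory:
# echo $HADOOP_HOME
# hadoop version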
5)、 Set up hadoop:
A total of 6 main files need to be configured:
hadoop-3.0.0/etc/hadoop/hadoop-env.sh
hadoop-3.0.0/etc/hadoop/yarn-env.sh
hadoop-3.0.0/etc/hadoop/core-site.xml
hadoop-3.0.0/etc/hadoop/hdfs-site.xml
hadoop-3.0.0/etc/hadoop/mapred-site.xml
hadoop-3.0.0/etc/hadoop/yarn-site.xml
⑴、 Configure hadoop-env.sh:
# The java implementation to use.
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/java/jdk1.8.0_152    # set this to your own JDK installation directory
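Remarks: if you are not sure where the JDK is installed, one way to find out (assuming java is already on the PATH) is to resolve the java binary and strip the trailing /bin/java:
# readlink -f $(which java)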
⑵、 Configure yarn-env.sh:
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_152    # set this to your own JDK installation directory
⑶、 Configure core-site.xml:
<!-- Specify the file system schema (URI) used by Hadoop, i.e. the address of the HDFS master (NameNode) -->
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>HDFS URI: hdfs://namenode-host:port</description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <description>Local Hadoop temporary directory on the namenode</description>
    </property>
</configuration>
⑷、 Configure hdfs-site.xml:
<configuration>
    <!-- hdfs-site.xml -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Number of replicas; the default is 3, and it should not exceed the number of datanode machines</description>
    </property>
</configuration>
⑸、 Configure mapred-site.xml:
<!-- Specify that MapReduce runs on YARN -->
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
⑹、Configure yarn-site.xml:
<!- - Specify the address of YARN's boss (ResourceManager)--><configuration><property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
< /configuration>
Note: the above is the minimal configuration; many more properties can be added as needed.
6)、 Copy the Hadoop installation directory to the other servers:
# scp -r /usr/hadoop root@target server IP:/usr/hadoop
7)、 Format the namenode:
# cd /usr/hadoop/hadoop-3.0.0
# ./bin/hdfs namenode -format
If it succeeds, you will see the prompts "successfully formatted" and "Exitting with status 0"; "Exitting with status 1" indicates an error.
Note: only the namenode needs to be formatted; the datanodes do not (if a datanode was formatted by mistake, delete all the files under its /usr/hadoop/tmp directory). For this reason, copy the installation directory to the other servers first and format afterwards.
Four, Test:
# cd /usr/hadoop/hadoop-3.0.0
# sbin/start-dfs.sh
If running the script reports the following errors:
ERROR: Attempting to launch hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting launch.
Starting datanodes
ERROR: Attempting to launch hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting launch.
Starting secondary namenodes [localhost.localdomain]
ERROR: Attempting to launch hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting launch.
Solution (the errors are caused by missing user definitions): edit the HDFS start and stop scripts separately:
$ vim sbin/start-dfs.sh
$ vim sbin/stop-dfs.sh
Add the following at the top of both scripts:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
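After saving the scripts, run sbin/start-dfs.sh again; if HDFS comes up cleanly, the NameNode web UI should answer on port 9870 (the default in Hadoop 3.x), for example:
# curl http://localhost:9870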
# cd /usr/hadoop/hadoop-3.0.0
# sbin/start-yarn.sh
If the following error is reported during startup:
Starting resourcemanager
ERROR: Attempting to launch yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting launch.
Solution (again caused by missing user definitions): edit the YARN start and stop scripts separately:
$ vim sbin/start-yarn.sh
$ vim sbin/stop-yarn.sh
Add the following at the top of both scripts:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
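After saving the scripts, run sbin/start-yarn.sh again; once YARN is up, the ResourceManager web UI should answer on its default port 8088, for example:
# curl http://localhost:8088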
3)、 Start verification:
Execute the jps command; if the Hadoop daemon processes (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager) are all listed, the setup is basically complete.
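As a simple smoke test of HDFS (the /test path here is just an example), create a directory and list the root of the file system:
# hdfs dfs -mkdir /test
# hdfs dfs -ls /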
Note: You can also use the following command to start HDFS, ResourceManager and NodeManager at the same time:
# cd /usr/hadoop/hadoop-3.0.0
# sbin/start-all.sh
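The matching script stops everything again:
# sbin/stop-all.sh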