Installing Hadoop 3.0.0 on CentOS 7

I have recently been learning about big data and needed to install [Hadoop](http://www.linuxidc.com/topicnews.aspx?tid=13). It took me quite a while to get everything working. There are many articles online about installing Hadoop, but they always run into some problem or other, so here is the whole process of installing Hadoop 3.0.0 on CentOS 7. If anything is wrong, feel free to leave a comment to correct it.

1. SSH password-free login

1、 Test whether you can already log in without a password:

      # ssh localhost

The authenticity of host 'localhost (::1)' can't be established.

2、 Set up password-free login

1)、 Uncomment the following two lines in /etc/ssh/sshd_config (add them if they are missing, and restart sshd afterwards with `systemctl restart sshd`); this must be done on every server:

        RSAAuthentication yes
        PubkeyAuthentication yes

(On the OpenSSH 7.4+ shipped with newer CentOS 7 builds, RSAAuthentication is obsolete and can be omitted.)

2)、 Generate a key pair:

# ssh-keygen -t rsa

Note: press Enter 4 times after entering the command to accept all defaults.

3)、 Install the public key into authorized_keys:

# cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys

(For additional keys use `cat id_rsa.pub >> authorized_keys` instead, since cp overwrites the file. Also make sure permissions are strict: 700 on /root/.ssh and 600 on authorized_keys, or sshd will ignore the file.)

4)、 Copy the public key to the target server:

# ssh-copy-id <target server IP>

5)、 Test (if no error is reported and you are not prompted for the target server user's password, the password-free login works):

# ssh <target server IP>

Note: after configuring password-free login from hadoop1 to hadoop2, you also need to configure it from hadoop2 to hadoop1; the steps on hadoop2 are the same as above.
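The key-setup steps 2)–4) can also be done non-interactively. The sketch below runs against a throwaway directory so it cannot clobber your real keys; `hadoop2` is a placeholder for your own target server.

```shell
# Generate a key pair without the 4 Enter presses: -N "" sets an empty
# passphrase, -f sets the output path, -q keeps it quiet.
dir=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$dir/id_rsa" -q

# Authorize the key locally (the equivalent of the cp step above) and
# tighten permissions, since sshd ignores authorized_keys files that are
# too open.
cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
chmod 600 "$dir/authorized_keys"

# For the remote side you would then run (hadoop2 is a placeholder host):
# ssh-copy-id -i "$dir/id_rsa.pub" root@hadoop2
ls "$dir"
```

For the real setup, replace `$dir` with /root/.ssh on each node.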

2. Install the JDK

Hadoop 3.0.0 requires JDK 1.8. The installation itself is omitted here; it is straightforward and well documented elsewhere.
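As a quick sanity check before continuing (a sketch; the path /usr/java/jdk1.8.0_152 used later in this article is just an example install location), confirm which JDK the shell sees:

```shell
# Print the JDK on PATH, if any; Hadoop 3.0.0 expects a 1.8 JDK, so the
# first line should mention version "1.8.0_...".
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
else
    echo "java not found on PATH"
fi
```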

3. Install Hadoop

1、 Download hadoop:

http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-3.0.0/

2、 Unzip and install:

1)、 Copy hadoop-3.0.0.tar.gz to the /usr/hadoop directory, then unpack it:

        # tar -xzvf hadoop-3.0.0.tar.gz

After unpacking, the directory is /usr/hadoop/hadoop-3.0.0 and Hadoop can be used as-is. Run the following to check that it works; on success the Hadoop version information is printed:

        # cd /usr/hadoop/hadoop-3.0.0
        # ./bin/hadoop version

2)、 Create a tmp directory under /usr/hadoop/:

# mkdir /usr/hadoop/tmp

3)、 Set environment variables:

# vi /etc/profile
        # set hadoop path
        export HADOOP_HOME=/usr/hadoop/hadoop-3.0.0
        export PATH=$PATH:$HADOOP_HOME/bin

4)、 To make the environment variable take effect, run the following command in the terminal:

# source /etc/profile
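The two export lines must sit on separate lines in /etc/profile. Here is a self-contained sketch that writes them to a temporary file, sources it, and checks that PATH picked up Hadoop's bin directory (the /usr/hadoop path is the one assumed in this article):

```shell
# Write the profile fragment to a temp file and source it, as
# "source /etc/profile" would.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# set hadoop path
export HADOOP_HOME=/usr/hadoop/hadoop-3.0.0
export PATH=$PATH:$HADOOP_HOME/bin
EOF
. "$tmp"

# PATH should now contain Hadoop's bin directory.
case ":$PATH:" in
    *:/usr/hadoop/hadoop-3.0.0/bin:*) echo "hadoop bin on PATH" ;;
    *) echo "hadoop bin missing" ;;
esac
# prints "hadoop bin on PATH"
```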

5)、 Set up hadoop:

A total of 6 files need to be configured:

hadoop-3.0.0/etc/hadoop/hadoop-env.sh

hadoop-3.0.0/etc/hadoop/yarn-env.sh

hadoop-3.0.0/etc/hadoop/core-site.xml

hadoop-3.0.0/etc/hadoop/hdfs-site.xml

hadoop-3.0.0/etc/hadoop/mapred-site.xml

hadoop-3.0.0/etc/hadoop/yarn-site.xml

⑴、Configure hadoop-env.sh:

          # The java implementation to use.
          #export JAVA_HOME=${JAVA_HOME}
          export JAVA_HOME=/usr/java/jdk1.8.0_152  # adjust to your own JDK installation directory

⑵、Configure yarn-env.sh:

          # The java implementation to use
          export JAVA_HOME=/usr/java/jdk1.8.0_152  # adjust to your own JDK installation directory

⑶、Configure core-site.xml:

<configuration>
    <!-- URI of the file system used by Hadoop: the address of the HDFS master (NameNode) -->
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>HDFS URI: hdfs://namenode-host:port</description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <description>Local Hadoop temporary directory on the namenode</description>
    </property>
</configuration>

(fs.default.name is the deprecated name for fs.defaultFS; both still work.)

⑷、Configure hdfs-site.xml:

<configuration>
    <!-- hdfs-site.xml -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Number of replicas; the default is 3, and it should not exceed the number of datanodes</description>
    </property>
</configuration>

⑸、Configure mapred-site.xml:

<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

⑹、Configure yarn-site.xml:

<configuration>
    <!-- Address of YARN's master (ResourceManager) -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Note: the above is the minimal configuration; many more options can be added as needed.

6)、 Format the namenode:

        # cd /usr/hadoop/hadoop-3.0.0
        # ./bin/hdfs namenode -format

If it succeeds you will see "successfully formatted" and "Exiting with status 0" in the output; "Exiting with status 1" indicates an error.

Note: only the namenode needs to be formatted; the datanodes do not (if a datanode was formatted by mistake, delete everything under its /usr/hadoop/tmp directory). For that reason, copy the installation directory to the other servers first, and format afterwards.

7)、 Start HDFS:

      # cd /usr/hadoop/hadoop-3.0.0
      # sbin/start-dfs.sh

If running the script reports the following errors:

ERROR: Attempting to launch hdfs namenode as root
      ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting launch.
      Starting datanodes
      ERROR: Attempting to launch hdfs datanode as root
      ERROR: but there is no HDFS_DATANODE_USER defined. Aborting launch.
      Starting secondary namenodes [localhost.localdomain]
      ERROR: Attempting to launch hdfs secondarynamenode as root
      ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting launch.

Solution: the errors are caused by missing user definitions, so edit the start and stop scripts:

      $ vim sbin/start-dfs.sh
      $ vim sbin/stop-dfs.sh

Add the following at the top of each script:

      HDFS_DATANODE_USER=root  
      HADOOP_SECURE_DN_USER=hdfs  
      HDFS_NAMENODE_USER=root  
      HDFS_SECONDARYNAMENODE_USER=root
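If you would rather script the edit than open vim, something like the following works. It is shown here against a throwaway copy rather than the real sbin/start-dfs.sh, so the file name is illustrative; on the real script you would pass its path to sed instead.

```shell
# Create a stand-in for sbin/start-dfs.sh.
f=$(mktemp)
printf '#!/usr/bin/env bash\necho placeholder\n' > "$f"

# GNU sed: append the user definitions right after line 1 (the shebang).
sed -i '1a HDFS_DATANODE_USER=root\nHADOOP_SECURE_DN_USER=hdfs\nHDFS_NAMENODE_USER=root\nHDFS_SECONDARYNAMENODE_USER=root' "$f"

grep -c 'USER=' "$f"   # prints 4: all four lines were inserted
```

Run the same edit on stop-dfs.sh so shutdown works too.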
8)、 Start ResourceManager and NodeManager:

      # cd /usr/hadoop/hadoop-3.0.0
      # sbin/start-yarn.sh

If the following error is reported during startup:

      Starting resourcemanager
      ERROR: Attempting to launch yarn resourcemanager as root
      ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting launch.

Solution: again caused by missing user definitions, so edit the start and stop scripts:

      $ vim sbin/start-yarn.sh
      $ vim sbin/stop-yarn.sh

Add the following at the top of each script:

      YARN_RESOURCEMANAGER_USER=root  
      HADOOP_SECURE_DN_USER=yarn  
      YARN_NODEMANAGER_USER=root

9)、 Verify:

Execute the jps command; if the daemon processes (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager) are all listed, the setup is basically complete.

Note: You can also use the following command to start HDFS, ResourceManager and NodeManager at the same time:

      # cd /usr/hadoop/hadoop-3.0.0
      # sbin/start-all.sh
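The jps verification above can be sketched as a small check loop. It degrades gracefully when jps is not on PATH, and the daemon list assumes the single-node setup built in this article:

```shell
# On a healthy single-node install, jps should list all five daemons.
for proc in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if command -v jps >/dev/null 2>&1 && jps | grep -q "$proc"; then
        echo "$proc: running"
    else
        echo "$proc: not found"
    fi
done
```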


Permanent link to this article: http://www.linuxidc.com/Linux/2018-02/150812.htm
