This article describes how to build a Hadoop 2.10 high-availability (HA) cluster on CentOS 7. First, prepare 6 machines: 2 NN (namenode), 4 DN (datanode), and 3 JN (journalnode).
IP | hostname | process |
---|---|---|
192.168.30.141 | s141 | nn1(namenode),zkfc(DFSZKFailoverController),zk(QuorumPeerMain) |
192.168.30.142 | s142 | dn(datanode), jn(journalnode),zk(QuorumPeerMain) |
192.168.30.143 | s143 | dn(datanode), jn(journalnode),zk(QuorumPeerMain) |
192.168.30.144 | s144 | dn(datanode), jn(journalnode) |
192.168.30.145 | s145 | dn(datanode) |
192.168.30.146 | s146 | nn2(namenode),zkfc(DFSZKFailoverController) |
The jps processes on each machine:
Since I am using VMware virtual machines, I configure one machine, clone it for the remaining machines, and then modify the hostname and IP, so the configuration of every machine stays consistent. On each machine, add an hdfs user and an hdfs group, configure the JDK environment, and install Hadoop; this highly available cluster is built under the hdfs user. You can refer to: centos7 build hadoop2.10 pseudo-distribution mode
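For reference, a minimal sketch of creating the hdfs user and group mentioned above (run as root on each machine; the commands are standard CentOS tools, but the exact user setup is up to you):
# create the hdfs group and user, then set a password
groupadd hdfs
useradd -g hdfs -m hdfs
passwd hdfs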
Here are some steps and details for installing a highly available cluster:
1. Set the hostname and hosts of each machine
Modify the hosts file; once it is set, machines can be reached by hostname, which is more convenient. Modify it as follows:
127.0.0.1 localhost
192.168.30.141 s141
192.168.30.142 s142
192.168.30.143 s143
192.168.30.144 s144
192.168.30.145 s145
192.168.30.146 s146
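A sketch of setting the hostnames and distributing this hosts file (assuming root SSH access to the other machines; hostnamectl is available on CentOS 7):
# on each machine, set its own hostname, e.g. on 192.168.30.141:
hostnamectl set-hostname s141
# from s141, push the shared hosts file to the other machines:
for h in s142 s143 s144 s145 s146; do scp /etc/hosts root@$h:/etc/hosts; done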
We set s141 as nn1 and s146 as nn2. s141 and s146 need to be able to log in to the other machines over SSH without a password, so generate a key pair under the hdfs user on both s141 and s146, and send their public keys to the other machines, appending them to the ~/.ssh/authorized_keys file; to be precise, on every machine (including themselves) that should accept the public key.
Generate a key pair on s141 and s146 machines:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Add the content of id_rsa.pub to /home/hdfs/.ssh/authorized_keys on machines s141-s146. Since the other machines do not have an authorized_keys file yet, you can simply rename id_rsa.pub to authorized_keys; if a machine already has an authorized_keys file, append the content of id_rsa.pub to it instead. For remote copying, use the scp command:
Copy the s141 public key to the other machines:
scp id_rsa.pub hdfs@s141:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s142:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s143:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s144:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s145:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s146:/home/hdfs/.ssh/id_rsa_141.pub
Copy the s146 public key to the other machines:
scp id_rsa.pub hdfs@s141:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s142:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s143:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s144:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s145:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s146:/home/hdfs/.ssh/id_rsa_146.pub
On each machine, use cat to append the received public keys to the authorized_keys file:
cat id_rsa_141.pub >> authorized_keys
cat id_rsa_146.pub >> authorized_keys
At this point, the permissions on the authorized_keys file must be set to 644 (note that passwordless SSH login failures are often caused by this permission problem):
chmod 644 authorized_keys
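To verify that passwordless login works, a quick check (a sketch; run as the hdfs user on s141 and again on s146):
# each command should print the remote hostname without asking for a password
for h in s141 s142 s143 s144 s145 s146; do ssh hdfs@$h hostname; done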
Configuration details:
Note: s141 and s146 must have exactly the same configuration, especially the SSH setup.
[hdfs-site.xml]
<property><name>dfs.nameservices</name><value>mycluster</value></property>

[hdfs-site.xml]
<!-- the two namenode ids under mycluster -->
<property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>

[hdfs-site.xml]
Configure the RPC address of each NN:
<property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>s141:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>s146:8020</value></property>

[hdfs-site.xml]
<property><name>dfs.namenode.http-address.mycluster.nn1</name><value>s141:50070</value></property>
<property><name>dfs.namenode.http-address.mycluster.nn2</name><value>s146:50070</value></property>

[hdfs-site.xml]
<property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://s142:8485;s143:8485;s144:8485/mycluster</value></property>

[hdfs-site.xml]
<property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>

[hdfs-site.xml]
<property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
<property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/hdfs/.ssh/id_rsa</value></property>

[core-site.xml]
<property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>

[hdfs-site.xml]
<property><name>dfs.journalnode.edits.dir</name><value>/home/hdfs/hadoop/journal</value></property>
Complete configuration files:
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://mycluster/</value></property>
  <property><name>hadoop.tmp.dir</name><value>/home/hdfs/hadoop</value></property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.hosts</name><value>/opt/soft/hadoop/etc/dfs.include.txt</value></property>
  <property><name>dfs.hosts.exclude</name><value>/opt/soft/hadoop/etc/dfs.hosts.exclude.txt</value></property>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>s141:8020</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>s146:8020</value></property>
  <property><name>dfs.namenode.http-address.mycluster.nn1</name><value>s141:50070</value></property>
  <property><name>dfs.namenode.http-address.mycluster.nn2</name><value>s146:50070</value></property>
  <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://s142:8485;s143:8485;s144:8485/mycluster</value></property>
  <property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
  <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
  <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/hdfs/.ssh/id_rsa</value></property>
  <property><name>dfs.journalnode.edits.dir</name><value>/home/hdfs/hadoop/journal</value></property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property><name>yarn.resourcemanager.hostname</name><value>s141</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>
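Note that hdfs-site.xml above points dfs.hosts at /opt/soft/hadoop/etc/dfs.include.txt. Its contents are not shown in this article, but based on the table at the top it would simply list the datanodes (a sketch, one hostname per line); the exclude file can stay empty but should exist:
# /opt/soft/hadoop/etc/dfs.include.txt
s142
s143
s144
s145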
1) Start the journalnode (jn) process on each of the jn nodes (s142, s143, s144):
hadoop-daemon.sh start journalnode
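If you prefer to start all three from s141 in one step, a sketch (assumes the Hadoop sbin directory is on the PATH of the hdfs user on each host):
# run on s141 as the hdfs user
for h in s142 s143 s144; do ssh hdfs@$h "hadoop-daemon.sh start journalnode"; done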
2) After the jn processes are started, synchronize the disk metadata between the two NNs.
a) For a brand-new cluster, format the file system first; this only needs to be executed on one NN.
[s141|s146]
hadoop namenode -format
b) If you convert a non-HA cluster to an HA cluster, copy the metadata of the original NN to another NN.
scp -r /home/hdfs/hadoop/dfs hdfs@s146:/home/hdfs/hadoop/
Then run the standby bootstrap command on s146:
hdfs namenode -bootstrapStandby
If the s141 namenode has not been started, this command will fail, as shown in the figure:
So start the s141 namenode first by executing the following command on s141:
hadoop-daemon.sh start namenode
Then execute the standby bootstrap command again on s146; note: when prompted whether to format, choose N, as shown in the figure:
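To summarize the order implied above (a sketch using the same commands as in this article):
# on s141 (the NN that was formatted)
hadoop-daemon.sh start namenode
# on s146 (the future standby); answer N when asked whether to re-format
hdfs namenode -bootstrapStandby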
Execute the following command on one of the NNs to transmit the edit logs to the jn nodes:
hdfs namenode -initializeSharedEdits
If a java.nio.channels.OverlappingFileLockException error is reported during execution:
This means the namenode is still running; stop it first (hadoop-daemon.sh stop namenode) and then re-run the command.
After execution, check whether edit data now exists on s142, s143 and s144. You should see that a mycluster directory has been generated under the journal directory, containing the edit log data, as follows:
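For example, a quick check from s141 (a sketch; the path follows the dfs.journalnode.edits.dir and mycluster nameservice configured above):
# each journalnode should now have a mycluster directory containing edit files
for h in s142 s143 s144; do ssh hdfs@$h "ls /home/hdfs/hadoop/journal/mycluster/current"; done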
Start all nodes.
Start the name node and all data nodes on s141:
hadoop-daemon.sh start namenode
hadoop-daemons.sh start datanode
Start the name node on s146
hadoop-daemon.sh start namenode
Now visit http://192.168.30.141:50070/ and http://192.168.30.146:50070/ in a browser; you will find that both namenodes are in the standby state.
At this point, you need to manually switch one of them to the active state; here we set s141 (nn1) to active:
hdfs haadmin -transitionToActive nn1
s141 is now active.
Commonly used hdfs haadmin commands:
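For reference, the haadmin subcommands most often used with this kind of setup (all exist in Hadoop 2.x):
# query the state of a namenode
hdfs haadmin -getServiceState nn1
# manual transitions, as used above
hdfs haadmin -transitionToActive nn1
hdfs haadmin -transitionToStandby nn1
# fail over from one namenode to the other
hdfs haadmin -failover nn1 nn2
# check the health of a namenode
hdfs haadmin -checkHealth nn1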
So far, manually controlled failover (disaster recovery) is configured, but this approach is not smart: it cannot detect failures automatically, so automatic failover configuration is introduced below.
This requires two components: a ZooKeeper quorum and the ZKFailoverController (ZKFC).
Set up a ZooKeeper cluster on three machines: s141, s142 and s143. Download ZooKeeper from: http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.5.6
tar -xzvf apache-zookeeper-3.5.6-bin.tar.gz -C /opt/soft/zookeeper-3.5.6
Configure the ZooKeeper environment variables in /etc/profile and make them take effect:
source /etc/profile
Then edit conf/zoo.cfg as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/hdfs/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
# maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
# autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
# autopurge.purgeInterval=1
server.1=s141:2888:3888
server.2=s142:2888:3888
server.3=s143:2888:3888
Create a myid file in the /home/hdfs/zookeeper (dataDir path configured in the zoo.cfg configuration file) directory of s141 with a value of 1 (corresponding to server.1 in the zoo.cfg configuration file)
Create a myid file in the /home/hdfs/zookeeper (dataDir path configured in the zoo.cfg configuration file) directory of s142 with a value of 2 (corresponding to server.2 in the zoo.cfg configuration file)
Create a myid file in the /home/hdfs/zookeeper (dataDir path configured in the zoo.cfg configuration file) directory of s143 with a value of 3 (corresponding to server.3 in the zoo.cfg configuration file)
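For example (a sketch, using the dataDir configured above):
# on s141
mkdir -p /home/hdfs/zookeeper && echo 1 > /home/hdfs/zookeeper/myid
# on s142
mkdir -p /home/hdfs/zookeeper && echo 2 > /home/hdfs/zookeeper/myid
# on s143
mkdir -p /home/hdfs/zookeeper && echo 3 > /home/hdfs/zookeeper/myid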
Start ZooKeeper on each of s141, s142 and s143:
zkServer.sh start
The zk process will appear after successful startup:
Next, configure the HDFS side. Stop the cluster first:
stop-all.sh
[hdfs-site.xml]
<property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>

[core-site.xml]
<property><name>ha.zookeeper.quorum</name><value>s141:2181,s142:2181,s143:2181</value></property>
Distribute the two modified files (hdfs-site.xml and core-site.xml) to all nodes.
On one of the NNs (s141), initialize the HA state in ZooKeeper:
hdfs zkfc -formatZK
The following results indicate success:
You can also check in zk:
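One way to check from the ZooKeeper side (a sketch using the standard zkCli.sh client; /hadoop-ha is the default parent znode created by ZKFC):
# connect to the ZooKeeper quorum
zkCli.sh -server s141:2181
# then, inside the zkCli shell:
#   ls /hadoop-ha          (it should contain mycluster)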
Then start HDFS:
start-dfs.sh
Check the processes on each machine:
The startup succeeded; take a look at the web UI:
s146 is active
s141 is on standby
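To confirm that automatic failover works, a simple test you can run (a sketch; kill the NameNode on the currently active node and watch the other one take over):
# on s146 (currently active): find and kill the NameNode process
jps | grep NameNode
kill -9 <NameNode pid>
# from any node, after a few seconds the other NN should report active
hdfs haadmin -getServiceState nn1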
At this point, the Hadoop automatic failover (HA) setup is complete.
To sum up
The above is an introduction to building a Hadoop 2.10 high-availability (HA) cluster on CentOS 7. I hope it is helpful to everyone!