The Hadoop version used in this blog post is 2.8.0.
Open the download address selection page:
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
The address I use is:
http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
The Linux system used here is CentOS 7 (Ubuntu is also very good, but CentOS 7 is used for this demonstration). The installation method is not covered in detail; if necessary, refer to this blog post:
http://blog.csdn.net/pucao_cug/article/details/71229416
Install three machines, named hserver1, hserver2, and hserver3 (it does not matter if your machines are not named this way; you can change the names later with the hostname command).
Note: To avoid a series of authorization hassles later, the root account is used to log in and operate directly.
Use the ifconfig command to view the IPs of these 3 machines. The correspondence between my machine names and IPs is:
192.168.119.128 hserver1
192.168.119.129 hserver2
192.168.119.130 hserver3
For the convenience of subsequent operations, make sure each machine's hostname is what we want. Taking the machine 192.168.119.128 as an example: log in with the root account and check the machine name with the hostname command.
If you find that the machine name is not what we want, that is easy to fix. Rename it with the command:
hostname hserver1
After the command completes, check whether the change took effect by typing the hostname command again.
Similarly, rename the other two machines to hserver2 and hserver3 respectively.
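Note that the plain hostname command only changes the name until the next reboot. On CentOS 7 you can make the change permanent with hostnamectl, a standard systemd tool not used in the steps above; for example, on the first machine:
hostnamectl set-hostname hserver1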
Modify the /etc/hosts file of these 3 machines and add the following content to the file:
192.168.119.128 hserver1
192.168.119.129 hserver2
192.168.119.130 hserver3
Note: Your IP addresses do not need to be the same as mine; this is just a mapping, and it only needs to be correct. As for how to edit the file, you can use vim, or you can edit the contents of the hosts file on your local machine and upload it to overwrite the file on the Linux machines.
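As one sketch, the mapping can be appended from the shell (the IPs here are mine; substitute your own):
cat >> /etc/hosts <<'EOF'
192.168.119.128 hserver1
192.168.119.129 hserver2
192.168.119.130 hserver3
EOF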
After the configuration is complete, use the ping command to check whether the three machines can ping each other. Taking hserver1 as an example, execute the command there:
ping -c 3 hserver2
Execute the command:
ping -c 3 hserver3
If the ping works, it means that the machines are interconnected and the hosts configuration is correct.
Taking hserver1 as an example, execute the command to generate a key pair with an empty passphrase (the public key will be used later). The command is:
ssh-keygen -t rsa -P ''
Because I am currently using the root account, the key files are saved in the /root/.ssh/ directory. You can view them with the command:
ls /root/.ssh/
Use the same method to generate key pairs on hserver2 and hserver3 (the command is exactly the same; nothing needs to be changed).
The next thing to do is to place a file with identical content, named authorized_keys, in the /root/.ssh/ directory of all three machines; its content is the public keys we just generated on the three machines. For convenience, I will generate an authorized_keys file on hserver1, add the public keys from all three machines to it, and then copy the file to hserver2 and hserver3.
First use the command to generate a file named authorized_keys in the /root/.ssh/ directory of hserver1. The command is:
touch /root/.ssh/authorized_keys
You can check whether it was created successfully with the command:
ls /root/.ssh/
Then copy the contents of the /root/.ssh/id_rsa.pub file on hserver1, on hserver2, and on hserver3 into this authorized_keys file. There are many ways to copy: you can use the cat and vim commands, or you can download the /root/.ssh/id_rsa.pub files from the three machines to your local machine, merge them into an authorized_keys file locally, and then upload it to the three machines.
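As a command-line sketch (assuming root password login over SSH is still possible between the machines at this point), the three public keys can be appended from hserver1 like this:
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh root@hserver2 cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh root@hserver3 cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys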
The /root/.ssh/id_rsa.pub file on each of hserver1, hserver2, and hserver3 contains a single public-key line (beginning with "ssh-rsa"). After merging, the /root/.ssh/authorized_keys file on my hserver1 machine contains the three public keys, one per line.
The authorized_keys file is now in the /root/.ssh/ directory of the hserver1 machine, and its content is complete. Next, copy the file to the /root/.ssh/ directory of hserver2 and the /root/.ssh/ directory of hserver3.
There are many ways to copy, the easiest one is to use the SecureFX visualization tool.
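If you prefer the command line to SecureFX, a sketch using scp from hserver1 (assuming root SSH access to the other two machines):
scp /root/.ssh/authorized_keys root@hserver2:/root/.ssh/
scp /root/.ssh/authorized_keys root@hserver3:/root/.ssh/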
After the copy is complete, each of the three machines has the same set of files in its /root/.ssh directory. The file names are the same on all three machines, but only the authorized_keys file has identical content everywhere.
Enter the command:
ssh hserver2
Enter the command:
exit
and press Enter.
Enter the command:
ssh hserver3
Enter the command:
exit
and press Enter.
On hserver2, the method is similar; the commands simply become ssh hserver1 and ssh hserver3. Note that after each ssh login completes, you must execute exit, otherwise your subsequent commands will run on the other machine.
On hserver3, the method is likewise similar; the commands become ssh hserver1 and ssh hserver2. Again, after each ssh login, be sure to execute exit.
Note: To avoid a series of tedious operations such as obtaining administrator rights and granting permissions, and to keep the tutorial streamlined, all logins here use the root account and all operations are performed with root privileges.
The installation of the JDK is not detailed here. If necessary, refer to this blog post (it uses Ubuntu, but JDK installation is the same under CentOS):
http://blog.csdn.net/pucao_cug/article/details/68948639
Note: The following steps need to be repeated on all 3 machines.
Create a new directory named hadoop in the /opt directory, and upload the downloaded hadoop-2.8.0.tar.gz file to this directory.
Enter the directory and execute the command:
cd /opt/hadoop
Execute the decompression command:
tar -xvf hadoop-2.8.0.tar.gz
Note: All three machines must perform the above operations. After decompression, you will get a directory named hadoop-2.8.0.
Create a few directories in the /root directory for Hadoop's data; the directories must match the paths used in the configuration files below.
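A minimal sketch of the commands, assuming the /root/hadoop layout referenced by the configuration sketches later in this post (the dfs/name directory is confirmed by the format step below; tmp, var, and dfs/data are its typical companions):
mkdir -p /root/hadoop/tmp
mkdir -p /root/hadoop/var
mkdir -p /root/hadoop/dfs/name
mkdir -p /root/hadoop/dfs/data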
Modify a series of files in the /opt/hadoop/hadoop-2.8.0/etc/hadoop directory.
Modify the /opt/hadoop/hadoop-2.8.0/etc/hadoop/core-site.xml file
Add the following properties between the <configuration> and </configuration> tags, as sketched below.
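A sketch of typical properties for this setup, not necessarily the original values: it assumes hserver1 as the NameNode host, an illustrative port of 9000, and the /root/hadoop/tmp directory created above:
<property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
</property>
<property>
        <name>fs.defaultFS</name>
        <value>hdfs://hserver1:9000</value>
</property>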
Modify the /opt/hadoop/hadoop-2.8.0/etc/hadoop/hadoop-env.sh file
Find the line:
export JAVA_HOME=${JAVA_HOME}
and change it to:
export JAVA_HOME=/opt/java/jdk1.8.0_121
Note: set this to your own JDK path.
Modify the /opt/hadoop/hadoop-2.8.0/etc/hadoop/hdfs-site.xml file
Add the following four property nodes between the <configuration> and </configuration> tags, as sketched below.
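A sketch of the four property nodes, assuming the /root/hadoop/dfs directories created earlier; the replication factor of 2 is illustrative, and dfs.permissions is discussed in the note below:
<property>
        <name>dfs.namenode.name.dir</name>
        <value>/root/hadoop/dfs/name</value>
</property>
<property>
        <name>dfs.datanode.data.dir</name>
        <value>/root/hadoop/dfs/data</value>
</property>
<property>
        <name>dfs.replication</name>
        <value>2</value>
</property>
<property>
        <name>dfs.permissions</name>
        <value>false</value>
</property>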
Note: With dfs.permissions set to false, files on DFS can be created without permission checks, which is convenient. But to guard against accidental deletion, set it to true, or simply delete this property node, since the default is true.
In this version, there is a file named mapred-site.xml.template. Copy the file and rename it to mapred-site.xml. The command is:
cp /opt/hadoop/hadoop-2.8.0/etc/hadoop/mapred-site.xml.template /opt/hadoop/hadoop-2.8.0/etc/hadoop/mapred-site.xml
Modify this newly created mapred-site.xml file, adding the following property nodes between the <configuration> and </configuration> tags, as sketched below.
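A sketch, assuming hserver1 and the /root/hadoop/var directory created earlier; mapreduce.framework.name set to yarn is the essential setting, while mapred.job.tracker and mapred.local.dir are legacy MR1 settings often carried along in setups of this style (the port 49001 is illustrative):
<property>
        <name>mapred.job.tracker</name>
        <value>hserver1:49001</value>
</property>
<property>
        <name>mapred.local.dir</name>
        <value>/root/hadoop/var</value>
</property>
<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
</property>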
Modify the /opt/hadoop/hadoop-2.8.0/etc/hadoop/slaves file, delete the localhost inside, and add the following content:
hserver2
hserver3
Modify the /opt/hadoop/hadoop-2.8.0/etc/hadoop/yarn-site.xml file, adding the following property nodes between the <configuration> and </configuration> tags, as sketched below.
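A sketch, assuming hserver1 as the ResourceManager host; yarn.nodemanager.vmem-check-enabled is explained in the note below:
<property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hserver1</value>
</property>
<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>
<property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
</property>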
Note: yarn.nodemanager.vmem-check-enabled set to false means the virtual-memory check is skipped. If you are installing on a virtual machine, this setting is very useful and helps avoid problems in subsequent operations. On a physical machine with enough memory, this configuration can be removed.
Because hserver1 is the namenode and hserver2 and hserver3 are both datanodes, only hserver1 needs to be initialized, that is, only HDFS needs to be formatted.
Enter the /opt/hadoop/hadoop-2.8.0/bin directory of the hserver1 machine, that is, execute the command:
cd /opt/hadoop/hadoop-2.8.0/bin
Execute the initialization script, that is, execute the command:
./hadoop namenode -format
Wait a few seconds; if no error is reported, the execution was successful.
After the format succeeds, you can see that a current directory has appeared in the /root/hadoop/dfs/name/ directory, containing a series of files.
Because hserver1 is the namenode and hserver2 and hserver3 are both datanodes, the startup command only needs to be executed on hserver1.
Enter the /opt/hadoop/hadoop-2.8.0/sbin directory of the hserver1 machine, that is, execute the command:
cd /opt/hadoop/hadoop-2.8.0/sbin
Execute the startup script, that is, execute the command:
./start-all.sh
The first time this startup command is executed, interactive confirmation is required: enter yes at the prompt and press Enter.
Hadoop has started; next, test whether it is working normally.
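A quick first check, not part of the original steps, is the JDK's jps tool. Run it on each machine:
jps
On hserver1 you should see processes such as NameNode, SecondaryNameNode, and ResourceManager; on hserver2 and hserver3, DataNode and NodeManager.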
Execute the command to turn off the firewall. Under CentOS7, the command is:
systemctl stop firewalld.service
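Note that stopping the service only lasts until the next reboot. To keep the firewall off permanently, a standard systemd command (not part of the original steps) is:
systemctl disable firewalld.service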
hserver1 is our namenode. The IP of this machine is 192.168.119.128. On your local computer, visit the NameNode web UI in a browser (by default it listens on port 50070 in Hadoop 2.x):
http://192.168.119.128:50070/
The browser automatically jumps to the overview page.
Visit the YARN ResourceManager web UI in a local browser (by default it listens on port 8088):
http://192.168.119.128:8088/
The browser automatically jumps to the cluster page.