**Environment description:**
Operating system: CentOS 7 64-bit, three machines:
centos7-1 192.168.190.130 master
centos7-2 192.168.190.129 slave1
centos7-3 192.168.190.131 slave2
Installing Spark also requires the following to be installed:
JDK and Scala
I won't cover how to install and configure the JDK here; plenty of guides exist, so look one up yourself.
Download the Scala installation package from https://www.scala-lang.org/download/, pick a version that meets your requirements, upload it to the server with a client tool, and unzip it:
# tar -zxvf scala-2.13.0-M4.tgz
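The environment variables later in this post assume Scala lives under /usr/scala, which is just this tutorial's layout convention; if you extracted it somewhere else, either move it there or adjust the paths accordingly:
# mkdir -p /usr/scala
# mv scala-2.13.0-M4 /usr/scala/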
Then modify the /etc/profile file and add the following:
export SCALA_HOME=/usr/scala/scala-2.13.0-M4
export PATH=$PATH:$SCALA_HOME/bin
# source /etc/profile //make the change take effect immediately
# scala -version //verify that Scala is installed
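As an optional extra check, you can run a one-line Scala program straight from the shell; if the PATH is set up correctly, this prints the message:
# scala -e 'println("Scala is working")'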
**3. Install Spark**
Spark download address: http://spark.apache.org/downloads.html
Note: several package types are available for download; choose the one your setup needs:
Source code: the Spark source, which must be compiled before it can be used; building against a specific Scala version (e.g. Scala 2.11) also requires compiling from source.
Pre-built with user-provided Hadoop: the "Hadoop free" build, usable with any Hadoop version.
Pre-built for Hadoop 2.7 and later: a package pre-built against Hadoop 2.7, which should correspond to the Hadoop version installed on the machine (a Hadoop 2.6 build is also available). Since the Hadoop installed here is 3.1.0, I chose the "for Hadoop 2.7 and later" package directly.
Note: see my earlier blog post for how to install Hadoop; I won't repeat it here.
# mkdir /usr/spark
# cd /usr/spark
# tar -zxvf spark-2.3.1-bin-hadoop2.7.tgz
# vim /etc/profile
# Add the Spark environment variables to PATH and export them, for example:
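A minimal sketch of what those lines can look like (the install path matches the one used in the rest of this post; adjust it if yours differs):
export SPARK_HOME=/usr/spark/spark-2.3.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin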
# source /etc/profile
# Enter the conf directory, make a copy of spark-env.sh.template, and rename it spark-env.sh
# cd /usr/spark/spark-2.3.1-bin-hadoop2.7/conf
# cp spark-env.sh.template spark-env.sh
# vim spark-env.sh
export SCALA_HOME=/usr/scala/scala-2.13.0-M4
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk-1.8.0.171-8.b10.el7_5.x86_64
export HADOOP_HOME=/usr/hadoop/hadoop-3.1.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/usr/spark/spark-2.3.1-bin-hadoop2.7
export SPARK_MASTER_IP=master
export SPARK_EXECUTOR_MEMORY=1G
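Two optional worker settings from spark-env.sh.template that you may also want to set (the values below are only examples, not recommendations). As an aside, Spark 2.x documents SPARK_MASTER_HOST as the newer name for SPARK_MASTER_IP; the older name used above still works here:
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=1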
# Still in the conf directory, make a copy of slaves.template and rename it slaves
# cd /usr/spark/spark-2.3.1-bin-hadoop2.7/conf
# cp slaves.template slaves
# vim slaves
# Add the node hostnames to the slaves file:
master //the hostname of centos7-1
slave1 //the hostname of centos7-2
slave2 //the hostname of centos7-3
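start-all.sh reaches every node listed in slaves over SSH, so passwordless SSH from the master to each worker must already be in place (normally done during the Hadoop setup). If it is not, one way to set it up from the master is:
# ssh-keygen -t rsa
# ssh-copy-id root@slave1
# ssh-copy-id root@slave2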
**Start Spark**
# Start the Hadoop cluster before starting Spark
# cd /usr/hadoop/hadoop-3.1.0/
# sbin/start-all.sh
# jps //check that the Hadoop daemons (NameNode, DataNode, ResourceManager, etc.) are running
# cd /usr/spark/spark-2.3.1-bin-hadoop2.7
# sbin/start-all.sh
Note: Spark must also be installed on the slave1 and slave2 nodes in the same way, or you can copy the installation to them directly:
# scp -r /usr/spark root@slave1ip:/usr/spark
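And likewise for the other worker (slave2ip, like slave1ip above, stands for that node's address); since "installed in the same way" includes the JDK, Scala, and the /etc/profile entries, make sure those exist on the workers too:
# scp -r /usr/spark root@slave2ip:/usr/spark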
The startup information is as follows:
starting org.apache.spark.deploy.master.Master, logging to /usr/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /usr/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.com.cn.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /usr/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.com.cn.out
master: starting org.apache.spark.deploy.worker.Worker, logging to /usr/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master.out
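To confirm the cluster actually came up, run jps again on each node; because master itself is listed in the slaves file, it runs both daemons:
# jps //on master: a Master and a Worker process; on slave1/slave2: a Worker process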
Test the Spark cluster:
Open the Spark master's web UI in a browser: http://192.168.190.130:8080/
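Beyond the web UI, a quick smoke test is to attach a spark-shell to the cluster and run a trivial job (a minimal sketch; spark://master:7077 assumes the standalone master's default port):
# cd /usr/spark/spark-2.3.1-bin-hadoop2.7
# bin/spark-shell --master spark://master:7077
scala> sc.parallelize(1 to 100).sum() //should return 5050.0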
**Summary**
That wraps up this walkthrough of installing and configuring Spark under CentOS 7. I hope you find it helpful; if you have any questions, leave me a message and I will reply as soon as I can!