Spark Installation in Standalone Mode | Install Apache Spark
1. Objective – Apache Spark Installation
This tutorial walks through installing Apache Spark in Standalone mode on Ubuntu. In Standalone mode, Spark uses its own built-in cluster manager, so no external cluster management software such as the YARN ResourceManager or Mesos is required. A Spark master and one or more Spark workers provide the resources on which the driver and executors of a Spark application run.
So, let’s start Spark Installation in Standalone Mode.
2. Steps to Apache Spark Installation in Standalone Mode
Follow the steps below to install Apache Spark in Standalone mode.
a. Platform Requirements
Operating system: Ubuntu 14.04 or later. Other Linux flavors such as CentOS or Red Hat can also be used.
b. Configure & Setup Platform
If you are using Windows or macOS, you can create a virtual machine and install Ubuntu on it using VMware Player or Oracle VirtualBox.
ii. Software you need to install before installing Spark
a. Install Java
Spark runs on the JVM, so Java must be installed before Spark. Use the commands below to download and install Java:
$ sudo apt-get install python-software-properties
$ sudo apt-add-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
The last command downloads and installs Java.
To confirm that the installation completed and to see which Java version is installed, run:
$ java -version
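The check above can be wrapped in a small guard so that a missing Java installation is reported clearly before moving on. This is a minimal sketch; `require_cmd` is a hypothetical helper name, not part of Java or Spark.

```shell
# Hypothetical helper: report whether a command is available on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found on PATH" >&2
    return 1
  fi
}

# Print the Java version only if the java binary can be found.
require_cmd java && java -version
```

The same helper can be reused later for `scala` or `spark-submit`.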
b. Installing Scala
Untar the file
$ sudo tar xvf scala-2.10.4.tgz
Edit the .bashrc file
Add an entry for Scala at the end of the ~/.bashrc file.
The entry appends the directory that contains the Scala binaries to the PATH variable.
Then source the updated .bashrc file so that the change takes effect in the current shell.
Verifying Scala Installation
After installation, it is good practice to verify it by checking the version that Scala reports.
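The Scala steps above might look like the sketch below. The extraction directory `$HOME/scala-2.10.4` is an assumption made for illustration; use the directory where scala-2.10.4.tgz was actually untarred.

```shell
# Lines to append to ~/.bashrc.
# Assumed location: adjust to wherever scala-2.10.4.tgz was extracted.
export SCALA_HOME="$HOME/scala-2.10.4"
export PATH="$PATH:$SCALA_HOME/bin"

# After running `source ~/.bashrc`, confirm that the shell can find Scala.
if command -v scala >/dev/null 2>&1; then
  scala -version
else
  echo "scala not found; check SCALA_HOME=$SCALA_HOME"
fi
```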
iii. Installing Spark
Installing Spark in Standalone mode on a single-node cluster is simple: place the Spark distribution on the node, extract it, and configure it. If you are planning to install Spark on a multi-node cluster, follow a multi-node installation guide instead.
a. Download Spark
Download a Spark release of your choice from the Apache Spark website at http://spark.apache.org/downloads.html. The commands in this tutorial use spark-2.0.0-bin-hadoop2.6.
Follow the steps given below for installing Spark.
b. Extracting Spark tar
Use the following command to extract the Spark tar file.
$ tar xvf spark-2.0.0-bin-hadoop2.6.tgz
c. Setting up the environment for Spark
Add an entry for Spark to the .bashrc file
Add the following lines to the ~/.bashrc file. They set SPARK_HOME to the directory where Spark was extracted and append its bin directory to the PATH variable.
export SPARK_HOME=/home/sapna/spark-2.0.0-bin-hadoop2.6/
export PATH=$PATH:$SPARK_HOME/bin
Use the following command for sourcing the ~/.bashrc file.
$ source ~/.bashrc
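Before starting any services, it can help to confirm that SPARK_HOME really points at an extracted Spark distribution. The sketch below assumes the usual layout with `bin/spark-submit` and an `sbin` directory; `check_spark_home` is a hypothetical helper name.

```shell
# Hypothetical helper: verify that a directory looks like a Spark distribution.
# Uses the first argument if given, otherwise falls back to $SPARK_HOME.
check_spark_home() {
  dir="${1:-${SPARK_HOME:-}}"
  if [ -x "$dir/bin/spark-submit" ] && [ -d "$dir/sbin" ]; then
    echo "ok: $dir"
  else
    echo "missing bin/spark-submit or sbin under: $dir" >&2
    return 1
  fi
}

# Usage after sourcing ~/.bashrc:
#   check_spark_home && spark-submit --version
```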
iv. Start Spark Services
a. Starting a Cluster Manually
Now start a standalone master server by running the start-master.sh script from the sbin directory of the Spark installation.
Once started, the master prints a spark://HOST:PORT URL for itself, which you can use to connect workers to it or pass as the “master” argument to SparkContext.
You can also find this URL on the master’s web UI, which is at http://localhost:8080 by default.
Figure: Web console of the master at http://localhost:8080
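Similarly, you can start one or more workers and connect them to the master by running the start-slave.sh script from the same sbin directory with the master URL as its argument.
Note: You can copy the master URL from the master web console (http://localhost:8080).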
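The two start steps can be sketched as small wrappers around the scripts in the sbin directory of the Spark 2.0.0 distribution. The function names `start_master` and `start_worker` are hypothetical, and the sketch assumes SPARK_HOME is set as described earlier. Newer Spark releases may name the worker script differently.

```shell
# Hypothetical wrapper: start the standalone master on this machine.
start_master() {
  "$SPARK_HOME/sbin/start-master.sh"
}

# Hypothetical wrapper: start one worker and register it with the master.
# $1 is the spark://HOST:PORT URL copied from the master's web console.
start_worker() {
  master_url="${1:-}"
  if [ -z "$master_url" ]; then
    echo "usage: start_worker spark://HOST:PORT" >&2
    return 2
  fi
  "$SPARK_HOME/sbin/start-slave.sh" "$master_url"
}

# Usage, with the master URL used elsewhere in this tutorial:
#   start_master
#   start_worker spark://sapna-All-Series:7077
```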
b. Check whether Spark daemons are working
$ jps
7698 Master
4582 Worker
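The daemon check can also be automated. This is a minimal sketch that reads jps-style output from standard input and succeeds only if both a Master and a Worker process are listed; `daemons_running` is a hypothetical helper name.

```shell
# Hypothetical helper: read jps-style output on stdin and check for both daemons.
daemons_running() {
  out="$(cat)"
  for name in Master Worker; do
    # -w matches the daemon name as a whole word, -q suppresses grep's output.
    if ! printf '%s\n' "$out" | grep -qw "$name"; then
      echo "$name is not running" >&2
      return 1
    fi
  done
  echo "Master and Worker are running"
}

# Usage:
#   jps | daemons_running
```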
v. Running sample Spark application
Once Apache Spark is installed in Standalone mode, let’s run the Spark Pi example (the jar for the example ships with Spark). Run the command from the Spark installation directory:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://sapna-All-Series:7077 --executor-memory 1G --total-executor-cores 1 /home/sapna/spark-2.0.0-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.0.0.jar 10
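--class: The entry point for your application (here, org.apache.spark.examples.SparkPi).
--master: The master URL for the cluster.
--executor-memory: The amount of memory to allocate per executor (here, 1G).
--total-executor-cores: The total number of CPU cores to allocate to the application across all executors.
The trailing 10 is an argument passed to SparkPi itself: the number of partitions the computation is split into.
The output includes a line like the following:
Pi is roughly 3.141803141803142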
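The result line is easy to miss among Spark's log messages. The sketch below wraps the same spark-submit call in a hypothetical helper, `run_spark_pi`, that prints only the result. It assumes SPARK_HOME is set, that the examples jar sits under examples/jars as in the command above, and that log output goes to standard error, which is the default logging setup as far as I know.

```shell
# Hypothetical helper: run SparkPi against a master URL and print only the result line.
# $1 is the master URL, $2 is the number of partitions (defaults to 10).
run_spark_pi() {
  master_url="$1"
  slices="${2:-10}"
  # Pick up the examples jar shipped with the distribution without hard-coding its version.
  jar="$(ls "$SPARK_HOME"/examples/jars/spark-examples_*.jar | head -n 1)"
  "$SPARK_HOME/bin/spark-submit" \
    --class org.apache.spark.examples.SparkPi \
    --master "$master_url" \
    --executor-memory 1G \
    --total-executor-cores 1 \
    "$jar" "$slices" 2>/dev/null | grep '^Pi is roughly'
}

# Usage:
#   run_spark_pi spark://sapna-All-Series:7077
```

Dropping `2>/dev/null` brings the log messages back, which is useful when the job fails.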
vi. Starting the Spark Shell
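To start an interactive shell connected to the standalone master, pass the master URL with the --master option:
$ bin/spark-shell --master spark://sapna-All-Series:7077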