Spark Installation in Standalone Mode | Install Apache Spark
1. Objective – Apache Spark Installation
This tutorial contains the steps for Apache Spark Installation in Standalone Mode on Ubuntu. Standalone mode sets up a Spark cluster without any existing cluster management software such as the YARN Resource Manager or Mesos. In standalone mode, a Spark master and Spark workers take care of placing the driver and executors for a Spark application.
So, let’s start Spark Installation in Standalone Mode.
2. Steps to Apache Spark Installation in Standalone Mode
Let’s follow the steps given below for Apache Spark Installation in Standalone Mode-
i. Platform
a. Platform Requirements
Operating system: Ubuntu 14.04 or later. Other Linux flavors such as CentOS and Red Hat also work.
b. Configure & Setup Platform
If you are using a Windows or Mac operating system, you can create a virtual machine and install Ubuntu on it using VMware Player or Oracle VirtualBox.
ii. Software you need to install before installing Spark
a. Install Java
You need to install Java before Spark. Use the commands below to download and install Java-
[php]$ sudo apt-get install python-software-properties[/php]
[php]$ sudo apt-add-repository ppa:webupd8team/java[/php]
[php]$ sudo apt-get update[/php]
[php]$ sudo apt-get install oracle-java7-installer[/php]
Once these commands finish, Java is downloaded and installed.
To confirm that the installation completed successfully and to check the installed Java version, run-
[php] $ java -version[/php]
b. Installing Scala
Download Scala
Download the latest version of Scala from http://www.scala-lang.org/
Apache Spark is written in Scala, so we need to install Scala before setting up Spark. Follow the steps given below to install Scala.
Untar the file
[php] $ sudo tar xvf scala-2.10.4.tgz [/php]
Edit Bashrc file
Make an entry for Scala in .bashrc file
[php] nano ~/.bashrc[/php]
Then add the following lines at the end of the file. This adds the location where the Scala software files are located to the PATH variable.
[php] export SCALA_HOME=Path-where-scala-file-is-located[/php]
[php] export PATH=$PATH:$SCALA_HOME/bin[/php]
Source the changed .bashrc file with the command-
[php] source ~/.bashrc[/php]
Verifying Scala Installation
After installation, it is good to verify it. Use the following command to verify the Scala installation.
[php] $ scala -version[/php]
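Optionally, you can also open the Scala REPL with the scala command and evaluate a couple of expressions to confirm that the installation actually works-
[php]// Start the REPL with the `scala` command, then try:
println(s"2 + 3 = ${2 + 3}")           // prints: 2 + 3 = 5
println(util.Properties.versionString)  // prints the installed Scala version[/php]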
iii. Installing Spark
To install Spark in standalone mode on a single-node cluster, simply place the Spark setup on the node, then extract and configure it. Follow this guide if you are planning to install Spark on a multi-node cluster.
a. Download Spark
Download the Spark version of your choice (the latest is recommended) from http://spark.apache.org/downloads.html on the Apache Spark website.
Follow the steps given below for installing Spark.
b. Extracting Spark tar
Use the following command to extract the Spark tar file.
[php] $ tar xvf spark-2.0.0-bin-hadoop2.6.tgz[/php]
c. Setting up the environment for Spark
Make an entry for Spark in .bashrc file
[php]nano ~/.bashrc[/php]
Add the following lines to the ~/.bashrc file. This adds the location where the Spark software files are located to the PATH variable.
[php] export SPARK_HOME=/home/sapna/spark-2.0.0-bin-hadoop2.6/
export PATH=$PATH:$SPARK_HOME/bin[/php]
Use the following command for sourcing the ~/.bashrc file.
[php] $ source ~/.bashrc[/php]
iv. Start Spark Services
a. Starting a Cluster Manually
Now, start a standalone master server by executing-
[php]./sbin/start-master.sh [/php]
Once started, the master prints a spark://HOST:PORT URL for itself. Workers use this URL to connect to the master, and it is also what you pass as the “master” argument to SparkContext. You can find this URL on the master’s web UI, which is http://localhost:8080 by default.
Preview of the master’s web console at http://localhost:8080.
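If you later connect to the standalone master from application code rather than from the shell, this spark://HOST:PORT URL is what you set as the master. Below is a minimal Scala sketch of such an application; the object name and the URL spark://sapna-All-Series:7077 are just placeholders taken from this tutorial's example host, so replace them with your own. In practice, the master is also often left out of the code and supplied through spark-submit's --master flag instead.
[php]import org.apache.spark.{SparkConf, SparkContext}

object StandaloneExample {
  def main(args: Array[String]): Unit = {
    // Point the application at the standalone master started above.
    // Replace the URL with the spark://HOST:PORT shown on your master's web UI.
    val conf = new SparkConf()
      .setAppName("StandaloneExample")
      .setMaster("spark://sapna-All-Series:7077")
    val sc = new SparkContext(conf)

    // A trivial job to confirm the connection: sum the numbers 1 to 100.
    val total = sc.parallelize(1 to 100).sum()
    println(s"Sum of 1 to 100 = $total")

    sc.stop()
  }
}[/php]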
Similarly, you can start one or more workers and connect them to the master via ./sbin/start-slave.sh <master-spark-URL>, for example-
[php]./sbin/start-slave.sh spark://sapna-All-Series:7077[/php]
Note: You can copy the master-spark-URL from the master’s web console (http://localhost:8080).
b. Check whether Spark daemons are working
[php]jps
7698 Master
4582 Worker
[/php]
v. Running sample Spark application
Once you have completed Apache Spark Installation in Standalone Mode, let’s run the Apache Spark Pi example (the jar for this example ships with Spark).
[php]./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://sapna-All-Series:7077 --executor-memory 1G --total-executor-cores 1 /home/sapna/spark-2.0.0-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.0.0.jar 10[/php]
--class: The entry point of your application.
--master: The master URL of the cluster.
--executor-memory: The amount of memory to allocate to the application.
--total-executor-cores: The number of CPU cores to allocate to the application.
Result:
Pi is roughly 3.141803141803142
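For context, the Pi example estimates Pi with a simple Monte Carlo method. The following Scala sketch illustrates the idea; it is an approximation of what the bundled SparkPi example does, not its exact source, and the object name PiSketch is just a placeholder.
[php]import scala.math.random
import org.apache.spark.sql.SparkSession

object PiSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("PiSketch").getOrCreate()
    val slices = if (args.length > 0) args(0).toInt else 2  // the "10" passed above
    val n = 100000 * slices

    // Throw random points into the unit square and count how many land
    // inside the quarter circle; the ratio approximates Pi / 4.
    val count = spark.sparkContext.parallelize(1 until n, slices).map { _ =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y <= 1) 1 else 0
    }.reduce(_ + _)

    println(s"Pi is roughly ${4.0 * count / (n - 1)}")
    spark.stop()
  }
}[/php]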
vi. Starting the Spark Shell
[php]$ ./bin/spark-shell --master spark://sapna-All-Series:7077[/php]
Now, to start playing with Spark, first create an RDD and perform various RDD operations from this Spark shell; see the Spark shell commands tutorial for more.
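For instance, here are a few basic RDD operations you can try right away in the shell. This is a small sketch; it assumes the shell was started from the Spark installation directory, so that the README.md shipped with Spark is available for the word count.
[php]// sc (the SparkContext) is already available inside spark-shell.
// Create an RDD from a local collection:
val numbers = sc.parallelize(1 to 10)

// Transformations are lazy; actions trigger the computation.
val evens = numbers.filter(_ % 2 == 0)      // transformation
println(evens.collect().mkString(", "))     // action: 2, 4, 6, 8, 10
println(numbers.reduce(_ + _))              // action: 55

// A classic word count over the README shipped with Spark:
val lines = sc.textFile("README.md")
val wordCounts = lines.flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
wordCounts.take(5).foreach(println)[/php]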
Reference:
http://spark.apache.org/