Apache Spark Installation in Standalone Mode


1. Objective

This tutorial contains the steps for Apache Spark Installation in Standalone Mode on Ubuntu. Spark standalone mode sets up the cluster without any existing cluster management software such as YARN Resource Manager or Mesos. In standalone mode, the Spark master and Spark workers take care of scheduling the driver and executors for a Spark application.


2. Steps to Apache Spark Installation in Standalone Mode

Let’s follow the steps given below for Apache Spark Installation in Standalone Mode-

2.1. Platform

I. Platform Requirements

Operating system: Ubuntu 14.04 or later. Other Linux flavors such as CentOS or Red Hat can also be used.

II. Configure & Setup Platform

If you are using a Windows or Mac operating system, you can create a virtual machine and install Ubuntu on it using VMware Player or Oracle VirtualBox.

2.2. Software you need to install before installing Spark

I. Install Java

You need to install Java before installing Spark. So, let’s begin by installing Java. Use the commands below to download and install Java-

$ sudo apt-get install python-software-properties
$ sudo apt-add-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer

On executing these commands, Java is downloaded and installed.

To check whether the installation completed successfully and to know the version of Java installed, use the command below-

 $ java -version

II. Installing Scala

a. Download Scala

Download the latest version of Scala from http://www.scala-lang.org/

Apache Spark is written in Scala, so we install Scala before setting up Spark. Follow the steps given below to install Scala.

b. Untar the file

 $ sudo tar xvf scala-2.10.4.tgz

c. Edit Bashrc file

Make an entry for Scala in .bashrc file

 nano ~/.bashrc

Add the following lines at the end of the file. This adds the location where the Scala files are located to the PATH variable.

 export SCALA_HOME=Path-where-scala-file-is-located
 export PATH=$PATH:$SCALA_HOME/bin

Source the modified .bashrc file with the command

 source ~/.bashrc
d. Verifying Scala Installation

After installation, it is good to verify it. Use the following command to verify the Scala installation.

 $ scala -version

2.3. Installing Spark

To install Spark in standalone mode on a single-node cluster, simply place the Spark setup on the node, then extract and configure it. If you are planning to install Spark on a multi-node cluster, follow a multi-node installation guide instead.

I. Download Spark

Download the Spark release of your choice from the Apache Spark website: http://spark.apache.org/downloads.html

Follow the steps given below for installing Spark.

II. Extracting Spark tar

Use the following command to extract the Spark tar file.

 $ tar xvf spark-2.0.0-bin-hadoop2.6.tgz

III. Setting up the environment for Spark

Make an entry for Spark in .bashrc file

nano ~/.bashrc

Add the following lines to the ~/.bashrc file. This adds the location where the Spark software files are located to the PATH variable.

 export SPARK_HOME=/home/sapna/spark-2.0.0-bin-hadoop2.6/
 export PATH=$PATH:$SPARK_HOME/bin

Use the following command to source the ~/.bashrc file.

 $ source ~/.bashrc

2.4. Start Spark Services

I. Starting a Cluster Manually

Now, start a standalone master server by executing-

./sbin/start-master.sh 


Once started, the master prints out a spark://HOST:PORT URL for itself, which can be used to connect workers to it, or passed as the “master” argument to SparkContext.
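
For example, here is a minimal Scala sketch of passing this URL as the “master” of a SparkContext. The hostname below is just the one used later in this tutorial; replace it with the spark://HOST:PORT printed by your own master.

 import org.apache.spark.{SparkConf, SparkContext}

 // Minimal sketch: point the application at the standalone master.
 val conf = new SparkConf()
   .setAppName("StandaloneTest")
   .setMaster("spark://sapna-All-Series:7077")
 val sc = new SparkContext(conf)

 // Quick sanity check: distribute a small collection and count it.
 println(sc.parallelize(1 to 100).count())   // prints 100

 sc.stop()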

You can find this URL on the master’s web UI at http://localhost:8080.

Similarly, you can start one or more workers and connect them to the master via-

./sbin/start-slave.sh <master-spark-URL>
./sbin/start-slave.sh spark://sapna-All-Series:7077


Note: You can copy the master Spark URL from the master’s web console (http://localhost:8080).

II. Check whether Spark daemons are working

 $ jps
 7698 Master
 4582 Worker


2.5. Running sample Spark application

Once Apache Spark Installation in Standalone Mode is done, let’s run the Apache Spark Pi example (the jar for the example is shipped with Spark)-

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://sapna-All-Series:7077 --executor-memory 1G --total-executor-cores 1 /home/sapna/spark-2.0.0-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.0.0.jar 10

--class: The entry point (main class) of your application.

--master: The master URL for the cluster.

--executor-memory: The amount of memory to allocate to each executor of the application.

--total-executor-cores: The total number of CPU cores to allocate to the application.


Result:

Pi is roughly 3.141803141803142
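
The example estimates Pi with a simple Monte Carlo method: it scatters random points over a square and counts how many fall inside the inscribed circle. Below is a rough Scala sketch of the idea (not the exact code shipped with Spark), assuming an existing SparkContext sc such as the one provided by the Spark shell.

 // Rough Monte Carlo sketch of what the SparkPi example computes.
 val slices = 10                       // corresponds to the "10" argument passed above
 val n = 100000 * slices
 val count = sc.parallelize(1 to n, slices).map { _ =>
   val x = math.random * 2 - 1         // random point in [-1, 1] x [-1, 1]
   val y = math.random * 2 - 1
   if (x * x + y * y <= 1) 1 else 0    // 1 if the point lands inside the unit circle
 }.reduce(_ + _)
 println(s"Pi is roughly ${4.0 * count / n}")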

2.6. Starting the Spark Shell

$ bin/spark-shell --master spark://sapna-All-Series:7077


Now, to play with Spark, first create an RDD and perform various RDD operations in the Spark shell (see the Spark shell commands tutorial for more); a short sketch follows.
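
As a starting point, here is a small sketch of RDD operations you can type directly into the Spark shell. The shell already provides the SparkContext as sc; the text-file path is only an illustrative assumption.

 // Create an RDD from an in-memory collection and run a few transformations/actions.
 val nums = sc.parallelize(1 to 10)
 val squares = nums.map(x => x * x)        // transformation (lazy)
 squares.filter(_ % 2 == 0).collect()      // action: Array(4, 16, 36, 64, 100)

 // Classic word count over a text file (replace the path with a real file on your machine).
 val lines = sc.textFile("/home/sapna/words.txt")
 lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .take(5)                             // action: first five (word, count) pairs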


Reference:

http://spark.apache.org/

 

