Apache Spark Installation On Ubuntu- A Beginners Tutorial


1. Objective

This tutorial covers the first step in learning Apache Spark: installing it on Ubuntu. It is a step-by-step guide that walks through configuring the prerequisites, installing Spark, and launching the Spark shell to perform various operations. If you are completely new to Apache Spark, I recommend first reading these introductory blogs: What is Spark, Spark ecosystem, Spark key abstraction RDD, Spark features, and limitations of Apache Spark.


2. Steps for Apache Spark Installation On Ubuntu

Follow the steps given below to install Apache Spark on Ubuntu:

2.1. Deployment Platform

i. Platform Requirements

  • Operating System: Ubuntu 14.04 or later (other Linux flavors such as CentOS or Red Hat can also be used)
  • Spark: Apache Spark 1.6.1 or later

ii. Setup Platform

If you are using Windows or Mac OS, you can create a virtual machine and install Ubuntu on it using either VMware Player or Oracle VirtualBox.

2.2. Prerequisites

i. Install Java 7

a. Install Python Software Properties

$ sudo apt-get install python-software-properties

b. Add Repository

$ sudo add-apt-repository ppa:webupd8team/java

c. Update the source list

$ sudo apt-get update

d. Install Java

$ sudo apt-get install oracle-java7-installer
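
Once the installer finishes, it can help to confirm that a java binary is actually on the PATH before moving on. The following is a minimal sketch of such a check; the helper name require_cmd is hypothetical and is not part of any package installed above.

```shell
# Hypothetical helper: succeed only if the named command is on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Report the Java version only when java is available.
# `java -version` writes to stderr, hence the 2>&1.
if require_cmd java; then
    java -version 2>&1
fi
```

If java is reported as missing, re-run the installation steps above before continuing.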

2.3. Install Apache Spark

i. Download Spark

You can download Apache Spark from the link below. For the package type, select “Pre-built for Hadoop 2.6 and later”.
http://spark.apache.org/downloads.html

Alternatively, you can use the direct download link:
http://mirror.fibergrid.in/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
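
On a terminal-only machine, the direct link can be fetched from the command line. This is a sketch that assumes wget is installed and that the mirror given in this tutorial is still reachable; the variable names and the download_spark helper are illustrative.

```shell
# Version and Hadoop profile used throughout this tutorial.
SPARK_VERSION="1.6.1"
HADOOP_PROFILE="hadoop2.6"

# Archive name and the direct download link from this tutorial.
SPARK_TGZ="spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}.tgz"
SPARK_URL="http://mirror.fibergrid.in/apache/spark/spark-${SPARK_VERSION}/${SPARK_TGZ}"

# Illustrative helper: download the archive unless it is already present.
# -c lets wget resume a partially downloaded file.
download_spark() {
    if [ -f "$SPARK_TGZ" ]; then
        echo "$SPARK_TGZ already downloaded"
    else
        wget -c "$SPARK_URL"
    fi
}
```

Calling download_spark from the directory where you want the archive performs the download; nothing is fetched until the function is called.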

ii. Untar Spark Setup

$ tar xzf spark-1.6.1-bin-hadoop2.6.tgz

You can find all the scripts and configuration files in the newly created directory “spark-1.6.1-bin-hadoop2.6”.
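
To catch a truncated download early, extraction can be combined with a check that the expected directory really exists afterwards. The helper below is a sketch; its name and argument order are assumptions made for illustration.

```shell
# Illustrative helper: extract a .tgz archive and confirm that the
# directory it is expected to create exists afterwards.
#   $1 = archive file, $2 = directory the archive should create
extract_and_check() {
    tar xzf "$1" || return 1
    if [ -d "$2" ]; then
        echo "extracted into $2"
    else
        echo "expected directory $2 is missing" >&2
        return 1
    fi
}
```

For this tutorial the call would be: extract_and_check spark-1.6.1-bin-hadoop2.6.tgz spark-1.6.1-bin-hadoop2.6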

iii. Setup Configuration

a. Edit .bashrc

Edit the .bashrc file in your home directory and add the following variables, replacing each placeholder with the actual path:

export JAVA_HOME=<path-to-the-root-of-your-Java-installation>    # e.g. /usr/lib/jvm/java-7-oracle/
export SPARK_HOME=<path-to-the-root-of-your-spark-installation>  # e.g. /home/dataflair/spark-1.6.1-bin-hadoop2.6/
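
If you prefer to script this step instead of editing the file by hand, one possible approach is to append each export line only when it is not already present, so that re-running the setup does not duplicate entries. The append_once helper below is hypothetical and shown only as a sketch.

```shell
# Illustrative helper: append a line to a file only if that exact line
# is not already there.
#   $1 = file, $2 = line to add
append_once() {
    touch "$1"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}
```

For example, append_once ~/.bashrc 'export SPARK_HOME=/home/dataflair/spark-1.6.1-bin-hadoop2.6' adds the line once. After editing .bashrc, open a new terminal or run source ~/.bashrc so the variables take effect in the current session.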

2.4. Launch the Spark Shell

Go to the Spark home directory (spark-1.6.1-bin-hadoop2.6) and run the command below to start the Spark shell:

$ bin/spark-shell

The Spark shell is now running, and you can start experimenting with Spark.

i. Spark UI

This is the web UI for the running Spark application; in local mode, the Spark shell itself runs as an application. The UI shows details about stages, storage (cached RDDs), environment variables, and executors.

http://localhost:4040
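
On a machine without a browser, the same address can be probed from a second terminal while the Spark shell is running. This is a sketch that assumes curl is installed; the http_status helper is illustrative.

```shell
# Address of the Spark UI while spark-shell is running in local mode.
SPARK_UI_URL="http://localhost:4040"

# Illustrative helper: print the final HTTP status code for a URL.
# -s: silent, -L: follow redirects, -o /dev/null: discard the page body,
# --max-time 5: give up after five seconds.
http_status() {
    curl -s -L -o /dev/null --max-time 5 -w '%{http_code}' "$1"
}
```

Running http_status "$SPARK_UI_URL" should print a success code such as 200 while the shell is up; curl reports 000 when nothing is listening on the port.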

2.5. Spark Commands / Operations

Once Apache Spark is installed, you can use the Spark shell to create RDDs and perform operations on them, such as transformations and actions. Follow this guide for Shell Commands to work with Spark.
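
As a first taste of these operations, the sketch below writes three Spark shell (Scala) statements to a scratch file and defines a helper that feeds them to the Spark shell on standard input. The file name and the run_spark_demo helper are arbitrary choices for illustration, and the helper assumes SPARK_HOME is set as described earlier.

```shell
# Write a few Spark shell (Scala) statements to a scratch file.
DEMO_SCRIPT="${TMPDIR:-/tmp}/spark-demo.scala"
cat > "$DEMO_SCRIPT" <<'EOF'
val nums  = sc.parallelize(1 to 10)     // create an RDD from a local range
val evens = nums.filter(_ % 2 == 0)     // transformation: evaluated lazily
println(evens.count())                  // action: triggers the computation
EOF

# Illustrative helper: run the statements through spark-shell via stdin.
run_spark_demo() {
    "$SPARK_HOME/bin/spark-shell" < "$DEMO_SCRIPT"
}
```

The same three statements can also be typed directly at the scala> prompt of an interactive Spark shell.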

 

