Install Spark on Ubuntu: A Beginner's Tutorial for Apache Spark


1. Objective – Install Spark

This tutorial covers the first step in learning Apache Spark: installing it on Ubuntu. It is a step-by-step guide to installing Spark, configuring its prerequisites, and launching the Spark shell to perform various operations. If you are completely new to Apache Spark, I recommend first reading these introductory blogs: What is Spark, Spark ecosystem, Spark key abstraction RDD, Spark features, and limitations of Apache Spark.

2. Steps for Apache Spark Installation On Ubuntu

Follow the steps below to install Apache Spark on Ubuntu:

i. Deployment Platform

a. Platform Requirements

Operating System: Ubuntu 14.04 or later (other Linux distributions such as CentOS and Red Hat can also be used)
Spark: Apache Spark 1.6.1 or later
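Whether a machine meets the operating-system requirement can be checked from the shell. This is a minimal sketch, assuming the system provides an /etc/os-release file and GNU sort (for -V version ordering). ubuntu_at_least is a hypothetical helper, and it only recognizes Ubuntu, not the other distributions mentioned above.

```shell
# Hypothetical helper: succeed if the os-release file (default /etc/os-release)
# reports Ubuntu at or above the given release, e.g. 14.04.
# Assumes GNU sort, whose -V option orders version strings.
ubuntu_at_least() {
  local min="$1" file="${2:-/etc/os-release}" id version
  # Source the file in subshells so its variables do not leak into this shell.
  id=$(. "$file" && echo "$ID")
  version=$(. "$file" && echo "$VERSION_ID")
  [ "$id" = "ubuntu" ] || return 1
  # If min sorts first (or equals version), the release is new enough.
  [ "$(printf '%s\n%s\n' "$min" "$version" | sort -V | head -n 1)" = "$min" ]
}
```

For example, ubuntu_at_least 14.04 && echo "release is new enough" prints the message on Ubuntu 14.04 or later.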

b. Set Up the Platform

If you are using Windows or Mac OS, create a virtual machine and install Ubuntu on it using either VMware Player or Oracle VirtualBox.

ii. Prerequisites

a. Install Java 7

Install Software Properties (the software-properties-common package provides the add-apt-repository command; older Ubuntu releases shipped it as python-software-properties)

[php]$ sudo apt-get install software-properties-common[/php]

Add Repository

[php]$ sudo add-apt-repository ppa:webupd8team/java[/php]

Update the source list

[php]$ sudo apt-get update[/php]

Install Java

[php]$ sudo apt-get install oracle-java7-installer[/php]
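The webupd8team Oracle Java installers were discontinued after Oracle changed how its JDK is licensed and distributed, so oracle-java7-installer may no longer work on your system; readers in the comments suggest OpenJDK instead. The sketch below installs OpenJDK 8 from the Ubuntu archive (assuming the openjdk-8-jdk package exists in your release) and reads back the installed major version. Spark 1.6 requires Java 7 or later, and Java 8 is the safer choice because newer Java releases may not work with Spark 1.x. The function names are hypothetical.

```shell
# Hypothetical helper: install OpenJDK 8 from the Ubuntu archive
# instead of the Oracle PPA (assumes openjdk-8-jdk is available).
install_openjdk8() {
  sudo apt-get update && sudo apt-get install -y openjdk-8-jdk
}

# Print the version string of the java on PATH, e.g. "1.8.0_292".
# `java -version` writes to stderr, so stderr is merged into the pipe.
installed_java_version() {
  java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

# Reduce a version string to its major number:
# "1.7.0_80" -> 7, "1.8.0_292" -> 8, "11.0.2" -> 11.
# Java 8 and older use the "1.x" numbering scheme.
java_major_version() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # drop the leading "1."
  esac
  echo "${v%%[._]*}"      # keep everything before the first "." or "_"
}
```

After install_openjdk8, running java_major_version "$(installed_java_version)" should print 8. On 64-bit Ubuntu the matching JAVA_HOME is usually /usr/lib/jvm/java-8-openjdk-amd64.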

iii. Install Apache Spark

a. Download Spark

You can download Apache Spark from the link below. For the package type, select “Pre-built for Hadoop 2.6 and Later”.

http://spark.apache.org/downloads.html

Alternatively, use this direct download link:

http://mirror.fibergrid.in/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
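Mirror links can go stale over time. The Apache archive keeps every past Spark release at a predictable path, so the download can be scripted as in this minimal sketch. It assumes wget is installed; the function names are hypothetical, and the defaults match the version used in this tutorial.

```shell
# Hypothetical helper: build the Apache archive URL for a Spark binary package.
# Defaults match this tutorial: Spark 1.6.1 pre-built for Hadoop 2.6.
spark_archive_url() {
  local version="${1:-1.6.1}" profile="${2:-hadoop2.6}"
  echo "https://archive.apache.org/dist/spark/spark-${version}/spark-${version}-bin-${profile}.tgz"
}

# Hypothetical helper: download the package into the current directory.
download_spark() {
  local url
  url=$(spark_archive_url "$@") || return 1
  wget -c "$url"   # -c resumes a partially downloaded file
}
```

For example, download_spark with no arguments fetches spark-1.6.1-bin-hadoop2.6.tgz, and download_spark 2.4.8 hadoop2.7 would fetch a later release with a different Hadoop build.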

b. Untar Spark Setup

[php]$ tar xzf spark-1.6.1-bin-hadoop2.6.tgz[/php]

You can find all the scripts and configuration files in the newly created directory “spark-1.6.1-bin-hadoop2.6”.
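The tar command unpacks Spark into whatever directory you run it from. To unpack into a specific parent directory and confirm the launcher script is present, here is a minimal sketch. extract_spark is a hypothetical helper, and it assumes the archive's top-level directory matches the file name without .tgz, as it does for spark-1.6.1-bin-hadoop2.6.tgz.

```shell
# Hypothetical helper: unpack a Spark tarball into a parent directory
# (default: $HOME) and confirm that bin/spark-shell exists.
# Assumes the top-level directory is the file name minus ".tgz".
extract_spark() {
  local tarball="$1" dest="${2:-$HOME}" dir
  dir="$dest/$(basename "$tarball" .tgz)"
  mkdir -p "$dest" && tar xzf "$tarball" -C "$dest" || return 1
  if [ ! -x "$dir/bin/spark-shell" ]; then
    echo "bin/spark-shell not found under $dir" >&2
    return 1
  fi
  # Print the Spark directory so callers can use it as SPARK_HOME.
  echo "$dir"
}
```

For example, extract_spark spark-1.6.1-bin-hadoop2.6.tgz "$HOME" prints the full path of the new spark-1.6.1-bin-hadoop2.6 directory on success.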

c. Set Up the Configuration

Edit .bashrc

Edit the .bashrc file in your home directory and add the following lines, adjusting both paths to match your system:

[php]# Root of your Java installation
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
# Root of your Spark installation
export SPARK_HOME=/home/dataflair/spark-1.6.1-bin-hadoop2.6[/php]

Then reload the file with source ~/.bashrc (or open a new terminal) so the variables take effect.
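If you prefer to script this step, the sketch below appends JAVA_HOME and SPARK_HOME exports to a startup file only if a SPARK_HOME export is not already there. It also adds $SPARK_HOME/bin to PATH, which goes beyond the original step but lets you type spark-shell from any directory. add_spark_env is a hypothetical helper, and the paths in the usage line are examples.

```shell
# Hypothetical helper: append the Spark environment to a shell startup
# file (default ~/.bashrc) once, and put $SPARK_HOME/bin on PATH.
add_spark_env() {
  local rc="${1:-$HOME/.bashrc}" java_home="$2" spark_home="$3"
  # Skip if a SPARK_HOME export is already present in the file.
  if grep -q '^export SPARK_HOME=' "$rc" 2>/dev/null; then
    return 0
  fi
  {
    printf 'export JAVA_HOME="%s"\n' "$java_home"
    printf 'export SPARK_HOME="%s"\n' "$spark_home"
    # Single quotes keep $PATH and $SPARK_HOME unexpanded until the file is sourced.
    printf '%s\n' 'export PATH="$PATH:$SPARK_HOME/bin"'
  } >> "$rc"
}
```

For example, run add_spark_env ~/.bashrc /usr/lib/jvm/java-8-openjdk-amd64 "$HOME/spark-1.6.1-bin-hadoop2.6" and then source ~/.bashrc. The grep guard keeps repeated runs from stacking duplicate lines, but it also skips the update when an older SPARK_HOME line is present, so edit the file by hand in that case.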

iv. Launch the Spark Shell

Go to the Spark home directory (spark-1.6.1-bin-hadoop2.6) and run the following command to start the Spark shell:

[php]$ ./bin/spark-shell[/php]
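If the shell does not start, a quick check of the variables from the configuration step can help narrow down the problem. This is a minimal sketch; check_spark_env is a hypothetical helper that only tests that the two directories contain an executable bin/java and bin/spark-shell.

```shell
# Hypothetical helper: verify JAVA_HOME and SPARK_HOME before launching.
# Prints a message for each problem and returns non-zero if any is found.
check_spark_env() {
  local status=0
  if [ -z "$JAVA_HOME" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME is unset or has no bin/java: '$JAVA_HOME'" >&2
    status=1
  fi
  if [ -z "$SPARK_HOME" ] || [ ! -x "$SPARK_HOME/bin/spark-shell" ]; then
    echo "SPARK_HOME is unset or has no bin/spark-shell: '$SPARK_HOME'" >&2
    status=1
  fi
  return "$status"
}
```

For example, check_spark_env && "$SPARK_HOME/bin/spark-shell" launches the shell only when both checks pass.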

Once the shell starts, you get a scala> prompt with a SparkContext already created and available as sc, so you can start working with Spark right away.
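To check the installation without typing into the shell, the binary distribution also includes bin/run-example, which runs bundled examples such as SparkPi. The sketch below assumes SPARK_HOME is set as in the configuration step. spark_smoke_test is a hypothetical helper, and it relies on SparkPi printing a line that starts with "Pi is roughly".

```shell
# Hypothetical helper: run the bundled SparkPi example as a smoke test.
# Spark's log output goes to stderr by default and is discarded here;
# SparkPi prints its result line to stdout, which grep picks out.
spark_smoke_test() {
  "$SPARK_HOME/bin/run-example" SparkPi 10 2>/dev/null | grep -m 1 'Pi is roughly'
}
```

A working installation prints a single line such as "Pi is roughly 3.14..." (the digits vary from run to run), and the function returns non-zero if no such line appears.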

a. Spark UI

Every Spark application has a web UI, and in local mode the Spark shell runs as an application. The UI provides details about jobs, stages, storage (cached RDDs), the environment, and executors. While the shell is running, open:

[php]http://localhost:4040[/php]
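If port 4040 is already taken (for example, by a second spark-shell), Spark binds the UI to the next port: 4041, 4042, and so on. A minimal sketch for finding it, assuming curl is installed; find_spark_ui is a hypothetical helper, and the default limit of 10 ports is arbitrary.

```shell
# Hypothetical helper: print the first Spark UI URL that answers on
# localhost, trying ports first, first+1, ... (default 4040 and 10 ports).
find_spark_ui() {
  local first="${1:-4040}" count="${2:-10}" port
  port="$first"
  while [ "$port" -lt $((first + count)) ]; do
    # curl exits 0 once it gets any HTTP response; it fails if nothing listens.
    if curl -s -o /dev/null --max-time 2 "http://localhost:${port}/"; then
      echo "http://localhost:${port}"
      return 0
    fi
    port=$((port + 1))
  done
  return 1
}
```

With one shell running, find_spark_ui would normally print http://localhost:4040; it returns non-zero when no port in the range responds.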

v. Spark Commands / Operations

Once Apache Spark is installed, you can use the Spark shell to perform various operations, such as creating RDDs and applying transformations and actions to them. Follow this guide to Spark shell commands to start working with Spark.

That covers everything you need to install Spark on Ubuntu. We hope the complete process is now clear.

3. Conclusion – Spark installation

In this tutorial, we walked through the steps to install Spark on Ubuntu. If you run into any problems, feel free to ask in the comments section.




DataFlair Team


Ukavex says:
November 9, 2016 at 7:51 am
Nice material..
- DataFlair Team says:
  November 23, 2018 at 5:17 pm
  Hi Ukavex,
  Glad you like our content on Spark installation. We recommend trying more of our blogs; we are sure you will find them useful.
  Keep learning.
Disha says:
November 9, 2016 at 8:06 am
Nice article..
Reply
- DataFlair Team says:
  November 23, 2018 at 5:14 pm
  Thank you, Disha, for taking the time to comment on our Spark installation tutorial.
  Try our latest articles on Spark. You will love them.
  Regards
  DataFlair Team
Matthias says:
November 16, 2016 at 7:33 am
I really like your blog explaining how to install Apache Spark on Ubuntu. I would request that you also explain how to develop a Spark project.
Thanks in advance.
- DataFlair Team says:
  November 23, 2018 at 5:12 pm
  Hello Matthias,
  Glad to read such a great comment on Spark installation. We expected you would need help creating a Spark project, so we have published a blog on how to create one. You can refer to it.
  If you need any more help, feel free to ask.
  Best Wishes from DataFlair
Rosaline says:
January 5, 2017 at 9:23 am
I must say you have high-quality articles, and the explanation of how to install Apache Spark is just awesome.
- DataFlair Team says:
  November 23, 2018 at 5:01 pm
  Hi Rosaline,
  Thanks for the nice feedback. Readers like you are our motivation. We hope the complete Spark installation process is now clear. We recommend learning more about Spark through our latest blogs; you can start with the Spark Shell Commands tutorial.
  All the best.
Avi says:
March 17, 2017 at 1:53 pm
Can you suggest some good books on Spark?
- DataFlair Team says:
  November 23, 2018 at 5:04 pm
  Hi Avi,
  We have already published a blog on Spark books that you can refer to. It contains a good collection of books, with details, that will help you.
  Happy learning.
Geronimo says:
November 5, 2018 at 2:31 pm
sudo apt-get install openjdk-7-jdk
instead of :
sudo apt-get install oracle-java7-installer
Works fine for Ubuntu 14 and 16
Cedric says:
April 27, 2019 at 9:44 am
Guys, great site, but please update the installation guidance, as there have been changes to the licensing of Oracle JDK. For Ubuntu:
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 0xB1998361219BD9C9
sudo apt-add-repository 'deb http://repos.azulsystems.com/ubuntu stable main'
##OPENJDK 7
sudo apt install zulu-7
##OPENJDK 8
sudo apt install zulu-8
##OPENJDK 11
sudo apt install zulu-11
##OPENJDK 12
sudo apt install zulu-12
