Create Spark Project in Scala With Eclipse Without Maven

1. Objective – Spark Scala Project

This step-by-step tutorial explains how to create a Spark project in Scala with Eclipse without Maven, and how to submit the application after creating the jar. The guide also covers installing the Scala plugin in Eclipse and setting up the Spark environment in Eclipse. In short, you will learn how to configure a development environment for writing Spark applications in Scala.
If you are completely new to Apache Spark, I recommend reading this Apache Spark Introduction Guide first.

2. Steps to Create the Spark Project in Scala

To create a Spark project in Scala with Eclipse without Maven, follow the steps given below:

i. Platform Used / Required
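This tutorial assumes the following setup (exact versions are illustrative; any reasonably recent releases should work):

Eclipse IDE with the Scala IDE plugin
Apache Spark, whose jars (under Spark-Home >> jars) will be added to the build path
Java JDK installed and available on the path
Ubuntu Linux (or another Linux distribution) for deploying and running the job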

ii. Install Eclipse plugin for Scala

Open Eclipse Marketplace (Help >> Eclipse Marketplace) and search for “scala ide”. Now install the Scala IDE. Alternatively, you can download Eclipse for Scala.

iii. Create a New Spark Scala Project

To create a new Spark Scala project, click on File >> New >> Other

Select Scala Project:

Supply Project Name:

iv. Create New Package

After creating the project, create a new package.

Supply Package Name:

v. Create a New Scala Object

Now create a new Scala object in which to develop the Scala program for the Spark application.

Select Scala Object:

Supply Object Name:

vi. New Scala Object in Editor

The Scala object is ready; now we can develop our Spark wordcount code in Scala:

vii. Copy the Below Spark Scala Wordcount Code into the Editor

package com.dataflair.spark

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object Wordcount {
  def main(args: Array[String]) {
    // Create conf object
    val conf = new SparkConf()
      .setAppName("WordCount")
    // Create spark context object
    val sc = new SparkContext(conf)
    // Check whether sufficient params are supplied
    if (args.length < 2) {
      println("Usage: ScalaWordCount <input> <output>")
      System.exit(1)
    }
    // Read file and create RDD
    val rawData = sc.textFile(args(0))
    // Convert the lines into words using flatMap operation
    val words = rawData.flatMap(line => line.split(" "))
    // Count the individual words using map and reduceByKey operations
    val wordCount = words.map(word => (word, 1)).reduceByKey(_ + _)
    // Save the result
    wordCount.saveAsTextFile(args(1))
    // Stop the spark context
    sc.stop()
  }
}

You will see lots of errors due to the missing Spark libraries.

viii. Add Spark Libraries

Configure the Spark environment in Eclipse: right-click on the project name >> Build Path >> Configure Build Path

Add the External Jars:

ix. Select the Spark Jars and Insert Them

You should have a Spark setup available in your development environment, as it will be needed for the Spark libraries.

Go to “Spark-Home >> jars” and select all the jars:
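From a terminal, you can list the same jars with the command below (assuming the SPARK_HOME environment variable points at your Spark installation):

ls $SPARK_HOME/jars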

Import the selected jars:

x. Spark Scala Word Count Program

After importing the libraries, all the errors will be resolved.
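Before packaging the job, you can sanity-check the wordcount logic in spark-shell (assuming a local Spark installation, where sc is already provided); the sample input lines below are made up for illustration:

// Build a tiny in-memory RDD and run the same flatMap/map/reduceByKey pipeline
val lines = sc.parallelize(Seq("hello world", "hello spark"))
val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.collect().foreach(println) // prints pairs such as (hello,2), (world,1), (spark,1)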

We have successfully created the Spark environment in Eclipse and developed the Spark Scala program. Now let's deploy the Spark job on Linux; before deploying/running the application you must have Spark installed.
Follow these links to install Apache Spark on a single-node cluster or on a multi-node cluster.

xi. Create the Spark Scala Program Jar File

Before running the created Spark wordcount application, we have to create a jar file. Right-click on the project >> Export

Select Jar-file Option to Export:

Create the Jar file:

The jar file for the Spark Scala application has been created; now we need to run it.
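Alternatively, if you prefer the command line, an equivalent jar can be built with scalac and jar; the source path, output names, and SPARK_HOME variable below are illustrative and should be adjusted to your setup:

mkdir -p classes
scalac -classpath "$SPARK_HOME/jars/*" -d classes src/com/dataflair/spark/Wordcount.scala
jar cf sparkJob.jar -C classes .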

xii. Go to Spark Home Directory

Log in to Linux and open a terminal. To run the Spark Scala application we will be using Ubuntu Linux. Copy the jar file to Ubuntu and create a text file, which we will use as input for the Spark Scala wordcount job.
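For example (the Spark install path and file locations below are illustrative; adjust them to match your setup):

cd /usr/local/spark
echo "hello spark hello world" > ../wc-data

The input file is placed in the parent directory so that it matches the ../wc-data path used in the spark-submit command below.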

xiii. Submit Spark Application using spark-submit script

Submit the Spark application using the below command:

bin/spark-submit --class <Qualified-Class-Name> --master <Master> <Path-Of-Jar-File> <Input-Path> <Output-Path>
bin/spark-submit --class com.dataflair.spark.Wordcount --master local ../sparkJob.jar ../wc-data output

Let's understand the above command:

--class: the fully qualified name of the application's main class, here com.dataflair.spark.Wordcount
--master: the master URL to connect to; local runs Spark locally with a single worker thread
../sparkJob.jar: the path of the application jar file
../wc-data: the input path, received by the application as args(0)
output: the output directory, received as args(1); it must not already exist, since Spark creates it

The application has completed successfully; now browse the result.

xiv. Browse the result

Browse the output directory and open the file named part-xxxxx, which contains the output of the application.
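For example, from the terminal (the exact part-file name depends on the number of output partitions):

cat output/part-00000

Each line is the string form of a (word, count) pair, such as (spark,3), because saveAsTextFile writes each tuple's toString.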

We have successfully created a Spark project in Scala and deployed it on Ubuntu.
To play with Spark, first learn about RDD, DataFrame, and DataSet in Apache Spark, and then refer to this Spark shell commands tutorial to practically implement Spark functionality.