Map vs FlatMap in Apache Spark?
September 20, 2018 at 9:51 pm #6416 — DataFlair Team (Spectator)
What is the difference between the flatMap and map transformation operations in Spark/Scala? Explain with an example.
Where should we use the map operation, and where flatMap?
September 20, 2018 at 9:51 pm #6417 — DataFlair Team (Spectator)
1. Spark Map Transformation
map is a transformation operation in Apache Spark. It is applied to each element of an RDD, and it returns the result as a new RDD. In the map operation, the developer can define custom business logic; the same logic is applied to every element of the RDD.
Spark's map function takes one element as input, processes it according to custom code (specified by the developer), and returns exactly one element. map transforms an RDD of length N into another RDD of length N: the input and output RDDs always have the same number of records.
# Map Transformation Scala Example
a. Create RDD
val data = spark.read.textFile("INPUT-PATH").rdd
The above statement creates an RDD named data. Follow this guide to learn more ways to create RDDs in Apache Spark.
b. Map Transformation-1
val newData = data.map(line => line.toUpperCase())
The above map transformation converts each and every record of the RDD to upper case.
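The one-element-in, one-element-out behaviour of map can be sketched without a Spark cluster using Java's Stream API (this is an analogy using java.util.stream, not the Spark API; the sample lines are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapDemo {
    public static void main(String[] args) {
        // Hypothetical sample lines standing in for the RDD contents.
        List<String> lines = Arrays.asList("hello spark", "map demo");

        // Like RDD.map: exactly one output element per input element (N -> N).
        List<String> upper = lines.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());

        System.out.println(upper); // [HELLO SPARK, MAP DEMO]
    }
}
```

Note that the output list always has the same size as the input list, which is exactly the N → N guarantee that map gives on an RDD.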
c. Map Transformation-2
// requires: import scala.xml.XML
val tag = data.map { line =>
  val xml = XML.loadString(line)
  xml.attribute("Tags").get.toString()
}
The above map transformation parses each line as XML and collects the Tags attribute from the XML data. Overall, this map operation converts XML into a structured format.
# Map Transformation Java Example
a. Create RDD
JavaRDD<String> linesRDD = spark.read().textFile("INPUT-PATH").javaRDD();
The above statement creates an RDD named linesRDD.
b. Map Transformation
JavaRDD<String> newData = linesRDD.map(new Function<String, String>() {
    public String call(String s) {
        // Trim whitespace and convert the record to upper case.
        String result = s.trim().toUpperCase();
        return result;
    }
});
2. Spark FlatMap Transformation Operation
Let's now discuss the flatMap() operation in Apache Spark.
flatMap is a transformation operation. It is applied to each element of an RDD, and it returns the result as a new RDD. It is similar to map, but flatMap allows returning 0, 1 or more elements from the function. In the flatMap operation, the developer can define custom business logic; the same logic is applied to every element of the RDD.
The flatMap function takes one element as input, processes it according to custom code (specified by the developer), and returns 0 or more elements at a time. flatMap() transforms an RDD of length N into another RDD of length M, where M may be smaller than, equal to, or larger than N.
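The N → M behaviour can likewise be sketched with Java streams (again an analogy via java.util.stream, not the Spark API; the input lines are invented for illustration). Here one of the three input elements produces zero outputs, so the result has a different length than the input:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapDemo {
    public static void main(String[] args) {
        // Three input lines; the middle one is empty and yields zero words.
        List<String> lines = Arrays.asList("spark makes rdds", "", "flatMap");

        // Like RDD.flatMap: each input may produce 0, 1 or more outputs,
        // so a sequence of length N becomes one of length M (here 3 -> 4).
        List<String> words = lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());

        System.out.println(words); // [spark, makes, rdds, flatMap]
    }
}
```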
# FlatMap Transformation Scala Example
val result = data.flatMap(line => line.split(" "))
The above flatMap transformation splits each line into words. Each word becomes an individual element of the newly created RDD.
# FlatMap Transformation Java Example
JavaRDD<String> result = data.flatMap(new FlatMapFunction<String, String>() {
    public Iterator<String> call(String s) {
        // Split the line on spaces and return one output element per word.
        return Arrays.asList(s.split(" ")).iterator();
    }
});
The above flatMap transformation splits each line into words. Each word becomes an individual element of the newly created RDD.
There are many more transformation operations; to learn them all, follow this link: Spark RDD Operations: Transformation & Action with Example.