Apache Spark Map vs FlatMap Operation
1. Objective
In this Apache Spark tutorial, we compare the Spark map and flatMap operations. Both are transformation operations in Spark. map() applies a function to each element of an RDD and returns the results as a new RDD. The developer supplies that function, so it can carry any custom business logic. flatMap() is similar to map(), but the supplied function may return 0, 1, or more elements for each input element.
In this blog, we will see how to perform a map operation on an RDD and how to process data using the flatMap operation. The tutorial covers what map and flatMap are and how the map() and flatMap() transformations differ in Apache Spark, with examples in Scala and Java.
So, let's start with Spark map vs flatMap.
2. Difference between Spark Map vs FlatMap Operation
This section of the Spark tutorial provides the details of Map vs FlatMap operation in Apache Spark with examples in Scala and Java programming languages.
i. Spark Map Transformation
map is a transformation operation in Apache Spark. It applies a function to each element of an RDD and returns the results as a new RDD. The developer defines the function, so it can contain any custom business logic, and the same logic is applied to every element of the RDD.
The function passed to map takes one element as input, processes it according to the developer's code, and returns exactly one element. map therefore transforms an RDD of length N into another RDD of length N: the input and output RDDs always have the same number of records.
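The one-in, one-out rule can be sketched without a cluster. The block below is only an illustration: it uses a plain Java list and java.util.stream as a stand-in for an RDD (an assumption made so the example runs without Spark), and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: a local List stands in for an RDD of lines.
class MapSketch {
    // Applies one function to every element; each input yields exactly one output.
    static List<String> toUpperCase(List<String> lines) {
        return lines.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("spark map", "flat map", "rdd");
        List<String> result = toUpperCase(lines);
        System.out.println(result);                        // [SPARK MAP, FLAT MAP, RDD]
        System.out.println(lines.size() == result.size()); // true
    }
}
```

Three lines go in and three lines come out, which is the N-to-N behaviour described above.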
a. Map Transformation Scala Example
Create RDD
val data = spark.read.textFile("INPUT-PATH").rdd
The above statement creates an RDD named data.
Map Transformation-1
val newData = data.map(line => line.toUpperCase())
The above map transformation converts every record of the RDD to upper case.
Map Transformation-2
import scala.xml.XML
val tag = data.map { line =>
  val xml = XML.loadString(line)          // parse one record as XML
  xml.attribute("Tags").get.toString()    // read the Tags attribute; .get throws if it is missing
}
The above map transformation parses each record as XML and extracts its Tags attribute. In effect, the map operation turns raw XML records into structured values.
b. Map Transformation Java Example
Create RDD
JavaRDD<String> linesRDD = spark.read().textFile("INPUT-PATH").javaRDD();
The above statement creates an RDD named linesRDD.
Map Transformation
import org.apache.spark.api.java.function.Function;

JavaRDD<String> newData = linesRDD.map(new Function<String, String>() {
    public String call(String s) {
        // trim surrounding whitespace, then convert to upper case
        return s.trim().toUpperCase();
    }
});
ii. Spark FlatMap Transformation Operation
Let's now discuss the flatMap() operation in Apache Spark.
flatMap is also a transformation operation. It applies a function to each element of an RDD and returns the results as a new RDD. It is similar to map, but the function may return 0, 1, or more elements for each input element. As with map, the developer defines the function, and the same logic is applied to every element of the RDD.
The function passed to flatMap takes one element as input, processes it according to the developer's code, and returns a collection of 0 or more elements, which Spark flattens into the output. flatMap() therefore transforms an RDD of length N into another RDD of length M, where M may be smaller than, equal to, or larger than N.
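To make the "0, 1, or more" behaviour concrete, here is a small sketch. It again assumes a plain Java list and java.util.stream in place of an RDD, and the names are hypothetical. Empty strings are filtered out inside the function on purpose, so that a blank line contributes no elements at all.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: a local List stands in for an RDD of lines.
class FlatMapSketch {
    // Each line may contribute zero, one, or many words to the flattened output.
    static List<String> toWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" "))
                        .filter(word -> !word.isEmpty())) // a blank line yields no words
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or", "", "not");
        List<String> words = toWords(lines);
        System.out.println(words);        // [to, be, or, not]
        System.out.println(words.size()); // 4
    }
}
```

Here N is 3 and M is 4: the first line yields three elements, the blank line yields none, and the last line yields one.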
a. FlatMap Transformation Scala Example
val result = data.flatMap(line => line.split(" "))
The above flatMap transformation splits each line into words. Each word becomes an individual element of the newly created RDD.
b. FlatMap Transformation Java Example
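import java.util.Arrays;
import java.util.Iterator;
import org.apache.spark.api.java.function.FlatMapFunction;

JavaRDD<String> result = linesRDD.flatMap(new FlatMapFunction<String, String>() {
    public Iterator<String> call(String s) {
        // split the line on spaces; each word becomes its own element
        return Arrays.asList(s.split(" ")).iterator();
    }
});
As in the Scala example, this flatMap transformation splits each line into words, and each word becomes an individual element of the newly created RDD.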
3. Conclusion
Hence, from the comparison between Spark map() and flatMap(), it is clear that the map function expresses a one-to-one transformation: it turns each element of a collection into exactly one element of the resulting collection. The flatMap function expresses a one-to-many transformation: it turns each element into 0 or more elements of the result.
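The difference is easiest to see when the same splitting function is passed to both operations. The sketch below is an assumption-laden illustration: it uses java.util.stream on a local list rather than a Spark RDD, and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: the same split function, once under map and once under flatMap.
class MapVsFlatMapSketch {
    // map keeps one output per input, so the result is a list of word lists.
    static List<List<String>> withMap(List<String> lines) {
        return lines.stream()
                .map(line -> Arrays.asList(line.split(" ")))
                .collect(Collectors.toList());
    }

    // flatMap flattens the per-line results into a single list of words.
    static List<String> withFlatMap(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hi");
        System.out.println(withMap(lines));     // [[hello, world], [hi]]
        System.out.println(withFlatMap(lines)); // [hello, world, hi]
    }
}
```

With map, two lines produce two elements, each of them a list. With flatMap, the same two lines produce three elements, each of them a word.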
Please leave a comment if you liked this post or have any questions about the Apache Spark map and flatMap functions.