Free Online Certification Courses – Learn Today. Lead Tomorrow. › Forums › Apache Spark › Explain transformation and action in Spark
September 20, 2018 at 9:41 pm · #6395 · DataFlair Team (Spectator)
Define transformation and Action in Apache Spark RDD.
-
September 20, 2018 at 9:41 pm · #6396 · DataFlair Team (Spectator)
Before we start with Spark RDD operations, let us take a closer look at RDDs in Spark.
Apache Spark RDDs support two types of operations:
Transformations
Actions
Now let's discuss them in detail:
Transformation: These operations derive a new dataset from an existing one by applying some logic, such as mapping, filtering, grouping, or reducing.
ex: rdd2 = rdd1.groupByKey()
Here the data of rdd1 is in key-value pairs; the operation groups the data by key and creates a new RDD named rdd2.
Action: These are the operations that trigger evaluation of the whole computation and return results, which is the main objective of the application, for example getting a count, storing filtered or mapped data, or printing to the console.
ex: rdd2.saveAsTextFile("file_path")
The above operation saves rdd2 to the specified path.
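To make the groupByKey example concrete without needing a Spark cluster, here is a minimal pure-Python analogy (not PySpark itself; the function name group_by_key and the sample data are illustrative) showing what grouping key-value pairs by key produces:

```python
from collections import defaultdict

def group_by_key(pairs):
    """Group a list of (key, value) pairs by key, like Spark's groupByKey."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

# rdd1's data as key-value pairs
rdd1_data = [("a", 1), ("b", 2), ("a", 3)]

# Conceptually what rdd2 = rdd1.groupByKey() would hold
rdd2_data = group_by_key(rdd1_data)
print(rdd2_data)  # {'a': [1, 3], 'b': [2]}
```

In real Spark, the grouping additionally shuffles data across partitions, which this local analogy does not capture.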
Transformation operations are lazy in nature, and actions are the trigger. If we load the data and apply some filtering, mapping, or grouping, Spark only records each transformation in a DAG (Directed Acyclic Graph), i.e. a flow of data. It performs no work until an action is applied to the data. This behaviour of Spark is known as lazy evaluation. Lazy evaluation improves overall cluster performance by letting Spark optimize the whole plan and its resource utilization before executing anything.
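The lazy-evaluation behaviour described above can be sketched with plain Python generators (an analogy only, not Spark: the `source`, `mapped`, and `filtered` names are illustrative). Building the "transformation" pipeline does no work; only the final "action" pulls data through it:

```python
# Log of actual reads, to prove nothing runs until the "action".
log = []

def source():
    """Pretend data source; records each element it actually produces."""
    for x in range(5):
        log.append(f"read {x}")
        yield x

# "Transformations": generator expressions only describe the pipeline.
mapped = (x * 10 for x in source())        # like rdd.map(...)
filtered = (x for x in mapped if x >= 20)  # like rdd.filter(...)
assert log == []  # still lazy: no element has been read yet

# "Action": sum() pulls every element through the whole chain at once.
total = sum(filtered)
print(total)  # 20 + 30 + 40 = 90
```

As with Spark's DAG, the pipeline definition and its execution are separate steps, which is what allows the engine to plan before running.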
For detailed information of Spark RDD Operations with examples, follow the link: Spark RDD Operations-Transformation & Action with Example