Explain transformation and action in Spark


  • Author
    Posts
    • #6395
DataFlair Team
      Spectator

Define transformation and action in Apache Spark RDD.

    • #6396
DataFlair Team
      Spectator

Before we start with Spark RDD Operations, let us take a deep dive into RDDs in Spark.

Apache Spark RDD supports two types of operations:

      Transformations
      Actions

Now let’s discuss them in detail:

Transformation: These operations transform an existing RDD into a new RDD by applying some logic, such as mapping, filtering, grouping, or reducing.
ex: rdd2 = rdd1.groupByKey()
Here the data of rdd1 is in key-value pairs; the operation groups the data by key and creates a new RDD named rdd2.

Action: These are the operations that evaluate the overall application and produce its results, which is the main objective of the application, such as getting a count, storing filtered or mapped data, or printing to the console.
ex: rdd2.saveAsTextFile("file_path")

The above operation saves rdd2 to the specified path.

Transformation operations are lazy in nature, and actions are the trigger. If we load data and apply some kind of filtering, mapping, or grouping, Spark just makes an entry for each transformation in a DAG (Directed Acyclic Graph), i.e. a flow of data. It does not perform any computation until an action is applied to the data. This behaviour of Spark is known as lazy evaluation. Lazy evaluation increases overall cluster performance by letting Spark optimize the execution plan and resource utilization.

      For detailed information of Spark RDD Operations with examples, follow the link: Spark RDD Operations-Transformation & Action with Example
