What are the types of Apache Spark transformation?


    • #6420
      DataFlair Team
      Spectator

      How many types of transformations are there in Spark?

    • #6421
      DataFlair Team
      Spectator

      To understand the types of transformations better, let's begin with a brief introduction to transformations in Apache Spark.

      Transformation in Spark
      A Spark transformation is a function that produces a new RDD from existing RDDs. It takes an RDD as input and produces one or more RDDs as output. Every time we apply a transformation, a new RDD is created; since RDDs are immutable, the input RDDs cannot be changed.
      An RDD lineage is built by applying transformations: it records all the parent RDDs of the final RDD(s). It is also known as the RDD operator graph or RDD dependency graph, and it is a logical execution plan, i.e., a Directed Acyclic Graph (DAG) of all the parent RDDs of an RDD.
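
      As a minimal sketch of this lineage (assuming a spark-shell session, where sc is the pre-built SparkContext), each transformation below yields a new RDD, and toDebugString prints the resulting dependency graph:

        // Each transformation returns a new, immutable RDD.
        val nums = sc.parallelize(1 to 10)      // parent RDD
        val doubled = nums.map(_ * 2)           // child RDD, depends on nums
        val evens = doubled.filter(_ % 4 == 0)  // grandchild RDD

        // Print the RDD lineage (dependency graph / logical plan) of the final RDD.
        println(evens.toDebugString)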

      Transformations are lazy in nature, i.e., they execute only when we call an action; they are not executed immediately. The two most basic transformations are map() and filter().
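
      The following sketch illustrates this laziness (again assuming sc from a spark-shell session): calling map() and filter() only builds up the plan, and no job runs until the collect() action is invoked:

        val words = sc.parallelize(Seq("spark", "scala", "hadoop", "storm"))

        val upper = words.map(_.toUpperCase)          // transformation: no job runs yet
        val sWords = upper.filter(_.startsWith("S"))  // still no job

        // collect() is an action: only now does Spark execute the lineage.
        sWords.collect().foreach(println)             // SPARK, SCALA, STORM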

      The resultant RDD is always distinct from its parent RDD. It can be smaller (e.g. filter(), distinct(), sample()), bigger (e.g. flatMap(), union(), cartesian()) or the same size (e.g. map()). Note that count() is an action rather than a transformation, since it returns a value instead of an RDD.
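
      A quick sketch of how each kind changes the element count (same spark-shell assumptions as above):

        val nums = sc.parallelize(Seq(1, 2, 3, 4, 5))

        val smaller = nums.filter(_ % 2 == 0)           // 2 elements: 2, 4
        val bigger = nums.flatMap(n => Seq(n, n * 10))  // 10 elements: 1, 10, 2, 20, ...
        val sameSize = nums.map(_ + 1)                  // 5 elements: 2, 3, 4, 5, 6

        println(s"smaller: ${smaller.count()}, bigger: ${bigger.count()}, same: ${sameSize.count()}")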

      Now, let's focus on the question. There are fundamentally two types of transformations:

      1. Narrow transformation –
      In a narrow transformation, all the elements required to compute the records in a single partition reside in a single partition of the parent RDD. Only a limited subset of partitions is needed to calculate the result. Such transformations are the result of map() and filter().

      2. Wide transformation –
      In a wide transformation, the elements required to compute the records in a single partition may live in many partitions of the parent RDD, so data has to be shuffled across partitions. Such transformations are the result of groupByKey() and reduceByKey(). A sketch contrasting the two types follows below.
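
      As a rough sketch of both types (once more assuming sc from a spark-shell session), mapValues() below is narrow, while reduceByKey() is wide because it must shuffle records with the same key together:

        val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4)))

        // Narrow: each output partition depends on exactly one parent partition,
        // so no data moves across the network.
        val incremented = pairs.mapValues(_ + 1)

        // Wide: records with the same key may live in many parent partitions,
        // so Spark must shuffle data across partitions to group them.
        val sums = incremented.reduceByKey(_ + _)

        sums.collect().foreach(println)  // (a,6), (b,8)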

      For more detailed insights into transformations in Spark, refer to: Spark RDD Operations – Transformation & Action with Example
