How many types of transformation are there in RDD in Apache Spark?

    • #5884
      DataFlair Team
      Spectator

      What are the types of Apache Spark transformations?
      Explain narrow and wide transformations in Spark.

    • #5885
      DataFlair Team
      Spectator

      There are two kinds of transformations:

      • Narrow transformations
      • Wide transformations

      Narrow transformations:
      Narrow transformations are the result of operations such as map and filter, in which
      the data to be transformed comes from a single partition only, i.e. each partition is self-contained.
      Each partition of the output RDD holds records that originate from a
      single partition of the parent RDD.
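
      The single-partition dependency described above can be sketched in plain Python. This is a toy model, not the Spark API: an "RDD" is assumed to be a list of partitions, and the helper names (map_partitions, narrow_map, narrow_filter) are hypothetical.

```python
# Toy model (not the Spark API): an "RDD" is a list of partitions,
# and each partition is a list of records.

def map_partitions(partitions, fn):
    """Apply fn to each partition on its own; no partition sees another's data."""
    return [fn(list(part)) for part in partitions]

def narrow_map(partitions, f):
    """Model of map: transform every record inside its own partition."""
    return map_partitions(partitions, lambda part: [f(x) for x in part])

def narrow_filter(partitions, pred):
    """Model of filter: keep matching records inside their own partition."""
    return map_partitions(partitions, lambda part: [x for x in part if pred(x)])

parent = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

doubled = narrow_map(parent, lambda x: x * 2)
evens = narrow_filter(parent, lambda x: x % 2 == 0)

print(doubled)  # [[2, 4, 6], [8, 10], [12, 14, 16, 18]]
print(evens)    # [[2], [4], [6, 8]]
```

      Output partition i is built only from parent partition i, so the partition count is unchanged and no data has to move between partitions.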

      Wide transformations:
      Wide transformations are the result of operations such as groupByKey and reduceByKey.
      The data required to compute the records in a single partition may
      reside in many partitions of the parent RDD.

      Wide transformations are also called shuffle transformations because they usually require a shuffle
      (the shuffle can be skipped when the parent RDD is already partitioned by key in the required way).
      All of the tuples with the same key must end up in the same partition, processed by the same task.
      To satisfy this requirement, Spark must execute an RDD shuffle, which transfers data across the cluster
      and results in a new stage with a new set of partitions.
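
      The shuffle step can be sketched in the same spirit. This is a plain-Python toy model, not the Spark API: shuffle_by_key and reduce_by_key are hypothetical helpers, and "key % num_partitions" is an assumed stand-in for a hash partitioner that only works for integer keys.

```python
# Toy model (not the Spark API): partitions are lists of (key, value) pairs.

def shuffle_by_key(partitions, num_partitions):
    """Redistribute pairs so that each key lands in exactly one output partition."""
    out = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            # Assumed toy partitioner for integer keys.
            out[key % num_partitions].append((key, value))
    return out

def reduce_by_key(partitions, num_partitions, op):
    """Model of a wide transformation: shuffle first, then combine per key."""
    shuffled = shuffle_by_key(partitions, num_partitions)
    result = []
    for part in shuffled:
        acc = {}
        for key, value in part:
            acc[key] = op(acc[key], value) if key in acc else value
        result.append(sorted(acc.items()))
    return result

# Keys 1, 2 and 3 are each spread over several parent partitions.
parent = [[(1, 10), (2, 20)], [(1, 5), (3, 7)], [(2, 1), (3, 3)]]

totals = reduce_by_key(parent, 2, lambda a, b: a + b)
print(totals)  # [[(2, 21)], [(1, 15), (3, 10)]]
```

      Each output partition here reads from several parent partitions, which is the cross-partition dependency that makes the transformation wide.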

      For a detailed study of RDD operations, read RDD Operation.
