What are the types of Apache Spark transformation?


    • #6420
      DataFlair Team
      Spectator

      How many types of transformations are there in Spark?

    • #6421
      DataFlair Team
      Spectator

      To understand the types of transformations better, let's begin with a brief introduction to transformations in Apache Spark.

      Transformation in Spark
      A Spark transformation is a function that produces a new RDD from existing RDDs. It takes an RDD as input and produces one or more RDDs as output. Every time we apply a transformation, a new RDD is created; since RDDs are immutable, the input RDDs cannot be changed.
      An RDD lineage is built by applying transformations: it records all the parent RDDs of the final RDD(s). It is also known as the RDD operator graph or RDD dependency graph, and it is a logical execution plan, i.e., a Directed Acyclic Graph (DAG) of all the parent RDDs of an RDD.
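
      As a minimal sketch of this lineage (assuming a spark-shell session, where sc is the pre-built SparkContext), each transformation below yields a new RDD, and toDebugString prints the resulting dependency graph:

        // Each transformation returns a new, immutable RDD.
        val nums = sc.parallelize(1 to 10)      // parent RDD
        val doubled = nums.map(_ * 2)           // child RDD, depends on nums
        val evens = doubled.filter(_ % 4 == 0)  // grandchild RDD

        // Print the RDD lineage (dependency graph / logical plan) of the final RDD.
        println(evens.toDebugString)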

      Transformations are lazy in nature, i.e., they execute only when we call an action; they are not executed immediately. The two most basic transformations are map() and filter().
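
      The following sketch illustrates this laziness (again assuming sc from a spark-shell session): calling map() and filter() only builds up the plan, and no job runs until the collect() action is invoked:

        val words = sc.parallelize(Seq("spark", "scala", "hadoop", "storm"))

        val upper = words.map(_.toUpperCase)          // transformation: no job runs yet
        val sWords = upper.filter(_.startsWith("S"))  // still no job

        // collect() is an action: only now does Spark execute the lineage.
        sWords.collect().foreach(println)             // SPARK, SCALA, STORM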

      The resultant RDD is always distinct from its parent RDD. It can be smaller (e.g. filter(), distinct(), sample()), bigger (e.g. flatMap(), union(), cartesian()) or the same size (e.g. map()). Note that count() is an action rather than a transformation, since it returns a value instead of an RDD.
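
      A quick sketch of how each kind changes the element count (same spark-shell assumptions as above):

        val nums = sc.parallelize(Seq(1, 2, 3, 4, 5))

        val smaller = nums.filter(_ % 2 == 0)           // 2 elements: 2, 4
        val bigger = nums.flatMap(n => Seq(n, n * 10))  // 10 elements: 1, 10, 2, 20, ...
        val sameSize = nums.map(_ + 1)                  // 5 elements: 2, 3, 4, 5, 6

        println(s"smaller: ${smaller.count()}, bigger: ${bigger.count()}, same: ${sameSize.count()}")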

      Now, let's focus on the question. There are fundamentally two types of transformations:

      1. Narrow transformation –
      In a narrow transformation, all the elements required to compute the records in a single partition reside in a single partition of the parent RDD. Only a limited subset of partitions is needed to calculate the result. Such transformations are the result of map() and filter().

      2. Wide transformation –
      In a wide transformation, the elements required to compute the records in a single partition may live in many partitions of the parent RDD, so data has to be shuffled across partitions. Such transformations are the result of groupByKey() and reduceByKey(). A sketch contrasting the two types follows below.
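
      As a rough sketch of both types (once more assuming sc from a spark-shell session), mapValues() below is narrow, while reduceByKey() is wide because it must shuffle records with the same key together:

        val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4)))

        // Narrow: each output partition depends on exactly one parent partition,
        // so no data moves across the network.
        val incremented = pairs.mapValues(_ + 1)

        // Wide: records with the same key may live in many parent partitions,
        // so Spark must shuffle data across partitions to group them.
        val sums = incremented.reduceByKey(_ + _)

        sums.collect().foreach(println)  // (a,6), (b,8)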

      For more detailed insights into transformations in Spark, refer to: Spark RDD Operations – Transformation & Action with Example
