reduceByKey() is a transformation that operates on a pair RDD (an RDD of key/value pairs).
> A pair RDD contains tuples, so we pass a function that operates on the values of a tuple rather than on each element.
> It merges the values for each key using an associative reduce function.
> It is a wide operation because data may be shuffled across multiple partitions.
> It merges data locally (a map-side combine) before sending data across partitions, which optimizes the shuffle.
> It takes a function with two parameters of the same type (values associated with the same key) that returns a single value of that same type.
> We can say that it has three overloaded variants:
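As a sketch, the three overloads in Spark's `PairRDDFunctions` differ in how the resulting partitioning is controlled (exact signatures as in the Spark core Scaladoc):

```scala
// From org.apache.spark.rdd.PairRDDFunctions[K, V]:
def reduceByKey(func: (V, V) => V): RDD[(K, V)]                      // default parallelism
def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]  // fixed partition count
def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)] // custom partitioner
```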
From : http://data-flair.training/blogs/rdd-transformations-actions-apis-apache-spark/#210_ReduceByKey
reduceByKey() uses an associative reduce function to merge the values of each key. It can be used only on an RDD of key/value pairs. It is a wide operation: it shuffles data across multiple partitions and creates a new RDD. Before the shuffle, it merges data locally using the same associative function, which reduces the amount of data moved. The result of the combination (e.g. a sum) must be of the same type as the values, and the function must produce the same result whether it combines values inside a partition or partial results from different partitions.
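Because the output type must match the value type, an average cannot be computed directly with reduceByKey. A common sketch (assuming `sc` is an existing SparkContext) is to carry (sum, count) pairs as the value type and divide afterwards:

```scala
// Per-key average: make the value a (sum, count) tuple so the
// reduce function stays of type ((Double, Int), (Double, Int)) => (Double, Int).
val pairs = sc.parallelize(Seq(("a", 1.0), ("a", 3.0), ("b", 4.0)))
val sumCounts = pairs
  .mapValues(v => (v, 1))                             // value -> (sum, count)
  .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))  // associative merge
val averages = sumCounts.mapValues { case (sum, n) => sum / n }
```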
val rdd1 = sc.parallelize(Seq((5,10),(5,15),(4,8),(4,12),(5,20),(10,50)))
val rdd2 = rdd1.reduceByKey((x, y) => x + y)
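Collecting the result shows the per-key sums (5 → 10+15+20 = 45, 4 → 8+12 = 20, 10 → 50); note the ordering of the output pairs is not guaranteed:

```scala
rdd2.collect()
// e.g. Array((4,20), (5,45), (10,50)) -- order may vary
```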