Explain the operation reduce()

This topic has 2 replies, 1 voice, and was last updated 7 years, 10 months ago by DataFlair Team.

Viewing 2 reply threads

Author

Posts
- September 20, 2018 at 1:59 pm #5040
  
  DataFlair Team
  Spectator
  
  Explain the operation reduce()
- September 20, 2018 at 2:00 pm #5042
  
  DataFlair Team
  Spectator
  
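  > reduce() is an action: it combines all the elements of an RDD into a single value and returns that value to the driver.
  > It takes as input a function of two parameters of the same type that returns a single value of that same type; the function should be commutative and associative.
  > Spark applies the function within each partition first and then merges the per-partition results on the driver, so no shuffle across partitions is needed.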
  
  Example 1:
  val rdd1 = sc.parallelize(1 to 100)
  val sum = rdd1.reduce((x, y) => x + y)
  
  OR
  
  val sum = rdd1.reduce(_ + _)
  
  Output:
  sum: Int = 5050
  
  Example 2:
  val rdd1 = sc.parallelize(1 to 5)
  val product = rdd1.reduce(_ * _)
  
  Output:
  product: Int = 120
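  
  To make the two-step behaviour of reduce() concrete, here is a minimal plain-Scala sketch that only mimics it on a local collection; it does not use Spark. The helper name `simulatedReduce` and the `numPartitions` parameter are hypothetical, chosen for illustration. Each chunk of the data stands in for a partition and is reduced locally with `reduceLeft`, and the per-partition results are then merged, which is the role the driver plays in Spark.
  
  ```scala
  // Hypothetical helper, for illustration only: mimics the two steps of
  // RDD.reduce() on a local Seq. This is not Spark's implementation.
  def simulatedReduce[T](data: Seq[T], numPartitions: Int)(f: (T, T) => T): T = {
    require(numPartitions > 0, "numPartitions must be positive")
    // Split the data into consecutive chunks that stand in for RDD partitions.
    val chunkSize = math.max(1, math.ceil(data.size.toDouble / numPartitions).toInt)
    val partitions = data.grouped(chunkSize).toList
    // Step 1: reduce each partition locally (executors do this in Spark).
    val partials = partitions.map(_.reduceLeft(f))
    // Step 2: merge the per-partition results (the driver does this in Spark).
    // An empty input leaves nothing to merge, so reduceLeft throws here.
    partials.reduceLeft(f)
  }
  
  val total = simulatedReduce(1 to 100, numPartitions = 4)(_ + _)
  val product = simulatedReduce(1 to 5, numPartitions = 2)(_ * _)
  println(total)   // 5050
  println(product) // 120
  ```
  
  With 2 chunks, this sketch splits 1 to 5 into (1, 2, 3) and (4, 5), giving partial products 6 and 20, which merge to 120. Spark's own slicing in parallelize() may split the data differently, but the result is the same because multiplication is commutative and associative.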
- September 20, 2018 at 2:00 pm #5045
  
  DataFlair Team
  Spectator
  
  From:
  http://data-flair.training/blogs/rdd-transformations-actions-apis-apache-spark/#33_Reduce
  
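  It takes a function of two arguments, an accumulator and a value, and that function should be commutative and associative in the mathematical sense. It reduces a collection of elements to a single result. Because of these two properties, the function produces the same result each time it is applied to the same RDD data, regardless of how the elements are distributed across partitions or in what order the partial results are combined. Note that reduce() is an action rather than a wide transformation: each partition is reduced locally and the partial results are merged on the driver, so no shuffle takes place.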
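  
  Why commutativity and associativity matter can be checked with a small local simulation. The sketch below is plain Scala, not Spark; `reduceInPartitions` is a hypothetical helper that splits the data into a given number of consecutive chunks, reduces each chunk, and merges the partial results, loosely mimicking how partitions are combined. Addition gives the same answer for every split, while subtraction, which is neither commutative nor associative, does not.
  
  ```scala
  // Hypothetical helper: reduce each chunk locally, then merge the partial
  // results, loosely mimicking how RDD.reduce() combines partitions. Not Spark code.
  def reduceInPartitions(data: Seq[Int], numPartitions: Int)(f: (Int, Int) => Int): Int = {
    val chunkSize = math.max(1, math.ceil(data.size.toDouble / numPartitions).toInt)
    data.grouped(chunkSize).map(_.reduceLeft(f)).reduceLeft(f)
  }
  
  val data = 1 to 8
  
  // Addition is commutative and associative: every partitioning agrees.
  val sums = (1 to 4).map(n => reduceInPartitions(data, n)(_ + _))
  println(sums.mkString(", "))  // 36, 36, 36, 36
  
  // Subtraction is neither: the result depends on how the data was split.
  val diffs = (1 to 4).map(n => reduceInPartitions(data, n)(_ - _))
  println(diffs.mkString(", ")) // -34, 8, 4, 2
  ```
  
  In real Spark the partial results may also be merged in whichever order the tasks finish, so a non-commutative function can give different answers even when the partitioning stays the same.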