Explain the operation reduce()

Viewing 2 reply threads
  • Author
    Posts
    • #5040
      DataFlair TeamDataFlair Team
      Spectator

      Explain the operation reduce()

    • #5042
      DataFlair TeamDataFlair Team
      Spectator

      > reduce() is an action. It is wide operation (i.e. shuffle data across multiple partitions and output a single value)
      > It takes function as an input which has two parameter of the same type and output a single value of the input type.
      > i.e. combine the elements of RDD together.

      Example 1 :
      val rdd1 = sc.parallelize(1 to 100)
      val rdd2 = rdd1.reduce((x,y) => x+y)

      OR

      val rdd2 = rdd1.reduce(_ + _)


      Output :
      rdd2: Int = 5050

      Example 2:
      val rdd1 = sc.parallelize(1 to 5)
      val rdd2 = rdd1.reduce(_*_)


      Output :
      rdd2: Int = 120

    • #5045
      DataFlair TeamDataFlair Team
      Spectator

      From :
      http://data-flair.training/blogs/rdd-transformations-actions-apis-apache-spark/#33_Reduce

      It takes function with two arguments an accumulator and a value which should be commutative and Associative in mathematical nature. It reduces a list of element s into one as a result. This function produces same result when continuously applied on same set of RDD data with multiple partitions irrespective of elements order. It is wide operation.

Viewing 2 reply threads
  • You must be logged in to reply to this topic.