What is aggregate() function. How it can be used with spark RDD?

This topic has 1 reply, 1 voice, and was last updated 7 years, 10 months ago by DataFlair Team.

Viewing 1 reply thread

Author

Posts
- September 20, 2018 at 11:30 pm #6500
  
  DataFlair Team
  Spectator
  
  Anyone can please let you explain aggregate() action by example.
- September 20, 2018 at 11:30 pm #6501
  
  DataFlair Team
  Spectator
  
  aggregate() function
  
  In order to get data type different from the input type, this function gives us the flexibility. Basically, to get the final result, the aggregate() takes two functions. Form first one, we combine the element from our RDD along with the accumulator, and from the second one, it combines the accumulator. Thus, we supply the initial zero value of the type which we want to return, in aggregate.
  
  For Example:
  
  scala> val inputrdd = sc.parallelize(
  List(
  (“maths”, 21),
  (“english”, 22),
  (“science”, 31)
  ),
  3
  )
  inputrdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[14] at parallelize at :21
  
  scala> inputrdd.partitions.size
  res18: Int = 3
  
  scala> val result = inputrdd.aggregate(3) (
  /*
  * This is a seqOp for merging T into a U
  * ie (String, Int) in into Int
  * (we take (String, Int) in ‘value’ & return Int)
  * Arguments :
  * acc : Reprsents the accumulated result
  * value : Represents the element in ‘inputrdd’
  * In our case this of type (String, Int)
  * Return value
  * We are returning an Int
  */
  (acc, value) => (acc + value._2),
  
  /*
  * This is a combOp for mergining two U’s
  * (ie 2 Int)
  */
  (acc1, acc2) => (acc1 + acc2)
  )
  result: Int = 86
  
  The result is calculated as follows,
  
  Partition 1 : Sum(all Elements) + 3 (Zero value)
  Partition 2 : Sum(all Elements) + 3 (Zero value)
  Partition 3 : Sum(all Elements) + 3 (Zero value)
  Result = Partition1 + Partition2 + Partition3 + 3(Zero value)
  
  So we get 21 + 22 + 31 + (4 * 3) = 86.
  
  Learn more Spark RDD Operations-Transformation & Action with Example, follow the link: Spark RDD Operations-Transformation & Action with Example
Author

Posts

Viewing 1 reply thread

You must be logged in to reply to this topic.

What is aggregate() function. How it can be used with spark RDD?

About DataFlair

Trending Courses in Indore

Trending Courses in Bangalore

Trending Courses in Chennai

Trending Courses in Pune

Trending Courses in Hyderabad

Trending Courses in Delhi NCR