**aggregate() function**

The aggregate() function gives us the flexibility to return a result whose data type differs from that of the input RDD. It takes two functions: the first (the seqOp) combines each element of the RDD with an accumulator, and the second (the combOp) merges two accumulators. In addition, we supply an initial zero value of the type we want to return.
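For reference, the Scala signature of RDD.aggregate looks like this (simplified sketch; the actual definition also carries a ClassTag context bound on U):

```scala
// T is the RDD's element type, U is the result type we want back.
def aggregate[U](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): U
```

Because seqOp maps a U and a T to a U, the result type U is free to differ from the element type T.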

For example:

```scala
scala> val inputrdd = sc.parallelize(
         List(
           ("maths", 21),
           ("english", 22),
           ("science", 31)
         ),
         3
       )
inputrdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[14] at parallelize at <console>:21

scala> inputrdd.partitions.size
res18: Int = 3
```

```scala
scala> val result = inputrdd.aggregate(3)(
         /*
          * This is the seqOp for merging a T into a U,
          * i.e. a (String, Int) element into an Int.
          * Arguments:
          *   acc   : represents the accumulated result (an Int)
          *   value : represents an element of 'inputrdd',
          *           in our case of type (String, Int)
          * Return value: an Int
          */
         (acc, value) => acc + value._2,
         /*
          * This is the combOp for merging two U's (i.e. two Ints).
          */
         (acc1, acc2) => acc1 + acc2
       )
result: Int = 86
```

The result is calculated as follows:

- Partition 1: sum(all elements) + 3 (zero value)
- Partition 2: sum(all elements) + 3 (zero value)
- Partition 3: sum(all elements) + 3 (zero value)
- Result = Partition 1 + Partition 2 + Partition 3 + 3 (zero value)

The zero value is applied once per partition by the seqOp, and once more by the combOp when the partition results are merged, so with 3 partitions it is added 4 times in total. So we get 21 + 22 + 31 + (4 * 3) = 86.
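To better illustrate returning a type different from the input, here is a sketch (assuming the same inputrdd as above; the value name sumCount is our own) that accumulates a (sum, count) pair of type (Int, Int) from the (String, Int) elements. The average would then be sum divided by count:

```scala
scala> val sumCount = inputrdd.aggregate((0, 0))(
         // seqOp: add the element's mark to the running sum, bump the count
         (acc, value) => (acc._1 + value._2, acc._2 + 1),
         // combOp: merge two (sum, count) accumulators component-wise
         (a, b) => (a._1 + b._1, a._2 + b._2)
       )
sumCount: (Int, Int) = (74,3)
```

Here the zero value (0, 0) is the identity for both sum and count, so it contributes nothing even though it is applied once per partition and once in the final combine.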
