Live instructor-led & Self-paced Online Certification Training Courses (Big Data, Hadoop, Spark) Forums Apache Spark What is aggregate() function. How it can be used with spark RDD?

This topic contains 1 reply, has 1 voice, and was last updated by  dfbdteam5 8 months, 1 week ago.

Viewing 2 posts - 1 through 2 (of 2 total)
  • Author
    Posts
  • #6500

    dfbdteam5
    Moderator

    Anyone can please let you explain aggregate() action by example.

    #6501

    dfbdteam5
    Moderator

    aggregate() function

    In order to get data type different from the input type, this function gives us the flexibility. Basically, to get the final result, the aggregate() takes two functions. Form first one, we combine the element from our RDD along with the accumulator, and from the second one, it combines the accumulator. Thus, we supply the initial zero value of the type which we want to return, in aggregate.

    For Example:

    scala> val inputrdd = sc.parallelize(
    List(
    (“maths”, 21),
    (“english”, 22),
    (“science”, 31)
    ),
    3
    )
    inputrdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[14] at parallelize at :21

    scala> inputrdd.partitions.size
    res18: Int = 3

    scala> val result = inputrdd.aggregate(3) (
    /*
    * This is a seqOp for merging T into a U
    * ie (String, Int) in into Int
    * (we take (String, Int) in ‘value’ & return Int)
    * Arguments :
    * acc : Reprsents the accumulated result
    * value : Represents the element in ‘inputrdd’
    * In our case this of type (String, Int)
    * Return value
    * We are returning an Int
    */
    (acc, value) => (acc + value._2),

    /*
    * This is a combOp for mergining two U’s
    * (ie 2 Int)
    */
    (acc1, acc2) => (acc1 + acc2)
    )
    result: Int = 86

    The result is calculated as follows,

    Partition 1 : Sum(all Elements) + 3 (Zero value)
    Partition 2 : Sum(all Elements) + 3 (Zero value)
    Partition 3 : Sum(all Elements) + 3 (Zero value)
    Result = Partition1 + Partition2 + Partition3 + 3(Zero value)

    So we get 21 + 22 + 31 + (4 * 3) = 86.

    Learn more Spark RDD Operations-Transformation & Action with Example, follow the link: Spark RDD Operations-Transformation & Action with Example

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic.