Explain fold() operation in Spark.

      • fold() is an action. It aggregates the data from all partitions and returns a single value to the driver.
      • It takes a function as input that has two parameters of the same type and returns a single value of that type.
      • It is similar to reduce(), but takes one extra argument, a 'zero value' (i.e. an initial value), which is used in the initial call on each partition (see the sketch below).
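
      For instance, with the neutral element 0 as the zero value, fold() returns the same result as reduce(). A minimal sketch, assuming a spark-shell with the SparkContext available as sc:

      val rdd = sc.parallelize(List(1, 2, 3, 4, 5), 3)
      rdd.reduce(_ + _)     // Int = 15
      rdd.fold(0)(_ + _)    // Int = 15 -- with a neutral zero value, fold matches reduce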


      def fold(zeroValue: T)(op: (T, T) ⇒ T): T

      Aggregate the elements of each partition, and then the results for all the partitions, using a given associative function and a neutral “zero value”. The function op(t1, t2) is allowed to modify t1 and return it as its result value to avoid object allocation; however, it should not modify t2.

      This behaves somewhat differently from fold operations implemented for non-distributed collections in functional languages like Scala. This fold operation may be applied to partitions individually, and then fold those results into the final result, rather than apply the fold to each element sequentially in some defined ordering. For functions that are not commutative, the result may differ from that of a fold applied to a non-distributed collection.

      zeroValue: the initial value for the accumulated result of each partition for the op operator, and also the initial value for combining the results from different partitions for the op operator. This will typically be the neutral element (e.g. Nil for list concatenation or 0 for summation).
      op: an operator used to both accumulate results within a partition and combine results from different partitions.
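
      To see how the zero value interacts with partitioning, here is a minimal sketch contrasting a plain Scala collection fold (zero value applied exactly once) with an RDD fold over 3 partitions (zero value applied once per partition and once more in the final combine). It assumes a spark-shell with the SparkContext available as sc:

      // Local Scala collection: the zero value 5 is added exactly once
      List(1, 2, 3, 4, 5).fold(5)(_ + _)                      // Int = 20  (15 + 5)

      // Spark RDD with 3 partitions: 5 is added once per partition
      // and once more when the partition results are combined
      sc.parallelize(List(1, 2, 3, 4, 5), 3).fold(5)(_ + _)   // Int = 35  (15 + 3*5 + 5)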

      Example 1:

      val rdd1 = sc.parallelize(List(1,2,3,4,5),3)
      rdd1.fold(5)(_+_)


      Output:
      Int = 35

      Example 2:

      val rdd1 = sc.parallelize(List(1,2,3,4,5))
      rdd1.fold(5)(_+_)


      Output:
      Int = 25

      Example 3:

      val rdd1 = sc.parallelize(List(1,2,3,4,5),3)
      rdd1.fold(3)(_+_)


      Output:
      Int = 27
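
      In each case the result is the element sum plus the zero value applied once per partition and once more at the final combine, i.e. sum + zeroValue * (numPartitions + 1): 15 + 5*(3+1) = 35; 15 + 5*(1+1) = 25 (the second RDD got a single default partition in this run, so a different default parallelism would give a different result); and 15 + 3*(3+1) = 27. A small sketch of that check, for illustration only, assuming the SparkContext sc:

      val rdd = sc.parallelize(List(1, 2, 3, 4, 5), 3)
      val zero = 5
      // Expected fold result: element sum plus zero applied (numPartitions + 1) times
      val expected = rdd.sum().toInt + zero * (rdd.getNumPartitions + 1)   // 35
      rdd.fold(zero)(_ + _)                                                // Int = 35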

      For more operations on Apache Spark RDDs, refer to Transformation and Action in Spark.
