Apache Spark


- September 20, 2018 at 4:32 pm #5888
  
  DataFlair Team
  
  What are the exact differences between the reduce and fold operations in Spark?
- September 20, 2018 at 4:32 pm #5889
  DataFlair Team
  Reduce:
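  reduce walks through the elements of a collection, applying your function to a pair of neighboring elements to yield a new result,
  which is then combined with the next element in the sequence to yield another result, until a single value remains.
  In Spark, the function is applied within each partition and the partition results are then merged, so it must be commutative and associative.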
  
  def reduce(f: (T, T) => T): T
  
  Fold:
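  fold works like reduce: it aggregates a collection by repeatedly applying an operation,
  but it starts from a specified initial ("zero") value, and the accumulator has the same type T as the elements.
  In Spark, the zero value is used once per partition and once more when the partition results are merged,
  so it should be the neutral element of the operation (for example 0 for addition).
  Because of the zero value, fold on an empty RDD still returns a result, whereas reduce on an empty RDD throws an exception.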
  
  def fold(zeroValue: T)(op: (T, T) => T): T
  
  Example:
  
  Finding the employee with the maximum salary in a given RDD:
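```scala
// Assumes spark-shell, where the SparkContext `sc` is already defined
val employeeData = List(("Ram", 1000.0), ("Vishnu", 2000.0), ("Ravi", 7000.0))
val employeeRDD = sc.makeRDD(employeeData)

// Zero value: a salary of 0.0 cannot beat any positive salary,
// so it is safe to reuse in every partition
val dummyEmployee = ("ABC", 0.0)

// Keep whichever of the accumulator and the current employee earns more
val maxSalaryEmployee = employeeRDD.fold(dummyEmployee)((acc, employee) =>
  if (acc._2 < employee._2) employee else acc)
println("employee with maximum salary is " + maxSalaryEmployee)
```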
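To contrast the two operations side by side without a cluster, here is a minimal sketch using plain Scala collections, which offer `reduce` and `fold` with the same shapes as the RDD methods. It is only an approximation: Spark's versions run per partition, and the object and helper names (`ReduceVsFoldMax`, `higherPaid`) are hypothetical. It computes the same maximum both ways and shows what happens on an empty collection.

```scala
import scala.util.Try

// Plain Scala collections (no Spark required), mirroring the RDD example above.
object ReduceVsFoldMax {
  type Employee = (String, Double)

  // Keep whichever employee has the higher salary; ties keep the accumulator
  def higherPaid(acc: Employee, employee: Employee): Employee =
    if (acc._2 < employee._2) employee else acc

  val employees: List[Employee] =
    List(("Ram", 1000.0), ("Vishnu", 2000.0), ("Ravi", 7000.0))

  // reduce: no initial value, the first element seeds the accumulation
  val viaReduce: Employee = employees.reduce((a, b) => higherPaid(a, b))

  // fold: starts from a zero value of the same type as the elements
  val viaFold: Employee = employees.fold(("ABC", 0.0))((a, b) => higherPaid(a, b))

  // On an empty collection, fold simply returns the zero value...
  val emptyFold: Employee =
    List.empty[Employee].fold(("ABC", 0.0))((a, b) => higherPaid(a, b))

  // ...while reduce throws an exception (captured here with Try)
  val emptyReduceFails: Boolean =
    Try(List.empty[Employee].reduce((a, b) => higherPaid(a, b))).isFailure

  def main(args: Array[String]): Unit = {
    println(s"reduce: $viaReduce, fold: $viaFold")
    println(s"empty fold: $emptyFold, empty reduce fails: $emptyReduceFails")
  }
}
```

With a non-empty input both calls agree; the dummy employee only matters when there is nothing to reduce.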
  For more actions on Apache Spark RDDs, refer to RDD Operations.
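Why the zero value should be neutral can be sketched with a plain-Scala simulation of a partitioned fold. The helper `simulatedRddFold` is hypothetical, not a Spark API, and each inner `Seq` stands in for one RDD partition. It assumes the behaviour described for `RDD.fold`: each partition is folded from the zero value, and the partition results are then merged starting from the zero value again.

```scala
// Plain-Scala simulation of how a partitioned fold treats its zero value.
// `simulatedRddFold` is a hypothetical helper for illustration, not a Spark API.
object PartitionedFoldDemo {
  def simulatedRddFold[T](partitions: Seq[Seq[T]], zero: T)(op: (T, T) => T): T = {
    // 1. Every partition is folded starting from the zero value...
    val perPartition = partitions.map(_.fold(zero)(op))
    // 2. ...and the partition results are merged starting from the zero value again
    perPartition.foldLeft(zero)(op)
  }

  // The values 1..6 spread over three "partitions"
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 4), Seq(5, 6))

  // Neutral zero (0 for addition): same answer as a plain sum, 21
  val neutralZero: Int = simulatedRddFold(partitions, 0)(_ + _)

  // Non-neutral zero: 10 is added once per partition plus once at the merge,
  // giving 21 + 4 * 10 = 61
  val nonNeutralZero: Int = simulatedRddFold(partitions, 10)(_ + _)

  def main(args: Array[String]): Unit = {
    println(s"zero = 0  -> $neutralZero")
    println(s"zero = 10 -> $nonNeutralZero")
  }
}
```

This is why the dummy employee in the max-salary example uses a salary of 0.0: repeating it once per partition cannot change the maximum, as long as every real salary is positive.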