Apache Spark

Viewing 1 reply thread
  • Author
    Posts
    • #5888
      DataFlair Team
      Spectator

      What exactly are the differences between the reduce and fold operations in Spark?

    • #5889
      DataFlair Team
      Spectator

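      Reduce:
      reduce walks through the elements of a collection,
      applying your function to two elements to yield an intermediate result,
      which is then combined with the next element to yield a new result,
      and so on until a single value remains.
      In Spark the function must be commutative and associative, because partitions
      are reduced in parallel and their results are merged in no guaranteed order.

      def reduce(f: (T, T) => T): T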
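
      As a minimal sketch of reduce on an RDD, the snippet below computes a sum and a maximum. The object name ReduceSketch, the helper sumAndMax, and the sample numbers are illustrative assumptions, and the code presumes Spark is on the classpath and run in local mode.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of the original post.
object ReduceSketch {
  // Sum and maximum of the values via reduce. Both operators are
  // commutative and associative, as reduce requires.
  def sumAndMax(spark: SparkSession, values: Seq[Int]): (Int, Int) = {
    val rdd = spark.sparkContext.parallelize(values)
    (rdd.reduce(_ + _), rdd.reduce((a, b) => math.max(a, b)))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local two-thread master is enough for a demo
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("reduce-sketch")
      .getOrCreate()
    try {
      val (sum, max) = sumAndMax(spark, Seq(3, 8, 1, 5))
      println(s"sum = $sum, max = $max") // sum = 17, max = 8
    } finally {
      spark.stop()
    }
  }
}
```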

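      Fold:
      fold works like reduce, aggregating a collection by repeatedly applying an operation,
      but it starts from a specified initial ("zero") value.
      In Spark the zero value is used once for each partition and once more when the
      partition results are merged, so it should be the neutral element of the operation
      (for example, 0 for addition).
      Unlike reduce, which throws an exception on an empty RDD, fold on an empty RDD
      returns the zero value.

      def fold(zeroValue: T)(op: (T, T) => T): T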
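
      To illustrate why the initial value matters, here is a small sketch that folds the numbers 1 to 4 with different zero values and partition counts. The object name FoldZeroValueSketch and the helper sumWithZero are hypothetical, and the code presumes Spark is on the classpath and run in local mode.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of the original post.
object FoldZeroValueSketch {
  // Folds the numbers 1 to 4 with addition, starting from `zero`,
  // after splitting them into `numPartitions` partitions.
  def sumWithZero(spark: SparkSession, zero: Int, numPartitions: Int): Int =
    spark.sparkContext.parallelize(1 to 4, numPartitions).fold(zero)(_ + _)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("fold-zero-sketch")
      .getOrCreate()
    try {
      // Neutral zero value: the plain sum
      println(sumWithZero(spark, 0, 2)) // 10
      // Non-neutral zero value: added once per partition (2) plus once in the final merge
      println(sumWithZero(spark, 1, 2)) // 13
      // Same data, four partitions: 10 + 4 + 1
      println(sumWithZero(spark, 1, 4)) // 15
    } finally {
      spark.stop()
    }
  }
}
```

      With a non-neutral zero value the result changes with the number of partitions, which is why a neutral value such as 0 for addition is the safe choice.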

      Example:

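      Finding the employee with the maximum salary in an RDD
      (sc is the SparkContext that spark-shell provides):

      val employeeData = List(("Ram", 1000.0), ("Vishnu", 2000.0), ("Ravi", 7000.0))
      val employeeRDD = sc.makeRDD(employeeData)

      // Zero value: a placeholder employee; assumes real salaries are never negative
      val dummyEmployee = ("ABC", 0.0)

      // Keep whichever of the two employees has the higher salary
      val maxSalaryEmployee = employeeRDD.fold(dummyEmployee)((acc, employee) =>
        if (acc._2 < employee._2) employee else acc)
      println("Employee with maximum salary is " + maxSalaryEmployee)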
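
      For comparison, the same maximum can be found with reduce, which needs no placeholder employee. The sketch below also shows how the two operations behave on an empty RDD. The object name MaxSalarySketch and the helper higherPaid are illustrative assumptions, and the code presumes Spark is on the classpath and run in local mode.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Hypothetical example object; not part of the original post.
object MaxSalarySketch {
  type Employee = (String, Double)

  // Returns whichever employee has the higher salary
  def higherPaid(a: Employee, b: Employee): Employee =
    if (a._2 < b._2) b else a

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("max-salary-sketch")
      .getOrCreate()
    val sc = spark.sparkContext
    try {
      val employees = sc.parallelize(
        Seq(("Ram", 1000.0), ("Vishnu", 2000.0), ("Ravi", 7000.0)))

      // reduce: no zero value required
      println(employees.reduce(higherPaid _)) // (Ravi,7000.0)

      val empty = sc.emptyRDD[Employee]
      // fold on an empty RDD returns the zero value
      println(empty.fold(("ABC", 0.0))(higherPaid _)) // (ABC,0.0)
      // reduce on an empty RDD throws, so the Try is a failure
      println(Try(empty.reduce(higherPaid _)).isFailure) // true
    } finally {
      spark.stop()
    }
  }
}
```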

      For more actions on Apache Spark RDDs, refer to RDD Operations.
