In MapReduce, where is sorting done: on the mapper node or the reducer node?


    • #4732
      DataFlair Team
      Spectator

      During the execution of a Hadoop MapReduce job, data is sorted by key. Is this sorting done on the map node or on the reduce node?

    • #4733
      DataFlair Team
      Spectator

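      The role of the Mapper is to execute the business logic and produce key/value pairs, which are passed to the Partitioner. The Partitioner only decides which reducer each pair goes to; it does not sort. The sorting is done by the framework right after partitioning: the map output is collected in an in-memory buffer and sorted by partition and then by key before it is spilled to disk, and the spill files are merged into one sorted output per map task.

      The Partitioner runs in the same JVM as the Mapper (the map task), and hence on the same node, and so does this sort of the map output. This implies that map-phase sorting happens on the mapper node itself.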
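
      The map-side step can be illustrated with a small Python simulation. This is only a sketch, not Hadoop code: the function names `partition` and `map_side_sort` are made up for illustration, and `zlib.crc32` is an assumed stand-in for the key hash so that the example is deterministic.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    # Hypothetical stand-in for a hash partitioner: it only picks a
    # reducer index for the key, it does not order anything.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_side_sort(pairs, num_reducers):
    """Tag each (key, value) pair with its partition, sort by
    (partition, key), and return one sorted run per partition."""
    tagged = [(partition(k, num_reducers), k, v) for k, v in pairs]
    tagged.sort(key=lambda t: (t[0], t[1]))
    runs = {p: [] for p in range(num_reducers)}
    for p, k, v in tagged:
        runs[p].append((k, v))
    return runs


# Output of one (simulated) map task, in emit order.
map_output = [("pear", 1), ("apple", 1), ("fig", 1), ("apple", 1)]
runs = map_side_sort(map_output, num_reducers=2)

for reducer_index, run in runs.items():
    print(reducer_index, run)
```

      Each run is sorted by key and holds only the keys assigned to that reducer, all on the node where the map task ran.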

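      However, sorting work also happens during the reduce phase. Each reducer fetches its already-sorted partition from every map task and merges these runs, so that keys reach the reduce function in sorted order. By default only the keys are sorted; the values for a key arrive in no particular order. Sometimes we also want the values sorted. This is achieved by a technique called secondary sort: the value (or part of it) is made part of a composite key, so the usual sort orders by key and then by value, and a grouping comparator on the reducer node groups the records by the original key.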
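
      The reduce-side work can be sketched in the same spirit. This is a plain Python simulation under stated assumptions, not Hadoop code: each input run is assumed to arrive already sorted by the composite (key, value), and the helper names `reduce_side_merge` and `group_by_natural_key` are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def reduce_side_merge(sorted_runs):
    """Merge already-sorted runs (one per map task) into a single
    sorted list without re-sorting from scratch."""
    return list(heapq.merge(*sorted_runs))


def group_by_natural_key(merged):
    """Group merged records by key only, which plays the role of a
    grouping comparator. Values stay in their merged (sorted) order."""
    return {
        key: [value for _, value in group]
        for key, group in groupby(merged, key=itemgetter(0))
    }


# Runs fetched by one reducer from two map tasks, each sorted by
# the composite (key, value).
run_a = [("apple", 2), ("apple", 9), ("fig", 4)]
run_b = [("apple", 5), ("fig", 1), ("pear", 7)]

merged = reduce_side_merge([run_a, run_b])
grouped = group_by_natural_key(merged)
print(grouped)
# {'apple': [2, 5, 9], 'fig': [1, 4], 'pear': [7]}
```

      The reducer node only merges and groups here; the ordering within each run was already produced on the map side.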
