In MapReduce, is sorting done on the mapper node or the reducer node?
During the execution of a Hadoop MapReduce job, the data is sorted by key. Is this sorting done on the map node or on the reduce node?
-
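The role of the Mapper is to execute the business logic and emit key/value pairs. Each pair is passed to the Partitioner, which does not sort anything: it only decides which reducer the pair belongs to (by default, from a hash of the key). The pairs are collected in an in-memory buffer, and whenever the buffer is spilled to disk the framework sorts them by partition and, within each partition, by key. The spill files are then merged into a single sorted, partitioned map output.
The Partitioner and this sort both run in the same JVM as the Mapper, and hence on the same node. This means that the sorting of the map output happens on the mapper node itself.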
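As a rough illustration of the map-side step, here is a plain-Java simulation (no Hadoop dependency) that assigns each key to a partition and then orders the records by partition and by key, which is the order a map output file is written in. The class name, `pair`, `partitionFor` and `sortForSpill` are hypothetical names made up for this sketch; only the partition formula mirrors Hadoop's default `HashPartitioner`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class MapSideSortSketch {
    // Hypothetical helper: builds one map output key/value pair.
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Same formula as Hadoop's default HashPartitioner:
    // picks the reducer for a key, it does not sort anything.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Simulates what happens at spill time on the mapper node:
    // records are ordered by partition first, then by key.
    static List<Map.Entry<String, Integer>> sortForSpill(
            List<Map.Entry<String, Integer>> mapOutput, int numReducers) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(mapOutput);
        sorted.sort((a, b) -> {
            int pa = partitionFor(a.getKey(), numReducers);
            int pb = partitionFor(b.getKey(), numReducers);
            if (pa != pb) {
                return Integer.compare(pa, pb);
            }
            return a.getKey().compareTo(b.getKey());
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = Arrays.asList(
                pair("pear", 1), pair("apple", 1), pair("fig", 1), pair("apple", 1));
        int numReducers = 2; // assumed value for the demo
        for (Map.Entry<String, Integer> e : sortForSpill(mapOutput, numReducers)) {
            System.out.println(partitionFor(e.getKey(), numReducers)
                    + "\t" + e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In a real job none of this is written by hand; the framework does the buffering, sorting and spilling inside the map task.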
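However, the reducer node takes part in sorting as well, and this step is not optional: each reducer fetches its partition from every mapper and merges these already-sorted pieces, so that the reduce function receives the keys in sorted order. Sometimes we also want the values of a key to arrive in sorted order, not just the keys. This is achieved by a technique called secondary sort: the value is made part of a composite key, a custom sort comparator orders the composite keys, and a grouping comparator groups them by the original key. The ordering itself is still produced by the framework's map-side sort and reduce-side merge.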
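The idea behind secondary sort can be sketched in plain Java without Hadoop. This is only a simulation under assumed names: `CompositeKey`, `compareForSort` and `sortAndGroup` are hypothetical and stand in for a composite key class, a sort comparator and a grouping comparator that a real job would register on the `Job`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SecondarySortSketch {
    // Hypothetical composite key: the natural key plus the value to order by.
    static final class CompositeKey {
        final String naturalKey;
        final int value;

        CompositeKey(String naturalKey, int value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }
    }

    // Plays the role of the sort comparator: natural key first, then value.
    static int compareForSort(CompositeKey a, CompositeKey b) {
        int byKey = a.naturalKey.compareTo(b.naturalKey);
        return byKey != 0 ? byKey : Integer.compare(a.value, b.value);
    }

    // Sorts the records, then groups them by natural key only, which is
    // what a grouping comparator does. Each list is what one reduce call
    // would iterate over, already ordered by value.
    static Map<String, List<Integer>> sortAndGroup(List<CompositeKey> records) {
        List<CompositeKey> sorted = new ArrayList<>(records);
        sorted.sort(SecondarySortSketch::compareForSort);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : sorted) {
            groups.computeIfAbsent(k.naturalKey, n -> new ArrayList<>()).add(k.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> records = Arrays.asList(
                new CompositeKey("b", 7), new CompositeKey("a", 9),
                new CompositeKey("a", 2), new CompositeKey("b", 1));
        System.out.println(sortAndGroup(records)); // prints {a=[2, 9], b=[1, 7]}
    }
}
```

The reducer code itself never sorts here; it simply receives the values in the order the comparators produce.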