In MapReduce, is sorting done on the mapper node or the reducer node?

This topic has 1 reply, 1 voice, and was last updated 7 years, 10 months ago by DataFlair Team.

Viewing 1 reply thread

Author

Posts
- September 20, 2018 at 12:04 pm #4732
  
  DataFlair Team
  Spectator
  
  During the execution of a Hadoop MapReduce job, data is sorted by key. Is this sorting done on the map node or on the reduce node?
- September 20, 2018 at 12:04 pm #4733
  
  DataFlair Team
  Spectator
  
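  The Mapper executes the business logic and produces intermediate key/value pairs. Each pair goes to the Partitioner, which decides which reducer will receive it. The default HashPartitioner uses the key's hash modulo the number of reduce tasks. The Partitioner itself does not sort anything. The map task collects its output in an in-memory buffer and sorts it by partition and then by key. It spills these sorted runs to local disk and finally merges the spills into a single output file that is sorted by key within each partition.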
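The map-side steps described above can be sketched as a small simulation. This is not Hadoop code. `partition_for` and `run_map_task` are hypothetical names, `zlib.crc32` stands in for the key type's real `hashCode()` so results are deterministic, and `spill_size` is a toy stand-in for the size of the in-memory sort buffer. The point is that routing (partitioning) and sorting are separate steps, and both run inside the map task.

```python
import heapq
import zlib
from typing import Dict, Iterable, List, Tuple


def partition_for(key: str, num_reducers: int) -> int:
    """Hash-partitioner stand-in: picks a reducer for a key.
    Assumption: crc32 replaces the key's Java hashCode(). It only routes keys; it never sorts them."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(lines: Iterable[str], num_reducers: int,
                 spill_size: int = 4) -> Dict[int, List[Tuple[str, int]]]:
    """Hypothetical model of one map task (word-count style).

    Pairs are buffered with their partition number. Each full buffer is sorted by
    (partition, key) and "spilled". The spills are then merged into one output
    that is sorted by key inside every partition. All of this runs on the map node."""
    spills: List[List[Tuple[int, str, int]]] = []
    buffer: List[Tuple[int, str, int]] = []
    for line in lines:
        for word in line.split():
            buffer.append((partition_for(word, num_reducers), word, 1))
            if len(buffer) >= spill_size:
                spills.append(sorted(buffer))  # sort by partition, then key
                buffer = []
    if buffer:
        spills.append(sorted(buffer))

    # Merge the already-sorted spills (a merge, not a re-sort).
    segments: Dict[int, List[Tuple[str, int]]] = {p: [] for p in range(num_reducers)}
    for part, key, value in heapq.merge(*spills):
        segments[part].append((key, value))
    return segments


if __name__ == "__main__":
    for part, seg in run_map_task(["b a c", "a d b", "e a"], num_reducers=2).items():
        print(part, seg)
```

Each printed segment is what one reducer would later fetch from this map task. Its keys are already in sorted order.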
  
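  The Partitioner runs in the same JVM as the Mapper, and therefore on the same node, as does the sort-and-spill step. This means the sorting of the mapper output happens on the mapper node itself. Each reducer then fetches its partition from every map task (the shuffle) and merges these already-sorted segments. The reduce-side step is therefore a merge rather than a full sort.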
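Because every map task hands over output that is already sorted by key, a reducer only has to merge sorted runs before grouping them by key. A minimal sketch of that reduce-side step follows. It assumes each input list is one map task's sorted segment for this reducer. `reduce_side` is a hypothetical name, and the summing reduce is a word-count example.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import List, Tuple

Pair = Tuple[str, int]


def reduce_side(segments_from_maps: List[List[Pair]]) -> List[Pair]:
    """Hypothetical model of one reduce task.

    Each element of segments_from_maps is the sorted (key, value) list that one
    map task produced for this reducer. heapq.merge combines the sorted runs
    lazily without re-sorting them. groupby then yields one group per key, which
    is what a single reduce() call would receive."""
    merged = heapq.merge(*segments_from_maps, key=itemgetter(0))
    results: List[Pair] = []
    for key, group in groupby(merged, key=itemgetter(0)):
        results.append((key, sum(value for _, value in group)))  # word-count reduce
    return results


if __name__ == "__main__":
    from_map1 = [("a", 1), ("b", 1), ("b", 1)]
    from_map2 = [("a", 1), ("c", 1)]
    print(reduce_side([from_map1, from_map2]))  # [('a', 2), ('b', 2), ('c', 1)]
```

Using a k-way merge instead of a fresh sort is what makes the reduce side cheap. It relies entirely on the ordering the map side has already established.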
  
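  However, an additional kind of sorting is sometimes needed. Sometimes we want the values for each key to reach the reducer in sorted order, not just the keys. This is achieved by a technique called secondary sorting. The value (or the part of it to sort on) is added to a composite key, and a custom Partitioner routes records by the natural key alone. A grouping comparator on the reducer then groups the sorted records by natural key, so each reduce() call sees its values in order. The ordering itself still comes from the framework's map-side sort and reduce-side merge. The grouping that exposes it per key happens on the reducer node.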
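A secondary sort can be sketched in the same simulated style. This is a simplification: it collapses the map-side sort and the reduce-side merge into one `sort()` call per partition. It is not Hadoop's API. `secondary_sort` is a hypothetical name, crc32 again stands in for the natural key's hash, and the (year, temperature) data is illustrative only. The three ingredients are the composite key, the partitioner on the natural key, and the grouping comparator on the natural key.

```python
import zlib
from itertools import groupby
from typing import Dict, List, Tuple


def secondary_sort(records: List[Tuple[int, int]], num_reducers: int) -> Dict[int, List[int]]:
    """Hypothetical secondary-sort sketch over (natural_key, value) records."""
    # 1. Composite key: copy the value into the key as (natural_key, value).
    composite = [((natural, value), value) for natural, value in records]

    # 2. Custom partitioner: route by the natural key ONLY, so every record for
    #    one natural key reaches the same reducer. Assumption: crc32 replaces hashCode().
    partitions: Dict[int, list] = {p: [] for p in range(num_reducers)}
    for ckey, value in composite:
        p = zlib.crc32(str(ckey[0]).encode("utf-8")) % num_reducers
        partitions[p].append((ckey, value))

    output: Dict[int, List[int]] = {}
    for pairs in partitions.values():
        # 3. Sort comparator: the full composite key (natural key, then value).
        pairs.sort(key=lambda kv: kv[0])
        # 4. Grouping comparator: the natural key only, so one "reduce() call"
        #    per natural key sees its values already in ascending order.
        for natural, group in groupby(pairs, key=lambda kv: kv[0][0]):
            output[natural] = [value for _, value in group]
    return output


if __name__ == "__main__":
    data = [(1950, 22), (1949, 111), (1950, 0), (1949, 78), (1950, -11)]
    print(secondary_sort(data, num_reducers=2))
```

Routing by the natural key is essential. If the partitioner used the whole composite key, values for one natural key could be split across reducers.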