Which sort algorithm does Hadoop MapReduce use?

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

Author

Posts
- September 20, 2018 at 2:32 pm #5196
  
  DataFlair Team
  Spectator
  
  Which sorting algorithm is used during Hadoop MapReduce job execution?
- September 20, 2018 at 2:32 pm #5200
  
  DataFlair Team
  Spectator
  
  In a MapReduce job, the Mapper emits intermediate key-value pairs, and the framework automatically sorts them by key before they reach the Reducer. A program that needs sorted data at some stage can rely on this. Sorting also lets the Reducer tell easily where one key's values end and the next key's begin, which saves it time: a new reduce() call starts when the next key in the sorted input differs from the previous one. Each reduce() call receives a key together with the list of values for that key.
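
  The key-boundary idea can be sketched in plain Java without any Hadoop classes. This is only an illustration of the principle, not the framework's code; the class name SortedGroupingSketch and the method groupSorted are made up for this example.

  ```java
  import java.util.AbstractMap.SimpleEntry;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;

  // Hypothetical illustration: not part of Hadoop.
  final class SortedGroupingSketch {
      // Walks key-sorted pairs once and opens a new (key, values) group
      // each time the key differs from the previous one.
      static List<Map.Entry<String, List<Integer>>> groupSorted(
              List<Map.Entry<String, Integer>> sortedPairs) {
          List<Map.Entry<String, List<Integer>>> groups = new ArrayList<>();
          String currentKey = null;
          List<Integer> currentValues = null;
          for (Map.Entry<String, Integer> pair : sortedPairs) {
              if (currentValues == null || !pair.getKey().equals(currentKey)) {
                  // Key changed: this is where a new reduce() call would begin.
                  currentKey = pair.getKey();
                  currentValues = new ArrayList<>();
                  groups.add(new SimpleEntry<>(currentKey, currentValues));
              }
              currentValues.add(pair.getValue());
          }
          return groups;
      }

      public static void main(String[] args) {
          List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
          pairs.add(new SimpleEntry<>("apple", 1));
          pairs.add(new SimpleEntry<>("apple", 4));
          pairs.add(new SimpleEntry<>("banana", 2));
          pairs.add(new SimpleEntry<>("cherry", 3));
          pairs.add(new SimpleEntry<>("cherry", 5));
          for (Map.Entry<String, List<Integer>> group : groupSorted(pairs)) {
              // Prints: apple -> [1, 4], then banana -> [2], then cherry -> [3, 5]
              System.out.println(group.getKey() + " -> " + group.getValue());
          }
      }
  }
  ```

  Because the input is sorted, one pass with no lookup table is enough; with unsorted input the same key could reappear later and the grouping would be wrong.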
  
  Sort Algorithm
  
  Mapper: Quick sort is used on the map side. The org.apache.hadoop.util.QuickSort class sorts the keys. It works by comparing keys and, by default, arranges them in ascending order.
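
  For readers who want to see the general technique, here is a textbook in-place quick sort over string keys. It is a minimal sketch and not Hadoop's own QuickSort source; the class name QuickSortSketch is made up for this example.

  ```java
  import java.util.Arrays;

  // Hypothetical illustration: a textbook quick sort, not Hadoop's implementation.
  final class QuickSortSketch {
      // Sorts keys[lo..hi] (inclusive) in ascending order, in place.
      static void quickSort(String[] keys, int lo, int hi) {
          if (lo >= hi) {
              return; // zero or one element: already sorted
          }
          String pivot = keys[lo + (hi - lo) / 2];
          int i = lo;
          int j = hi;
          while (i <= j) {
              // Skip elements that are already on the correct side of the pivot.
              while (keys[i].compareTo(pivot) < 0) i++;
              while (keys[j].compareTo(pivot) > 0) j--;
              if (i <= j) {
                  String tmp = keys[i];
                  keys[i] = keys[j];
                  keys[j] = tmp;
                  i++;
                  j--;
              }
          }
          // Recurse into the two partitions.
          quickSort(keys, lo, j);
          quickSort(keys, i, hi);
      }

      public static void main(String[] args) {
          String[] keys = {"pear", "apple", "melon", "apple", "banana"};
          quickSort(keys, 0, keys.length - 1);
          System.out.println(Arrays.toString(keys)); // [apple, apple, banana, melon, pear]
      }
  }
  ```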
  
  Reducer: Merge sort is used on the reduce side. Merging is built into the MapReduce shuffle and is not something a job normally changes. Each reducer receives already-sorted map outputs from different nodes at a single point, so merging those sorted runs is the natural choice.
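
  To show why merging suits this situation, here is a small k-way merge of already-sorted runs using a min-heap. It is a sketch of the general technique only, with in-memory lists standing in for map outputs; the class name KWayMergeSketch is made up and this is not Hadoop's merge code.

  ```java
  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.List;
  import java.util.PriorityQueue;

  // Hypothetical illustration: not part of Hadoop.
  final class KWayMergeSketch {
      // Merges runs that are each already sorted ascending into one sorted list.
      static List<String> merge(List<List<String>> sortedRuns) {
          // Heap entries are {runIndex, positionInRun}, ordered by the key they point at.
          PriorityQueue<int[]> heap = new PriorityQueue<int[]>(
              (a, b) -> sortedRuns.get(a[0]).get(a[1])
                            .compareTo(sortedRuns.get(b[0]).get(b[1])));
          for (int r = 0; r < sortedRuns.size(); r++) {
              if (!sortedRuns.get(r).isEmpty()) {
                  heap.add(new int[] {r, 0});
              }
          }
          List<String> merged = new ArrayList<>();
          while (!heap.isEmpty()) {
              int[] head = heap.poll(); // smallest remaining key across all runs
              List<String> run = sortedRuns.get(head[0]);
              merged.add(run.get(head[1]));
              if (head[1] + 1 < run.size()) {
                  heap.add(new int[] {head[0], head[1] + 1}); // advance within that run
              }
          }
          return merged;
      }

      public static void main(String[] args) {
          List<List<String>> runs = Arrays.asList(
              Arrays.asList("apple", "melon"),
              Arrays.asList("banana", "zebra"),
              Arrays.asList("cherry"));
          System.out.println(merge(runs)); // [apple, banana, cherry, melon, zebra]
      }
  }
  ```

  Each run is only read front to back, so nothing needs to be re-sorted from scratch once the pieces arrive.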
  
  You can specify your own comparator class to sort your keys in ascending or descending order. This changes the key ordering, not the underlying sort algorithm.
  job.setSortComparatorClass(YourClass.class)
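
  The ordering logic of a descending comparator can be sketched with plain java.util.Comparator. Note the assumptions: in an actual Hadoop job the class handed to the job must be a RawComparator (typically written by extending WritableComparator), so this standalone version only shows the comparison itself. The class name DescendingKeyOrder is made up for this example.

  ```java
  import java.util.Arrays;
  import java.util.Comparator;

  // Hypothetical illustration: plain Java, not a Hadoop RawComparator.
  final class DescendingKeyOrder implements Comparator<String> {
      @Override
      public int compare(String left, String right) {
          // Swapping the operands inverts the natural ascending order.
          return right.compareTo(left);
      }

      public static void main(String[] args) {
          String[] keys = {"banana", "apple", "cherry"};
          Arrays.sort(keys, new DescendingKeyOrder());
          System.out.println(Arrays.toString(keys)); // [cherry, banana, apple]
      }
  }
  ```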