September 20, 2018 at 2:32 pm, #5196, DataFlair Team (Spectator)
Which sorting algorithm does Hadoop MapReduce use during job execution?
-
September 20, 2018 at 2:32 pm, #5200, DataFlair Team (Spectator)
In a MapReduce job, the Mapper emits intermediate key-value pairs, which the framework sorts automatically by key. Any program that needs sorted data at some stage can rely on this. Sorting also lets the Reducer tell when a new group starts: because equal keys sit next to each other in the sorted input, a new reduce() call begins as soon as the next key differs from the previous one, so the Reducer never has to search for matching keys. Each reduce() call receives one key together with the list of values for that key.
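To make the group-boundary idea concrete, here is a minimal plain-Java sketch. It does not use any Hadoop classes; the class name GroupBoundarySketch and the method groupSorted are made up for illustration. It assumes the input is already sorted by key and simply starts a new group whenever the key changes, which is roughly what the framework does before each reduce() call.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: shows how sorted input lets a
// reducer detect group boundaries with one key comparison per record.
class GroupBoundarySketch {

    // Input: (key, value) pairs already sorted by key.
    // Output: one (key, list of values) entry per run of equal keys,
    // i.e. one entry per reduce() call.
    static List<Map.Entry<String, List<Integer>>> groupSorted(
            List<Map.Entry<String, Integer>> sortedPairs) {
        List<Map.Entry<String, List<Integer>>> groups = new ArrayList<>();
        String currentKey = null;
        List<Integer> currentValues = null;
        for (Map.Entry<String, Integer> pair : sortedPairs) {
            // A key different from the previous one marks a new group.
            if (currentValues == null || !pair.getKey().equals(currentKey)) {
                currentKey = pair.getKey();
                currentValues = new ArrayList<>();
                groups.add(new AbstractMap.SimpleEntry<>(currentKey, currentValues));
            }
            currentValues.add(pair.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> sorted = List.of(
                Map.entry("apple", 1), Map.entry("apple", 3), Map.entry("pear", 2));
        System.out.println(groupSorted(sorted));
    }
}
```

If the input were not sorted, the same key could show up in several separate groups, which is why the sort has to happen before the reduce phase.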
Sort Algorithm
Mapper - Quick sort is used on the map side. By default, the org.apache.hadoop.util.QuickSort class sorts the keys. It orders the keys by comparing them, in ascending order unless a custom comparator says otherwise.
Reducer - Merge sort is used on the reduce side, and it is the default behaviour of MapReduce. The reduce-side sorting method cannot be changed. The reason is that each map task's output arrives already sorted, from different nodes, at a single point, so the reducer only has to merge sorted runs, and a merge is the best fit for that.
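The reduce-side merge described above can be sketched in plain Java as a k-way merge over already sorted runs. This is only an illustration under simplifying assumptions: keys are plain strings held in memory, and the names SortedRunMerger, Cursor and merge are hypothetical. Hadoop's own merge code works on serialized records and on-disk segments, so treat this as the idea rather than the implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration, not Hadoop code: merges several sorted runs
// (think: sorted outputs fetched from different map tasks) into one.
class SortedRunMerger {

    // Tracks the read position inside one sorted run.
    private static final class Cursor {
        final List<String> run;
        int pos;
        Cursor(List<String> run) { this.run = run; }
        String head() { return run.get(pos); }
    }

    static List<String> merge(List<List<String>> sortedRuns) {
        // The heap always exposes the run whose current key is smallest.
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> a.head().compareTo(b.head()));
        for (List<String> run : sortedRuns) {
            if (!run.isEmpty()) {
                heap.add(new Cursor(run));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            merged.add(smallest.head());
            smallest.pos++;
            // Put the run back only if it still has keys left.
            if (smallest.pos < smallest.run.size()) {
                heap.add(smallest);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(
                List.of("apple", "pear"), List.of("banana", "fig"))));
    }
}
```

Because every run is already sorted, each key is moved once and the heap only ever holds one entry per run, which is why merging is cheaper here than sorting everything again from scratch.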
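You can specify your own comparator class to sort the keys in ascending or descending order. The class must implement RawComparator (for example by extending WritableComparator), and you register it on your Job instance:
job.setSortComparatorClass(YourClass.class)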
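To see what a custom comparator changes, here is a simplified plain-Java quick sort that takes the comparator as a parameter. It is not Hadoop's QuickSort class and uses no Hadoop types; KeyQuickSortSketch and its sort method are hypothetical names. The point is that the sorting algorithm stays the same and only the comparison is swapped.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration, not Hadoop's QuickSort: a basic in-place
// quick sort whose ordering is decided entirely by the comparator.
class KeyQuickSortSketch {

    static <K> void sort(K[] keys, Comparator<? super K> cmp) {
        sort(keys, 0, keys.length - 1, cmp);
    }

    private static <K> void sort(K[] keys, int lo, int hi, Comparator<? super K> cmp) {
        if (lo >= hi) {
            return;
        }
        // Middle element as pivot; partition around it from both ends.
        K pivot = keys[lo + (hi - lo) / 2];
        int i = lo;
        int j = hi;
        while (i <= j) {
            while (cmp.compare(keys[i], pivot) < 0) i++;
            while (cmp.compare(keys[j], pivot) > 0) j--;
            if (i <= j) {
                K tmp = keys[i];
                keys[i] = keys[j];
                keys[j] = tmp;
                i++;
                j--;
            }
        }
        sort(keys, lo, j, cmp);
        sort(keys, i, hi, cmp);
    }

    public static void main(String[] args) {
        String[] keys = {"pear", "apple", "fig"};
        sort(keys, Comparator.naturalOrder());   // ascending
        System.out.println(Arrays.toString(keys));
        sort(keys, Comparator.reverseOrder());   // descending
        System.out.println(Arrays.toString(keys));
    }
}
```

In a real job the comparator you register plays the role of the cmp argument here: negating its result is enough to turn an ascending sort into a descending one.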