Which Sorting algorithm is used in MapReduce

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

Viewing 1 reply thread

Author

Posts
- September 20, 2018 at 2:24 pm #5150
  
  DataFlair Team
  Spectator
  
  When a MapReduce job runs, the output of the Mapper (i.e. the intermediate output) is sorted by key.
  
  1. Where is the sorting done: on the Mapper node or on the Reducer node?
  
  2. Which algorithm / technique is used for the sorting? What are the reasons for selecting that specific sorting algorithm?
- September 20, 2018 at 2:24 pm #5153
  DataFlair Team
  Spectator
  1) Sorting happens on both sides. Each Mapper node sorts its own intermediate output by partition and key before writing it to local disk, and each Reducer node merges the already-sorted map outputs it fetches during the shuffle.
  
  2) Sorting is always done on the basis of keys.
  - The framework uses a comparator, by default a WritableComparator for the key class, to order the key-value pairs generated by the Mapper.
  - The WritableComparator class implements Java’s RawComparator interface.
    Its compare method decides the order of two keys. Comparators for built-in key types such as Text compare the serialized keys byte by byte, without deserializing them.
  Technically, the compare method doesn’t sort anything itself; it only tells the sorting algorithm which of two keys comes first.
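  
  To make the byte-by-byte comparison concrete, here is a minimal sketch in plain Java. It is not Hadoop's source code: the class name RawKeyCompare and the method compareBytes are illustrative, and the sketch assumes keys are serialized as plain UTF-8 bytes with no length prefix.
  
  ```java
  import java.nio.charset.StandardCharsets;
  
  // Hypothetical helper, for illustration only (not a Hadoop class).
  class RawKeyCompare {
      // Lexicographic comparison of two byte ranges, treating each byte as unsigned.
      // Returns a negative, zero, or positive value, like Comparator.compare.
      public static int compareBytes(byte[] a, int aStart, int aLen,
                                     byte[] b, int bStart, int bLen) {
          int n = Math.min(aLen, bLen);
          for (int i = 0; i < n; i++) {
              int x = a[aStart + i] & 0xff;
              int y = b[bStart + i] & 0xff;
              if (x != y) {
                  return x - y; // first differing byte decides the order
              }
          }
          return aLen - bLen; // equal prefix: the shorter key comes first
      }
  
      public static void main(String[] args) {
          byte[] apple = "apple".getBytes(StandardCharsets.UTF_8);
          byte[] apricot = "apricot".getBytes(StandardCharsets.UTF_8);
          int result = compareBytes(apple, 0, apple.length, apricot, 0, apricot.length);
          System.out.println(result < 0 ? "apple sorts first" : "apricot sorts first");
      }
  }
  ```
  
  Note that the method only answers "which of these two keys comes first"; the sorting algorithm that calls it is a separate concern.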
  
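  Mapper: by default Hadoop sorts the in-memory map output buffer with quicksort, then merges the resulting spill files into one sorted file per map task.
  
  Reducer: merge sort is used on the reduce side. The data comes from different nodes to a single point as runs that are already sorted, so merging them is much cheaper than sorting from scratch, and that makes merge sort the natural fit here. This merge is built into the framework; what a job normally changes is the comparator that defines the key order, not the merge itself.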
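  
  The reduce-side idea (several sorted runs arriving from different nodes and being merged into one sorted stream) can be sketched as a k-way merge with a priority queue. This is a simplified illustration in plain Java, not Hadoop's actual merger; SortedRunMerger and Cursor are hypothetical names, and the runs are kept in memory here for brevity.
  
  ```java
  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.List;
  import java.util.PriorityQueue;
  
  // Hypothetical illustration of merging pre-sorted runs (not a Hadoop class).
  class SortedRunMerger {
      // Tracks the current read position inside one sorted run.
      private static final class Cursor {
          final List<String> run;
          int pos;
          Cursor(List<String> run) { this.run = run; }
          String head() { return run.get(pos); }
      }
  
      // Merges runs that are each already sorted into a single sorted list.
      public static List<String> merge(List<List<String>> runs) {
          // The heap always exposes the run whose current key is smallest.
          PriorityQueue<Cursor> heap =
              new PriorityQueue<>((x, y) -> x.head().compareTo(y.head()));
          for (List<String> run : runs) {
              if (!run.isEmpty()) {
                  heap.add(new Cursor(run));
              }
          }
          List<String> out = new ArrayList<>();
          while (!heap.isEmpty()) {
              Cursor c = heap.poll();
              out.add(c.head());
              c.pos++;
              if (c.pos < c.run.size()) {
                  heap.add(c); // re-insert with its next key as the new head
              }
          }
          return out;
      }
  
      public static void main(String[] args) {
          List<List<String>> runs = Arrays.asList(
              Arrays.asList("ant", "cat", "fox"),   // sorted output of one map task
              Arrays.asList("bee", "cat", "owl"),   // sorted output of another
              Arrays.asList("dog"));
          System.out.println(merge(runs));
      }
  }
  ```
  
  Each key is touched once and the heap never holds more than one entry per run, which is why merging suits input that is already sorted in pieces.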
  
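  You can specify your own comparator class to sort your keys in ascending or descending order. The class must implement RawComparator (extending WritableComparator is the usual way), and it is registered on the Job instance:
  job.setSortComparatorClass(YourClass.class);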
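  
  As a rough sketch of what a descending comparator changes, the plain-Java example below uses java.util.Comparator and Arrays.sort as stand-ins for the Hadoop comparator and the framework's sort. DescendingKeys and its members are hypothetical names; in a real job the same reversal would live in the compare method of your comparator class.
  
  ```java
  import java.util.Arrays;
  import java.util.Comparator;
  
  // Hypothetical stand-in for a custom sort comparator (not a Hadoop class).
  class DescendingKeys {
      // Reverses the natural key order by swapping the operands.
      public static final Comparator<String> DESCENDING = (a, b) -> b.compareTo(a);
  
      // Returns a sorted copy of the keys; Arrays.sort plays the role of the framework.
      public static String[] sortKeys(String[] keys, Comparator<String> order) {
          String[] copy = keys.clone();
          Arrays.sort(copy, order);
          return copy;
      }
  
      public static void main(String[] args) {
          String[] keys = {"bee", "owl", "ant"};
          System.out.println(Arrays.toString(sortKeys(keys, DESCENDING)));
      }
  }
  ```
  
  Only the comparator changes between ascending and descending order; the sorting algorithm underneath stays the same.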
  
  Follow the link to learn more about Sorting in Hadoop