Which sorting algorithm is used in MapReduce?


    • #5150
      DataFlair Team
      Spectator

      When a MapReduce job runs, the output of the Mapper (i.e., the intermediate output) is sorted by key.

      1. Where is the sorting done: on the Mapper node or on the Reducer node?

      2. Which algorithm or technique is used for the sorting, and why was that algorithm chosen?

    • #5153
      DataFlair Team
      Spectator

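      1) Sorting starts on the Mapper node and is completed on the Reducer node. Each map task sorts its own output by key before writing it to local disk, and each reducer then merges the sorted outputs it fetches from the mappers.

      2) On both sides the ordering is based on the keys, not the values.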

      • The framework uses a WritableComparator to order the key-value pairs generated by the Mapper.
      • The WritableComparator class implements Hadoop's RawComparator interface, which extends Java's Comparator and can compare keys in their serialized (byte) form.
        The WritableComparator's compare method determines the order of the key-value pairs. Key types such as Text supply an optimized comparator that performs a byte-by-byte comparison of the serialized keys.

      Technically, the compare method doesn't sort anything itself; it only reports which of two keys comes first, and the sort uses that result to order them.
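To make the byte-by-byte comparison concrete, here is a minimal sketch in plain Java. It is not Hadoop's actual code: the class name `RawKeyCompareSketch` and the method `compareBytes` are hypothetical stand-ins, and the sketch assumes the keys are plain UTF-8 bytes with no length prefix.

```java
import java.nio.charset.StandardCharsets;

// Plain-Java stand-in for a raw (serialized-bytes) key comparator.
// Hypothetical illustration only; this is not a Hadoop class.
class RawKeyCompareSketch {

    // Lexicographic comparison of two byte ranges, treating each byte as unsigned.
    // Returns a negative, zero, or positive value, like Comparator.compare.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b; // first differing byte decides the order
            }
        }
        return l1 - l2; // a key that is a prefix of the other sorts first
    }

    public static void main(String[] args) {
        byte[] apple = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] apricot = "apricot".getBytes(StandardCharsets.UTF_8);
        int c = compareBytes(apple, 0, apple.length, apricot, 0, apricot.length);
        System.out.println(c < 0 ? "apple sorts before apricot" : "apricot sorts first");
    }
}
```

Note that the method only answers "which key comes first"; the sort or merge that calls it does the actual reordering.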

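      Mapper: each map task sorts its in-memory output buffer (quicksort by default) before spilling it to disk, and then merges the spill files into one sorted output per partition.

      Reducer: merge sort is used on the reduce side, and it is the default behavior of MapReduce. Each reducer receives map outputs from many different nodes, and each of those outputs is already sorted by key, so merging them is the most efficient way to produce a single sorted input. The merge step is part of the framework rather than something a job normally replaces; what a job can change is the comparator that defines the key order.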
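The reduce-side merge can be pictured with a small plain-Java sketch. This is only an illustration of merging already-sorted runs with a min-heap, not Hadoop's own merger; the class name `MergeSortedRunsSketch` is hypothetical, and the runs are assumed to be in-memory lists of String keys rather than files fetched from mapper nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of a k-way merge over sorted runs; not a Hadoop class.
class MergeSortedRunsSketch {

    // Merges runs that are each already sorted into one sorted list.
    static List<String> merge(List<List<String>> runs) {
        // Each heap entry is {run index, position within that run},
        // ordered by the key currently at that position.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> runs.get(a[0]).get(a[1]).compareTo(runs.get(b[0]).get(b[1])));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {r, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();              // smallest remaining key
            List<String> run = runs.get(head[0]);
            out.add(run.get(head[1]));
            if (head[1] + 1 < run.size()) {
                heap.add(new int[] {head[0], head[1] + 1}); // advance within that run
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> runs = Arrays.asList(
            Arrays.asList("a", "d", "f"),   // sorted output of one mapper
            Arrays.asList("b", "c"),        // sorted output of another mapper
            Arrays.asList("e"));
        System.out.println(merge(runs));    // prints [a, b, c, d, e, f]
    }
}
```

Because every run is already sorted, the merge only ever compares the current head of each run, which is why it suits data arriving from many nodes.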

      You can specify your own comparator class to sort your keys in ascending or descending order by registering it on the Job:
      job.setSortComparatorClass(YourClass.class)
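As a rough illustration of what a custom comparator changes, the sketch below uses plain Java with `java.util.Comparator` standing in for a Hadoop comparator. The class name `DescendingKeySketch` and the helper `sortKeys` are hypothetical; in a real job the comparator class would be a Hadoop comparator registered on the job, not a `java.util.Comparator` passed to a list sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration: swapping the comparator flips the key order.
class DescendingKeySketch {

    // Reverses the natural (ascending) order of String keys.
    static final Comparator<String> DESCENDING = Comparator.reverseOrder();

    // Returns a sorted copy of the keys using the supplied comparator.
    static List<String> sortKeys(List<String> keys, Comparator<String> cmp) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort(cmp);
        return copy;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("b", "a", "c");
        System.out.println(sortKeys(keys, Comparator.naturalOrder())); // prints [a, b, c]
        System.out.println(sortKeys(keys, DESCENDING));                // prints [c, b, a]
    }
}
```

The sorting machinery stays the same in both calls; only the comparator, and therefore the resulting key order, differs.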

