This topic contains 1 reply, has 1 voice, and was last updated by  dfbdteam3 1 year, 6 months ago.

Viewing 2 posts - 1 through 2 (of 2 total)
  • #5150

    dfbdteam3
    Moderator

    When a MapReduce job runs, the output of Mapper (i.e. intermediate output) is sorted based on the keys.

    1. Where is the sorting done: on the Mapper node or on the Reducer node?

    2. Which algorithm or technique is used for sorting, and what are the reasons for selecting that specific algorithm?

    #5153

    dfbdteam3
    Moderator

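    1) Sorting is done on both sides. Each map task sorts its own output by key before writing it to local disk, and the Reducer node then merges the sorted map outputs it fetches from the different mappers.

    2) In both places the sorting is done on the basis of keys.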

    • The framework uses the WritableComparator class to sort the key-value pairs generated by the Mapper.
    • The WritableComparator class implements Hadoop's RawComparator interface.
      The compare method of WritableComparator is responsible for ordering the key-value pairs; it can compare the keys byte by byte in their serialized form.

    Technically, the compare method doesn't sort anything itself; the sort uses its result to decide which key goes before the other.
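    To make the byte-by-byte idea concrete, here is a minimal sketch in plain Java, written without Hadoop on the classpath. The class name ByteOrderSketch and its compareBytes method are made up for illustration; they only imitate the kind of lexicographic comparison a raw comparator performs on serialized keys and are not Hadoop's actual code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of Hadoop.
public class ByteOrderSketch {

    // Compare two byte ranges lexicographically, treating each byte as unsigned.
    // Returns a negative number, zero, or a positive number, like any comparator.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int end1 = s1 + l1;
        int end2 = s2 + l2;
        for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
            int a = b1[i] & 0xff; // mask so the byte is read as 0..255
            int b = b2[j] & 0xff;
            if (a != b) {
                return a - b;     // first differing byte decides the order
            }
        }
        return l1 - l2;           // common prefix: the shorter range comes first
    }

    public static void main(String[] args) {
        byte[] apple = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] apricot = "apricot".getBytes(StandardCharsets.UTF_8);
        int result = compareBytes(apple, 0, apple.length, apricot, 0, apricot.length);
        // The sign of the result tells the sort which key goes first.
        System.out.println(Integer.signum(result));
    }
}
```

    Note that the method only answers "which of these two keys comes first"; the sort that calls it does the actual reordering.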

    Reducer: merge sort is used on the reduce side, and it is built into MapReduce rather than something a job normally changes. The reason is that each reducer receives already-sorted map outputs from different nodes at a single point, so the best-suited algorithm here is a merge of those sorted runs.
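    The merge step can be sketched as follows. This is a simplified, in-memory illustration in plain Java, assuming each mapper's output is a small sorted list of string keys; the class name MergeSortedRunsSketch is hypothetical, and the real framework merges on-disk and in-memory segments rather than Java lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration class, not part of Hadoop.
public class MergeSortedRunsSketch {

    // Merge several already-sorted runs into one sorted list using a min-heap.
    // Each heap entry is {runIndex, offsetInRun}.
    static List<String> merge(List<List<String>> runs) {
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> runs.get(x[0]).get(x[1]).compareTo(runs.get(y[0]).get(y[1])));

        // Seed the heap with the first key of every non-empty run.
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {r, 0});
            }
        }

        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();               // smallest key among the run heads
            List<String> run = runs.get(top[0]);
            out.add(run.get(top[1]));
            if (top[1] + 1 < run.size()) {
                heap.add(new int[] {top[0], top[1] + 1}); // advance within that run
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> runs = new ArrayList<>();
        runs.add(Arrays.asList("a", "d"));      // sorted output of "mapper 1"
        runs.add(Arrays.asList("b", "c", "e")); // sorted output of "mapper 2"
        System.out.println(merge(runs));
    }
}
```

    Because every input run is already sorted, the merge only ever compares the current heads of the runs, which is why it fits data arriving from many nodes.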

    You can specify your own comparator class to sort your keys in ascending or descending order:
    job.setSortComparatorClass(YourClass.class)

    Follow the link to learn more about Sorting in Hadoop
