Hadoop MapReduce: which sort algorithm is used?


    • #5196
      DataFlair Team
      Spectator

      Which sorting algorithm does Hadoop MapReduce use during job execution?

    • #5200
      DataFlair Team
      Spectator

      In a MapReduce job, the Mapper emits intermediate key-value pairs, and the framework sorts them automatically by key. A program that needs sorted data at some stage can rely on this feature. Sorting also lets the Reducer tell when a new reduce() call should begin: it starts one whenever the next key in the sorted input differs from the previous key, which saves the Reducer time. Each reduce() call receives a key together with the list of values for that key.
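
To make the key-boundary idea concrete, here is a minimal sketch in plain Java (Java 9+). It is not Hadoop code: the class `KeyGroupingSketch` and the method `groupSorted` are hypothetical names chosen for illustration, and the sketch assumes its input is already sorted by key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only; a hypothetical stand-in for what the framework does
// before it hands each key and its values to reduce().
class KeyGroupingSketch {

    // Walks key-sorted pairs once and opens a new group whenever the key changes.
    // Assumes sortedPairs is sorted by key; unsorted input would give wrong groups.
    static Map<String, List<Integer>> groupSorted(List<Map.Entry<String, Integer>> sortedPairs) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        String currentKey = null;
        List<Integer> currentValues = null;
        for (Map.Entry<String, Integer> pair : sortedPairs) {
            if (!pair.getKey().equals(currentKey)) {
                // Key boundary: the next "reduce() call" would start here.
                currentKey = pair.getKey();
                currentValues = new ArrayList<>();
                groups.put(currentKey, currentValues);
            }
            currentValues.add(pair.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
                Map.entry("apple", 1), Map.entry("apple", 1),
                Map.entry("bear", 1),
                Map.entry("cat", 1), Map.entry("cat", 1));
        System.out.println(groupSorted(pairs));
    }
}
```

Because the input is sorted, a single comparison with the previous key is enough to detect a boundary, and no lookup table of previously seen keys is needed.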

      Sort Algorithm

      Mapper: Quicksort is used on the map side. The org.apache.hadoop.util.QuickSort class sorts the keys. It works by comparing keys and, by default, orders them ascending.

      Reducer: Merge sort is used on the reduce side, and it is the default behavior of MapReduce. The reduce-side sorting method cannot be changed. The reason is that the data arrives at a single point from different nodes, with each piece already sorted on the map side, so merging is the best-suited algorithm here.
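
The reduce-side idea, combining several already-sorted inputs into one sorted stream, can be sketched as a k-way merge. This is a plain-Java illustration under stated assumptions, not Hadoop's actual merger: `MergeSortedRunsSketch` and `merge` are hypothetical names, and each input list stands in for the sorted output received from one mapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Illustration only; a hypothetical k-way merge over in-memory lists.
class MergeSortedRunsSketch {

    // Merges already-sorted runs into one sorted list using a min-heap of cursors.
    static List<String> merge(List<List<String>> sortedRuns) {
        // Each heap entry is {runIndex, positionInRun}, ordered by the key it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> sortedRuns.get(a[0]).get(a[1])
                        .compareTo(sortedRuns.get(b[0]).get(b[1])));
        for (int i = 0; i < sortedRuns.size(); i++) {
            if (!sortedRuns.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] cursor = heap.poll();                 // smallest key across all runs
            List<String> run = sortedRuns.get(cursor[0]);
            merged.add(run.get(cursor[1]));
            if (cursor[1] + 1 < run.size()) {
                heap.add(new int[] {cursor[0], cursor[1] + 1}); // advance within that run
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(
                List.of("apple", "dog"),
                List.of("bear", "cat"))));
    }
}
```

Since every run is already sorted, the merge only ever compares the current head of each run, which is why no full re-sort is needed on the reduce side.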

      You can specify your own comparator class to sort the keys in ascending or descending order. The class must implement RawComparator, for example by extending WritableComparator:
      job.setSortComparatorClass(YourClass.class)
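
As a rough illustration of how a comparator alone decides the key order, here is a plain-Java sketch. Hadoop is not on the classpath here, so `java.util.Comparator` stands in for the job's sort comparator; `DescendingKeySketch`, `DESCENDING`, and `sortKeys` are hypothetical names, and a real job would supply a Hadoop comparator class instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustration only; a JDK stand-in for a custom sort comparator.
class DescendingKeySketch {

    // Inverts the natural (ascending) key order by swapping the operands.
    static final Comparator<String> DESCENDING = (a, b) -> b.compareTo(a);

    // Sorts a copy of the keys with whichever comparator the caller supplies.
    static List<String> sortKeys(List<String> keys, Comparator<String> order) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(order);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortKeys(List.of("bear", "apple", "cat"), DESCENDING));
    }
}
```

The sorting code itself does not change between ascending and descending order. Only the comparator passed in differs, which is the same division of labor the framework uses.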
