What is the need for sorting in MapReduce?

  • Author
    Posts
    • #5603
      DataFlair Team
      Spectator

      In MapReduce, the intermediate output is sorted by key. Why is sorting needed in the MapReduce flow?

    • #5605
      DataFlair Team
      Spectator

      The sort phase in MapReduce covers the merging and sorting of map outputs. Each map task generates intermediate key-value pairs, and the framework sorts all of them by key. Each reduce task takes a list of key-value pairs as input, and because that input is already sorted, the reducer can easily tell where the values for one key end and the values for the next key begin. It simply starts a new reduce() call when the next key in the sorted input differs from the previous one, which saves time for the reducer. Only the keys are sorted; if we also want the values for each key in sorted order, we can use the secondary sort technique.
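The key-change check described above can be sketched in plain Python. This is only an illustrative simulation, not Hadoop code: the function name `reduce_sorted` and the word-count style data are assumptions made for the example.

```python
from typing import Any, Callable, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def reduce_sorted(pairs: Iterable[Pair],
                  reducer: Callable[[Any, List[Any]], Any]) -> List[Any]:
    """Call reducer(key, values) once per run of equal adjacent keys.

    Hypothetical helper: it never looks ahead or buffers the whole input,
    so it is only correct when `pairs` is already sorted by key.
    """
    results = []
    current_key = None
    values: List[Any] = []
    have_key = False
    for key, value in pairs:
        # A different key means the previous group is complete.
        if have_key and key != current_key:
            results.append(reducer(current_key, values))
            values = []
        current_key = key
        have_key = True
        values.append(value)
    if have_key:
        # Flush the last group.
        results.append(reducer(current_key, values))
    return results


def count(key, values):
    return (key, sum(values))


# Intermediate pairs in the order a mapper might emit them (made-up data).
map_output = [("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]

# Sorted by key: each key forms one contiguous run, so one call per key.
sorted_counts = reduce_sorted(sorted(map_output, key=lambda kv: kv[0]), count)

# Unsorted: the same key shows up in several runs and is reduced in pieces.
unsorted_counts = reduce_sorted(map_output, count)

print(sorted_counts)
print(unsorted_counts)
```

With sorted input the helper only has to remember the current key. Without sorting, it would need to hold every key's values in memory until the whole input had been read.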

      Follow the link to learn more about Sorting in Hadoop

    • #5607
      DataFlair Team
      Spectator

      Sorting in Hadoop helps the reducer easily distinguish when a new reduce() call should start, which saves time for the reducer. The reducer starts a new reduce() call when the next key in the sorted input data differs from the previous one. Each reduce task takes key-value pairs as input and generates key-value pairs as output.

      For more detail, please follow Sorting in Hadoop

    • #5608
      DataFlair Team
      Spectator

      The output generated by the mapper is automatically sorted by the MapReduce framework: all intermediate key-value pairs generated by the mapper are sorted by key. This saves time for the reducer.
      Shuffling and sorting in Hadoop MapReduce are not performed at all if you specify zero reducers (setNumReduceTasks(0)). In that case the job is map-only, and the map output is written directly as the job output.
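To make the zero-reducer case concrete, here is a small plain-Python simulation of the flow. It is a sketch under stated assumptions, not the Hadoop API: `run_job`, `partition_for`, and the toy partitioner are hypothetical names invented for this example, and the real framework does this work across machines and on disk.

```python
from itertools import groupby
from operator import itemgetter


def partition_for(key: str, num_reducers: int) -> int:
    """Toy deterministic partitioner (an assumption, not Hadoop's default)."""
    return sum(key.encode("utf-8")) % num_reducers


def run_job(records, mapper, reducer, num_reducers):
    """Simulate one job; returns a flat list of output pairs."""
    map_output = [pair for record in records for pair in mapper(record)]

    if num_reducers == 0:
        # Map-only job: no shuffle and no sort, map output is the job output.
        return map_output

    # Shuffle: route every pair to the reducer that owns its key.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in map_output:
        partitions[partition_for(key, num_reducers)].append((key, value))

    output = []
    for partition in partitions:
        # Sort: order each reducer's input by key so equal keys are adjacent.
        partition.sort(key=itemgetter(0))
        for key, group in groupby(partition, key=itemgetter(0)):
            output.append(reducer(key, [value for _, value in group]))
    return output


def word_mapper(line):
    return [(word, 1) for word in line.split()]


def sum_reducer(key, values):
    return (key, sum(values))


lines = ["b a", "c b a"]  # made-up input records

map_only = run_job(lines, word_mapper, sum_reducer, num_reducers=0)
reduced = run_job(lines, word_mapper, sum_reducer, num_reducers=2)

print(map_only)  # pairs in mapper emission order, unsorted
print(reduced)   # one pair per key, sorted within each partition
```

In the map-only run the pairs come back in exactly the order the mapper emitted them, because no step needs them grouped by key.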
