What is the need for sorting in MapReduce?

This topic has 3 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 3:45 pm #5603
  
  DataFlair Team
  Spectator
  
  In MapReduce, intermediate output is sorted by key. Why is sorting needed in the MapReduce flow?
- September 20, 2018 at 3:45 pm #5605
  
  DataFlair Team
  Spectator
  
  The sort phase in MapReduce covers the merging and sorting of map outputs. The map phase generates intermediate key-value pairs, and all of these pairs are sorted by key. Each reduce task takes a list of key-value pairs as input, and because that input arrives sorted, the reducer can easily tell where one key's group ends and the next begins: it simply starts a new reduce call when the next key in the sorted input differs from the previous one. This saves time for the reducer. Only the keys are sorted by default; if the values for each key must also be sorted, the secondary sorting technique can be used.
  
  Follow the link to learn more about Sorting in Hadoop
- September 20, 2018 at 3:45 pm #5607
  
  DataFlair Team
  Spectator
  
  Sorting in Hadoop helps the reducer easily distinguish when a new reduce call should start, which saves time for the reducer. The reducer starts a new reduce call when the next key in the sorted input data differs from the previous one. Each reduce task takes key-value pairs as input and generates key-value pairs as output.
  
  For more detail, please follow Sorting in Hadoop
- September 20, 2018 at 3:45 pm #5608
  
  DataFlair Team
  Spectator
  
  The output generated by the mapper is automatically sorted by the MapReduce framework: all intermediate key-value pairs generated by the mapper are sorted by key. This saves time for the reducer.
  Shuffling and sorting in Hadoop MapReduce are not performed at all if you specify zero reducers (setNumReduceTasks(0)).
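
The key-boundary idea described in the replies above can be illustrated with a small sketch in plain Java. This is not Hadoop code and not how the framework is implemented internally; the class name `SortedGroupingSketch` and the methods `sortByKey` and `reduceSorted` are hypothetical names chosen for illustration. It assumes a word-count style job where each map output pair is (word, 1), and shows that once the pairs are sorted by key, a single pass can find each group simply by watching for a change of key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SortedGroupingSketch {

    // Mimics the framework's sort step: orders intermediate pairs by key.
    static List<Map.Entry<String, Integer>> sortByKey(List<Map.Entry<String, Integer>> pairs) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.comparingByKey());
        return sorted;
    }

    // Single pass over key-sorted input. A change of key marks the end of one
    // group, which is the point where a new reduce call would begin.
    static Map<String, Integer> reduceSorted(List<Map.Entry<String, Integer>> sorted) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        String currentKey = null;
        int sum = 0;
        for (Map.Entry<String, Integer> pair : sorted) {
            if (currentKey != null && !pair.getKey().equals(currentKey)) {
                totals.put(currentKey, sum); // previous group is complete
                sum = 0;
            }
            currentKey = pair.getKey();
            sum += pair.getValue();
        }
        if (currentKey != null) {
            totals.put(currentKey, sum); // flush the last group
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical unsorted map output for a word count.
        List<Map.Entry<String, Integer>> mapOutput = Arrays.asList(
                new SimpleEntry<>("b", 1),
                new SimpleEntry<>("a", 1),
                new SimpleEntry<>("b", 1),
                new SimpleEntry<>("c", 1),
                new SimpleEntry<>("a", 1));
        System.out.println(reduceSorted(sortByKey(mapOutput))); // prints {a=2, b=2, c=1}
    }
}
```

Without the sort, the same pass would see "b", "a", "b" as three separate groups, so the reducer would have to buffer every key until the end of its input before it could know that a group was complete.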