What is the need of sorting in MapReduce
- This topic has 3 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 3:45 pm #5603 DataFlair Team, Spectator
In MapReduce, the intermediate output is sorted by key. Why is sorting needed in the MapReduce flow?
-
September 20, 2018 at 3:45 pm #5605 DataFlair Team, Spectator
The sort phase in MapReduce covers the merging and sorting of map outputs. Each map task generates intermediate key-value pairs, and the framework sorts all of these pairs by key. Each reduce task receives a list of key-value pairs as input. Because that input is already sorted, the reducer can easily tell where one key's group ends and the next begins: the framework simply starts a new reduce() call (not a new reduce task) when the next key in the sorted input differs from the previous one. This saves time for the reducer. Only the keys are sorted; the values for a given key arrive in no guaranteed order. If the values must be sorted as well, use the secondary sort technique, which moves the value into a composite key so that the framework's sort orders it too.
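The key-change check described above can be sketched in plain Java, without any Hadoop classes. This is only an illustration of the idea, not Hadoop's actual implementation: the class name SortedGroupingSketch and the helpers pair and sumByKey are hypothetical, and summing stands in for whatever the reduce logic would do.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SortedGroupingSketch {

    /** Hypothetical helper: builds one intermediate (key, value) pair. */
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    /**
     * Walks key-sorted pairs once and sums the values per key.
     * A group is closed as soon as the key changes, which only works
     * because all pairs with the same key are adjacent in sorted input.
     */
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> sortedPairs) {
        Map<String, Integer> output = new LinkedHashMap<>();
        String currentKey = null;
        int sum = 0;
        for (Map.Entry<String, Integer> p : sortedPairs) {
            if (currentKey != null && !currentKey.equals(p.getKey())) {
                // Key changed: the previous group is complete, emit it.
                output.put(currentKey, sum);
                sum = 0;
            }
            currentKey = p.getKey();
            sum += p.getValue();
        }
        if (currentKey != null) {
            output.put(currentKey, sum); // emit the last group
        }
        return output;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(pair("apple", 1));
        pairs.add(pair("apple", 1));
        pairs.add(pair("banana", 1));
        pairs.add(pair("cherry", 1));
        pairs.add(pair("cherry", 2));
        System.out.println(sumByKey(pairs)); // prints {apple=2, banana=1, cherry=3}
    }
}
```

With unsorted input the same loop would emit a key more than once, or the reducer would have to buffer every key in memory until the input ends. Sorting is what allows a single pass with constant state.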
Follow the link to learn more about Sorting in Hadoop
-
September 20, 2018 at 3:45 pm #5607 DataFlair Team, Spectator
Sorting in Hadoop helps the reducer easily distinguish when a new reduce() call should start, which saves time for the reducer. The framework starts a new reduce() call when the next key in the sorted input data differs from the previous one. Each reduce task takes key-value pairs as input and generates key-value pairs as output.
For more detail, please follow Sorting in Hadoop
-
September 20, 2018 at 3:45 pm #5608 DataFlair Team, Spectator
The MapReduce framework automatically sorts the output generated by the mapper: all intermediate key-value pairs that mappers generate are sorted by key. This saves time for the reducer.
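One way to see why pre-sorted map output saves time: the reduce side can merge already-sorted runs instead of sorting everything from scratch. The sketch below merges sorted lists of keys with a priority queue in plain Java. It is an assumption-laden illustration only; SortedRunMergeSketch and merge are hypothetical names, and Hadoop's real merge works on serialized records in spill files rather than on in-memory lists of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

final class SortedRunMergeSketch {

    /** Merges runs that are each already sorted into one sorted list. */
    static List<String> merge(List<List<String>> sortedRuns) {
        // Heap entries are {runIndex, positionInRun}, ordered by the key they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> sortedRuns.get(a[0]).get(a[1])
                .compareTo(sortedRuns.get(b[0]).get(b[1])));

        // Seed the heap with the first key of every non-empty run.
        for (int run = 0; run < sortedRuns.size(); run++) {
            if (!sortedRuns.get(run).isEmpty()) {
                heap.add(new int[] {run, 0});
            }
        }

        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();               // smallest remaining key
            List<String> run = sortedRuns.get(head[0]);
            merged.add(run.get(head[1]));
            if (head[1] + 1 < run.size()) {
                heap.add(new int[] {head[0], head[1] + 1}); // advance within that run
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> mapOutputs = Arrays.asList(
            Arrays.asList("apple", "cherry"),    // sorted output of one mapper
            Arrays.asList("banana", "cherry"));  // sorted output of another mapper
        System.out.println(merge(mapOutputs)); // prints [apple, banana, cherry, cherry]
    }
}
```

Each step only compares the current head of every run, so the merged stream comes out in key order and equal keys from different mappers end up next to each other.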
Shuffling and sorting in Hadoop MapReduce are not performed at all if you specify zero reducers (setNumReduceTasks(0)). In that case the job is map-only, and the map output is written directly as the final job output.