What is Shuffling and Sorting in Hadoop MapReduce?

This topic has 3 replies, 1 voice, and was last updated 7 years, 8 months ago by DataFlair Team.


Author

Posts
- September 20, 2018 at 5:26 pm #6234
  
  DataFlair Team
  Spectator
  
  What do you mean by shuffling and sorting in MapReduce?
- September 20, 2018 at 5:27 pm #6236
  
  DataFlair Team
  Spectator
  
  MapReduce is the processing framework of Hadoop. Processing takes place in two phases (tasks): the map task, where the input data is broken down into key-value pairs, and the reduce task, where those pairs are combined by key, i.e. the data is aggregated per key.
  
  Both the map and the reduce phase run as parallel processes.
  In the map phase, the input is split among the mapper nodes, where each record is processed and emitted as a key-value tuple. These tuples are then passed to the reducer nodes. On the way, sorting and shuffling take place, i.e. the tuples are sorted and grouped by key so that all tuples with the same key are sent to the same node.
  
  For more detail follow sorting-shuffling
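The flow described in the reply above (map, then sort and shuffle, then reduce) can be sketched in a few lines of plain Python. This is only an in-memory illustration under assumptions, not Hadoop code: the function names `map_phase`, `shuffle_and_sort`, `reduce_phase`, and `word_count` are hypothetical, and word counting is used as the example job.

```python
from collections import defaultdict

def map_phase(split):
    """Mapper: emit a (word, 1) tuple for every word in one input split."""
    return [(word, 1) for line in split for word in line.split()]

def shuffle_and_sort(mapper_outputs):
    """Framework step: group values by key across all mappers, then order by key."""
    grouped = defaultdict(list)
    for output in mapper_outputs:
        for key, value in output:
            grouped[key].append(value)
    return sorted(grouped.items())

def reduce_phase(key, values):
    """Reducer: aggregate all values that share a key."""
    return key, sum(values)

def word_count(splits):
    """Run the three steps in order on a list of input splits."""
    mapper_outputs = [map_phase(split) for split in splits]
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapper_outputs)]

# Three splits, as if three mappers each processed one chunk of the input.
splits = [["deer bear river"], ["car car river"], ["deer car bear"]]
print(word_count(splits))
# [('bear', 2), ('car', 3), ('deer', 2), ('river', 2)]
```

In a real job the mappers and reducers run on different nodes, so the grouping step also involves moving data over the network. The sketch only shows the logical result.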
- September 20, 2018 at 5:27 pm #6238
  
  DataFlair Team
  Spectator
  
  If we go deeper into MapReduce concepts, we come across the terms sorting and shuffling. All the values for a particular key, from all mappers, go to a single reducer; this process is shuffling. After that, and before the reducer starts its actual work (such as aggregation), all the key-value pairs are sorted by key. Both the sorting and the shuffling are taken care of by the Hadoop framework.
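How a given key always ends up at the same reducer can be illustrated with a small partitioning sketch. Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the Python below imitates that formula. The helper names `java_string_hash`, `partition`, and `shuffle` are hypothetical, the hash mimics Java's `String.hashCode()` for ASCII keys only, and real key types such as `Text` define their own `hashCode`, so actual partition numbers in a cluster may differ.

```python
def java_string_hash(s):
    """Mimic Java's String.hashCode() as a signed 32-bit integer (ASCII keys assumed)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def partition(key, num_reducers):
    """Choose a reducer index from the key alone, in the style of a hash partitioner."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers

def shuffle(mapper_outputs, num_reducers):
    """Route every (key, value) pair to its reducer, then sort each reducer's input by key."""
    reducers = [[] for _ in range(num_reducers)]
    for output in mapper_outputs:
        for key, value in output:
            reducers[partition(key, num_reducers)].append((key, value))
    return [sorted(pairs, key=lambda kv: kv[0]) for pairs in reducers]

mapper_outputs = [
    [("bear", 1), ("car", 1)],   # output of mapper 1
    [("car", 1), ("bear", 1)],   # output of mapper 2
]
for index, pairs in enumerate(shuffle(mapper_outputs, num_reducers=2)):
    print(index, pairs)
```

Because the reducer index depends only on the key and the number of reducers, pairs with the same key from different mappers always land in the same reducer's input.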
- September 20, 2018 at 5:27 pm #6239
  
  DataFlair Team
  Spectator
  
  Shuffling:
  
  The process of transferring data from the mappers to the reducers is known as shuffling, i.e. the process by which the system sorts the map output and transfers it to the reducer as input. Shuffling also saves time, because it can start as soon as one mapper has completed; it does not need to wait for all mappers to finish.
  
  Sorting:
  
  The keys generated by the mapper are automatically sorted by the MapReduce framework, i.e. before the reducer starts, all intermediate key-value pairs generated by the mappers are sorted by key and not by value. Sorting helps the reducer easily identify when a new reduce call should start: it starts a new one whenever the next key in the sorted input differs from the previous key, which also saves time.
  
  For more detail follow sorting-shuffling
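The two points in the reply above, that a reducer can collect map output as soon as individual mappers finish, and that sorted input lets it detect key boundaries, can be sketched as follows. This is a simplified in-memory illustration under assumptions, not Hadoop's implementation: the class `ReducerInput` and its methods `fetch` and `grouped` are hypothetical names, and the standard-library `heapq.merge` stands in for the framework's merge of sorted segments.

```python
import heapq
from itertools import groupby
from operator import itemgetter

class ReducerInput:
    """Collects one sorted segment per mapper, as each mapper finishes."""

    def __init__(self):
        self.segments = []

    def fetch(self, mapper_output):
        # Each segment is ordered by key before it is handed to the reducer.
        self.segments.append(sorted(mapper_output, key=itemgetter(0)))

    def grouped(self):
        # Merge the already sorted segments, comparing keys only (not values).
        merged = heapq.merge(*self.segments, key=itemgetter(0))
        # A change of key in the sorted stream marks the start of a new reduce call.
        for key, pairs in groupby(merged, key=itemgetter(0)):
            yield key, [value for _, value in pairs]

reducer_input = ReducerInput()
reducer_input.fetch([("river", 1), ("bear", 1)])             # mapper 1 finished first
reducer_input.fetch([("car", 1), ("bear", 1), ("car", 1)])   # mapper 2 finished later

totals = [(key, sum(values)) for key, values in reducer_input.grouped()]
print(totals)
# [('bear', 2), ('car', 2), ('river', 1)]
```

Since the merged stream is ordered by key, the reducer never has to hold all keys in memory at once. It can finish one key's values and move on as soon as the key changes.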