What is Shuffling and Sorting in Hadoop MapReduce?

    • #6226
      DataFlair Team
      Spectator

      What do you mean by shuffling and sorting in MapReduce?

    • #6228
      DataFlair Team
      Spectator

      MapReduce is the processing framework of Hadoop. Processing takes place in two phases: the map task, where the input is broken down into key-value pairs, and the reduce task, where those pairs are aggregated by key.

      The map and reduce phases both run as parallel processes. In the map phase, the input is split among the mapper nodes, and each mapper turns its chunk into (key, value) tuples. These tuples are then shuffled and sorted on their way to the reducer nodes: they are sorted and grouped by key so that all tuples with the same key are sent to the same node.
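      The flow described above can be imitated in a few lines of plain Python. This is only an in-memory sketch of a word count, not Hadoop code: the function names (map_phase, shuffle_and_sort, reduce_phase, word_count) and the sample lines are made up for illustration, and everything runs in one process instead of on separate nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) tuple for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(mapped):
    """Shuffle and sort: group values by key, then order the groups by key."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: aggregate all values that share one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle/sort and reduce one after another in a single process."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]


result = word_count(["the quick fox", "the lazy dog", "the fox"])
print(result)
```

      The result comes back ordered by key because the grouping step sorts the keys before any reduce call is made.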

      For more detail follow sorting-shuffling

    • #6230
      DataFlair Team
      Spectator

      Going deeper into MapReduce, you will come across the terms shuffling and sorting. All values for a particular key, from every mapper, go to a single reducer; this transfer is shuffling. Then, before the reducer starts its actual work (such as aggregation), all of its key-value pairs are sorted by key. Both shuffling and sorting are handled by the Hadoop framework.
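      How does one key from all mappers end up at one reducer? The reducer is chosen from the key alone, so every mapper makes the same choice. Hadoop's default HashPartitioner does this with the key's hashCode() modulo the number of reduce tasks. The sketch below shows the same idea in Python; zlib.crc32 is only a stand-in hash, and partition and route are hypothetical names.

```python
import zlib


def partition(key, num_reducers):
    """Pick a reducer index from the key alone, so equal keys always agree."""
    # crc32 is an assumed stand-in for the key's hash code; it is stable
    # across processes, unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(mapper_outputs, num_reducers):
    """Place every (key, value) pair from every mapper in its reducer's bucket."""
    buckets = [[] for _ in range(num_reducers)]
    for output in mapper_outputs:
        for key, value in output:
            buckets[partition(key, num_reducers)].append((key, value))
    return buckets


mapper_outputs = [
    [("a", 1), ("b", 1)],  # output of a first mapper
    [("a", 1), ("c", 1)],  # output of a second mapper
    [("b", 1), ("a", 1)],  # output of a third mapper
]
buckets = route(mapper_outputs, num_reducers=3)
```

      Whichever bucket "a" lands in, all three of its pairs land there together, which is what lets a single reducer see every value for that key.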

    • #6232
      DataFlair Team
      Spectator

      Shuffling:

      Shuffling is the process of transferring data from the mappers to the reducers, i.e. the process by which the system sorts the map output and delivers it to the reducers as input. Shuffling saves time because it can begin as soon as the first mapper completes; it does not have to wait for all mappers to finish. The reduce function itself, however, runs only after the output of every mapper has been fetched.
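      The timing point can be pictured with a small sketch: a reducer copies its share of each mapper's output as soon as that mapper finishes, but may only start reducing once it has heard from all of them. ReducerInput and its methods are hypothetical names for illustration, and the real Hadoop shuffle (network copies, spills to disk, merging) is far more involved.

```python
class ReducerInput:
    """Collects one reducer's share of each mapper's output as mappers finish."""

    def __init__(self, reducer_id, num_mappers):
        self.reducer_id = reducer_id
        self.num_mappers = num_mappers
        self.fetched = {}  # mapper_id -> list of (key, value) pairs

    def on_mapper_finished(self, mapper_id, partitions):
        """Copy only the partition addressed to this reducer.

        `partitions` maps a reducer id to that mapper's pairs for it.
        """
        self.fetched[mapper_id] = list(partitions.get(self.reducer_id, []))

    def ready_to_reduce(self):
        """Reducing may begin only after every mapper has been fetched from."""
        return len(self.fetched) == self.num_mappers

    def pairs(self):
        """Return all fetched pairs, or fail if the shuffle is incomplete."""
        if not self.ready_to_reduce():
            raise RuntimeError("shuffle incomplete: some mappers are still running")
        return [pair for mapper_id in sorted(self.fetched)
                for pair in self.fetched[mapper_id]]


reducer = ReducerInput(reducer_id=0, num_mappers=2)
reducer.on_mapper_finished(0, {0: [("a", 1)], 1: [("b", 1)]})  # copy starts early
early = reducer.ready_to_reduce()   # one mapper is still running
reducer.on_mapper_finished(1, {0: [("a", 2)]})
late = reducer.ready_to_reduce()    # every mapper has now been fetched
```

      The copy for mapper 0 happens while mapper 1 is still running, so by the time the last mapper finishes most of the data is already at the reducer.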

      Sorting:

      The keys generated by the mappers are sorted automatically by the MapReduce framework: before a reducer starts, all intermediate key-value pairs generated by the mappers are sorted by key, not by value. Sorting lets the reducer easily tell when a new group should start, because a change of key in the sorted input marks the next reduce call, which also saves time.
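      Why does sorted input help? Once the pairs are in key order, all values for one key sit next to each other, so the boundary between groups is simply the place where the key changes. The sketch below merges two already-sorted runs and reduces each group with Python's standard heapq.merge and itertools.groupby; merge_sorted_runs, reduce_groups and the sample data are hypothetical, and this is not how Hadoop itself is implemented.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sorted_runs(runs):
    """Merge per-mapper runs, each already sorted by key, into one sorted stream."""
    return heapq.merge(*runs, key=itemgetter(0))


def reduce_groups(sorted_pairs, reduce_fn):
    """Call reduce_fn once per key; a change of key marks the next group."""
    return [
        (key, reduce_fn(value for _, value in group))
        for key, group in groupby(sorted_pairs, key=itemgetter(0))
    ]


# Each run is sorted by key only; the values are not compared.
runs = [
    sorted([("fox", 1), ("the", 1), ("quick", 1)], key=itemgetter(0)),
    sorted([("the", 1), ("dog", 1)], key=itemgetter(0)),
]
totals = reduce_groups(merge_sorted_runs(runs), sum)
print(totals)
```

      groupby only compares neighbouring keys, so it relies on the input being sorted; on unsorted input the same key would show up as several separate groups.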

      For more detail follow sorting-shuffling
