This topic contains 3 replies, has 1 voice, and was last updated by  dfbdteam3 1 year, 6 months ago.

Viewing 4 posts - 1 through 4 (of 4 total)


    During the MapReduce flow, when is the combiner called? Is it called after the mapper or after the partitioner?
    What is the exact order of execution of the combiner in the MapReduce flow?



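    At a high level, the MapReduce flow looks like this:

    1. Mapper
    2. Partitioner
    3. Sort and Combiner (map side)
    4. Shuffle and merge
    5. Reducer

    The combiner runs after the mapper, on the same node as the map task. In Hadoop's implementation it also runs after the partitioner: each record is assigned a partition as the mapper emits it, and when the buffered output is spilled to disk it is sorted by partition and key, and the combiner is applied to each partition.
    The job of the combiner is to shrink the mapper's output before it is fed to the reducer, so that less data is moved across the network.

    The partitioner does not send data to the reducer itself, and its output is not a key with a collection of values. It only decides, for each key-value pair, which reducer that pair belongs to. The grouping of values by key happens later, when the sorted map output is merged on the reduce side. Also note that the combiner is an optimization: Hadoop may call it zero, one, or several times for a given map task, so the job must produce the same result with or without it.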
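
    To make the ordering concrete, here is a minimal single-process sketch of a word count job. It is not Hadoop code: the function names, the reducer count, and the use of CRC32 as the hash are assumptions chosen for illustration. It only shows the sequence of map, partition, sort and combine, shuffle, and reduce.

```python
from collections import defaultdict
from zlib import crc32

NUM_REDUCERS = 2  # hypothetical reducer count for this sketch


def mapper(line):
    """Emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def partition(key, num_reducers=NUM_REDUCERS):
    """Stand-in for a hash partitioner: pick a reducer index from the key."""
    return crc32(key.encode("utf-8")) % num_reducers


def combine(pairs):
    """Local reduce: sum the counts per key within one partition."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def run_map_task(lines):
    """Map, assign partitions, then sort and combine each partition."""
    buckets = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            buckets[partition(key)].append((key, value))
    return {p: combine(sorted(pairs)) for p, pairs in buckets.items()}


def run_job(splits):
    """Run one map task per split, shuffle by partition, then reduce."""
    shuffled = defaultdict(list)
    for lines in splits:
        for p, pairs in run_map_task(lines).items():
            shuffled[p].extend(pairs)
    # For word count the reducer is the same summation as the combiner.
    return {p: dict(combine(pairs)) for p, pairs in shuffled.items()}
```

    Calling run_job([["a b a", "b a"], ["a c"]]) returns one dictionary of word counts per reducer index. Every word lands in exactly one of them, because the partition is a function of the key alone.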

    To learn more, see Data Flow in MapReduce.



    Combiners can be viewed as mini-reducers in the map phase.


    In the MapReduce framework, the output of the map tasks is usually large, so the amount of data transferred between the map and reduce tasks is high. Since data transfer across the network is expensive, a combiner is used to limit the volume of data moved between the map and reduce tasks.

    A combiner function summarizes the map output records that share the same key, and the output of the combiner is what gets sent over the network to the reduce task as its input.
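
    As a rough illustration of the saving, the sketch below counts records before and after a combiner for a single line of input. The names map_words and combine_counts are made up for this example, and a real combiner in Hadoop would be a Reducer class set on the job rather than a plain function.

```python
from collections import Counter


def map_words(line):
    """Emit one (word, 1) record per word, as a word count mapper would."""
    return [(word, 1) for word in line.split()]


def combine_counts(pairs):
    """Sum the counts of records that share the same key."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


map_output = map_words("to be or not to be")  # 6 records, one per word
combined = combine_counts(map_output)         # 4 records, one per distinct word
```

    Here six map output records become four combined records, and the sum per word is unchanged. The saving grows with the number of repeated keys in each map task's output.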

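    The order of execution on the map side is: mapper, then partitioner, then sort and combiner.

    The combiner is applied on the same machine as the map task. It is called after the mapper and, in Hadoop's implementation, after the partitioner has assigned each record to a partition.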

    For more information about the combiner, see Combiner in Hadoop.



    The combiner is called after the mapper.

    Details: Combiners can be viewed as mini-reducers in the map phase. They perform a local reduce on the mapper results before those results are distributed further. Once the combiner has run, its output is passed on to the reducer for further work.

    The partitioner, on the other hand, comes into the picture when we are working with more than one reducer. It decides which reducer is responsible for a particular key. In Hadoop it is applied to each record the mapper emits; if a combiner is used, the combiner then runs on the records of each partition, and the result is sent to the responsible reducer.
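
    The routing decision can be sketched as a hash of the key modulo the number of reducers. Hadoop's default HashPartitioner is based on the key's hashCode(); the CRC32 hash and the names get_partition and route below are substitutions made only so the example is deterministic and self-contained.

```python
from zlib import crc32


def get_partition(key, num_reducers):
    """Map a key to a reducer index in the range [0, num_reducers)."""
    return crc32(key.encode("utf-8")) % num_reducers


def route(pairs, num_reducers):
    """Group (key, value) records by the reducer responsible for each key."""
    reducers = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        reducers[get_partition(key, num_reducers)].append((key, value))
    return reducers
```

    Because the index depends only on the key, all records with the same key end up at the same reducer, whichever map task produced them. With a single reducer every key maps to index 0, which is why the partitioner only matters when there is more than one reducer.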

