This topic contains 3 replies, has 1 voice, and was last updated by  dfbdteam3 1 year, 6 months ago.

Viewing 4 posts - 1 through 4 (of 4 total)
  • Author
    Posts
  • #5614

    dfbdteam3
    Moderator

    During the MapReduce flow, when is the combiner called? Is it called after the mapper or after the partitioner?
    What is the exact order of execution of the combiner in the MapReduce flow?

    #5615

    dfbdteam3
    Moderator

    If we look at the high-level flow of MapReduce, the logical order of execution is:

    1. Mapper
    2. Combiner
    3. Partitioner
    4. Shuffling/Sorting
    5. Reducer

    In this high-level view, the combiner runs after the mapper and before the partitioner.
    The job of the combiner is to condense the output of the mapper before it is fed to the reducer, so that less data has to be moved to the reducer.

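    Keep in mind that this is the logical order. The partitioner does not send data to the reducer itself; it only decides, from the key, which reduce partition each map output record belongs to. In Hadoop's implementation that decision is made as each record is collected into the map-side buffer, and the combiner is applied per partition when the buffer is spilled to disk, so the combiner in fact sees records that already carry a partition number. Either way, the combiner runs on the map side, before the data is shuffled to the reducers.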
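    As a rough illustration of the flow listed above, the sketch below simulates the five stages for a word count in plain Python. It does not use the Hadoop API: the function names, the two-reducer setup, and the CRC32-based partitioner are all assumptions made for the example.

```python
from collections import defaultdict
import zlib

NUM_REDUCERS = 2  # assumed number of reduce tasks for this sketch


def mapper(line):
    """Emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def combiner(pairs):
    """Local reduce: sum the counts per key within one map task's output."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def partitioner(key, num_reducers=NUM_REDUCERS):
    """Choose a reducer from the key alone (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(map_outputs, num_reducers=NUM_REDUCERS):
    """Route each record to its reducer, then group values by sorted key."""
    per_reducer = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            per_reducer[partitioner(key, num_reducers)][key].append(value)
    return [sorted(groups.items()) for groups in per_reducer]


def reducer(key, values):
    """Final reduce: sum all partial counts for a key."""
    return key, sum(values)


def run_job(splits, use_combiner=True):
    """Run the simulated job; each split plays the role of one map task."""
    map_outputs = []
    for split in splits:
        pairs = [kv for line in split for kv in mapper(line)]
        if use_combiner:
            pairs = combiner(pairs)
        map_outputs.append(pairs)
    result = {}
    for groups in shuffle_and_sort(map_outputs):
        for key, values in groups:
            out_key, out_value = reducer(key, values)
            result[out_key] = out_value
    return result


splits = [["a b a", "b a"], ["c a", "c c b"]]
print(run_job(splits, use_combiner=True))
print(run_job(splits, use_combiner=False))
```

    Both calls produce the same counts. Because the partitioner looks only at the key, and the combiner only merges values that share a key, the final result does not depend on whether the combiner is applied at all, which is why it should be treated purely as an optimization.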

    Follow the link to learn more about Data Flow in MapReduce.

    #5617

    dfbdteam3
    Moderator

    A combiner can be viewed as a mini-reducer in the map phase.

    Purpose

    In the MapReduce framework, the output of the map tasks is usually large, so a lot of data has to be transferred between map and reduce tasks. Because data transfer across the network is expensive, a combiner is used to limit the volume of data moved between map and reduce tasks.

    Combiner functions summarize the map output records that share the same key, and the output of the combiner is sent over the network to the actual reduce task as input.
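    To make the saving concrete, here is a small sketch that counts how many records one map task would have to ship with and without a combiner. It is plain Python rather than Hadoop code, and the helper names and sample lines are made up for the illustration.

```python
from collections import Counter


def map_word_count(lines):
    """Map step: emit one (word, 1) record per word."""
    return [(word, 1) for line in lines for word in line.split()]


def combine(pairs):
    """Combine step: collapse records that share a key into one partial sum."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


# Hypothetical input split handled by a single map task.
split = ["to be or not to be", "to be is to do"]

raw = map_word_count(split)
combined = combine(raw)

print(len(raw), "records would be sent without a combiner")
print(len(combined), "records would be sent with a combiner")
```

    The more often keys repeat within a single map task's output, the larger the reduction; if every key is unique, the combiner saves nothing.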

    The order of execution in MapReduce is:

    1. Mapper
    2. Combiner
    3. Partitioner
    4. Shuffling/Sorting
    5. Reducer

    The combiner is applied on the same machine as the map task. In this flow, it is called after the mapper and before the partitioner.

    For more information about the combiner, see: Combiner in Hadoop

    #5619

    dfbdteam3
    Moderator

    The combiner is called after the mapper.

    Details: A combiner can be viewed as a mini-reducer in the map phase. It performs a local reduce on the mapper results before they are distributed further. Once the combiner has run, its output is passed on to the reducer for further work.

    The partitioner, on the other hand, comes into the picture when we are working with more than one reducer. It decides which reducer is responsible for a particular key: it takes each record of the mapper result (or the combiner result, if a combiner is used) and routes it to the responsible reducer based on the key.
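    A minimal sketch of that key-to-reducer decision is shown below. It follows the general shape of hash partitioning (a non-negative hash of the key, modulo the number of reduce tasks); the function name and the use of CRC32 as the hash are assumptions for the example, not Hadoop's actual code.

```python
import zlib


def get_partition(key, num_reduce_tasks):
    """Return the index of the reducer responsible for this key.

    CRC32 is used as a deterministic stand-in for the key's hash code;
    the mask keeps the value non-negative before taking the modulus.
    """
    key_hash = zlib.crc32(key.encode("utf-8"))
    return (key_hash & 0x7FFFFFFF) % num_reduce_tasks


keys = ["apple", "banana", "apple", "cherry", "banana"]
assignments = [(key, get_partition(key, 3)) for key in keys]

for key, partition in assignments:
    print(key, "-> reducer", partition)
```

    Because the result depends only on the key, every record for a given key lands on the same reducer, no matter which map task emitted it. With a single reducer the function always returns 0, which is why the partitioner only matters when there is more than one.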

    Reference:
    https://developer.yahoo.com/hadoop/tutorial/module4.html
    https://stackoverflow.com/questions/22061210/what-runs-first-the-partitioner-or-the-combiner
    https://community.hortonworks.com/questions/14328/what-is-the-difference-between-partitioner-combine.html
