In the MapReduce flow, when is the combiner called?

    • #5614
      DataFlair Team
      Spectator

      During the MapReduce flow, when is the combiner called? Is it called after the mapper or after the partitioner?
      What is the exact position of the combiner in the MapReduce order of execution?

    • #5615
      DataFlair Team
      Spectator

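      At a high level, Hadoop MapReduce executes its stages in this order:

      1. Mapper
      2. Partitioner
      3. Sort (map side, by partition and key)
      4. Combiner
      5. Shuffle and merge
      6. Reducer

      The combiner runs after the mapper, and in Hadoop's implementation it also runs after the partitioner: each record is assigned a partition number as the mapper emits it, and the combiner is applied later, partition by partition, when the buffered map output is sorted and spilled to disk.
      The job of the combiner is to shrink the mapper output before it is fed to the reducer, so that less data has to be moved to the reducer.

      The partitioner does not send data to the reducer itself; it only maps each key to a reducer index. The combiner, like the reducer, takes a key and the collection of values for that key as input, which is why the map output must be sorted before the combiner can run. Hadoop treats the combiner as an optimization and may run it zero, one, or several times per map task, so a job must produce the same result with or without it.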
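The map-side sequence can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Hadoop code: `mapper`, `partition`, `combiner`, `run_map_task`, and `NUM_REDUCERS` are hypothetical names, and the first-letter partitioner is a toy chosen only to keep the result deterministic. It assumes the Hadoop behaviour in which a record is tagged with its partition when it is emitted and the combiner is applied per partition after sorting.

```python
from collections import defaultdict
from itertools import groupby

NUM_REDUCERS = 2  # hypothetical reducer count for the illustration


def mapper(line):
    """Emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word, 1


def partition(key, num_reducers):
    """Toy partitioner: route by first letter so the result is deterministic."""
    return ord(key[0]) % num_reducers


def combiner(key, values):
    """Local reduce: collapse all counts for one key into a single record."""
    yield key, sum(values)


def run_map_task(lines):
    # Steps 1-2: the mapper emits records and each one is tagged with its partition.
    buffered = [(partition(k, NUM_REDUCERS), k, v)
                for line in lines for k, v in mapper(line)]
    # Step 3: sort by (partition, key) so records with equal keys are adjacent.
    buffered.sort(key=lambda rec: (rec[0], rec[1]))
    # Step 4: the combiner runs once per key group inside each partition.
    spill = defaultdict(list)
    for (part, key), group in groupby(buffered, key=lambda rec: (rec[0], rec[1])):
        spill[part].extend(combiner(key, (v for _, _, v in group)))
    return dict(spill)


result = run_map_task(["apple bear apple", "cat bear apple"])
print(result)
```

Each entry of `result` stands for the combined output destined for one reducer; only these combined records would be shuffled across the network.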

      For more detail, see: Data Flow in MapReduce

    • #5617
      DataFlair Team
      Spectator

      A combiner can be viewed as a mini-reducer that runs in the map phase.

      Purpose

      In the MapReduce framework, the output of the map tasks is usually large, so the volume of data transferred between map and reduce tasks is high. Because data transfer across the network is expensive, a combiner is used to limit the volume of data moved between map and reduce tasks.

      A combiner function summarizes the map output records that share the same key, and the combiner's output is sent over the network to the actual reduce task as its input.
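To make the saving concrete, here is a minimal word-count sketch in plain Python (not the Hadoop API). The helper names `map_words` and `combine` and the two input lines are made up for the illustration; the point is only to compare how many records one map task would ship with and without a combiner.

```python
from collections import Counter


def map_words(lines):
    """Map step: one (word, 1) record per word."""
    return [(word, 1) for line in lines for word in line.split()]


def combine(records):
    """Combiner step: sum the values of records that share a key."""
    totals = Counter()
    for key, value in records:
        totals[key] += value
    return sorted(totals.items())


map_output = map_words(["to be or not to be", "to do is to be"])
combined = combine(map_output)

print(len(map_output), "records without a combiner")  # 11
print(len(combined), "records with a combiner")       # 6
```

The more often keys repeat within a single map task's output, the larger the reduction.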

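      The order of execution in Hadoop MapReduce is:

      1. Mapper
      2. Partitioner
      3. Sort (map side)
      4. Combiner
      5. Shuffle and merge
      6. Reducer

      The combiner is applied on the same machine as the map task. It is called after the mapper, on map output that has already been partitioned and sorted, and before that output is shuffled to the reducers.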
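Because the combiner runs locally and sees only the records of one map task, its operation has to give the same final answer when applied to partial groups. The sketch below, in plain Python with made-up helper names and data, contrasts a mean used naively as a combiner with the common workaround of emitting (sum, count) pairs and dividing only in the reducer.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def combine_sum_count(values):
    """Combiner-safe partial result: carry the sum and the count separately."""
    values = list(values)
    return (sum(values), len(values))


def reduce_sum_count(pairs):
    """Reducer: merge the partial (sum, count) pairs, then divide once."""
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return total / count


# Values for one key, split across two hypothetical map tasks.
split_a, split_b = [1, 2, 3], [10]

true_mean = mean(split_a + split_b)                  # 4.0
naive = mean([mean(split_a), mean(split_b)])         # 6.0, mean of means is wrong
safe = reduce_sum_count([combine_sum_count(split_a),
                         combine_sum_count(split_b)])  # 4.0

print(true_mean, naive, safe)
```

Sums, counts, minima, and maxima can be combined locally as-is; an operation like the mean needs this kind of restructuring first.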

      For more information about the combiner, see: Combiner in Hadoop

    • #5619
      DataFlair Team
      Spectator

      The combiner is called after the mapper.

      Details: A combiner can be viewed as a mini-reducer in the map phase. It performs a local reduce on the mapper results before they are distributed further. Once the combiner has run, its output is passed on to the reducer for further work.

      The partitioner, by contrast, comes into the picture when the job has more than one reducer. It decides which reducer is responsible for a particular key. In Hadoop it is applied to each record as the mapper emits it; if a combiner is used, it then runs separately within each partition, so the combined records still go to the reducer chosen for their key.
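The key-to-reducer decision can be sketched as follows. This mirrors the formula of Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, in plain Python. As an assumption for the illustration, the hash is a re-implementation of Java's `String.hashCode()` for characters in the Basic Multilingual Plane; the hash Hadoop actually computes for a `Text` key is different, so the partition numbers here are illustrative only. `java_string_hash` and `hash_partition` are hypothetical names.

```python
def java_string_hash(s):
    """Java String.hashCode()-style hash as a signed 32-bit integer (BMP characters assumed)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap like 32-bit int arithmetic
    return h - 0x100000000 if h >= 0x80000000 else h


def hash_partition(key, num_reducers):
    """Clear the sign bit, then take the remainder to get a reducer index."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


for word in ["hello", "hadoop", "combiner"]:
    print(word, "->", hash_partition(word, 4))
```

Since the result depends only on the key and the reducer count, every record with the same key lands on the same reducer, regardless of which map task emitted it.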

      Reference:
      https://developer.yahoo.com/hadoop/tutorial/module4.html
      https://stackoverflow.com/questions/22061210/what-runs-first-the-partitioner-or-the-combiner
      https://community.hortonworks.com/questions/14328/what-is-the-difference-between-partitioner-combine.html
