In MapReduce Flow, when is the Combiner called?

This topic has 3 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 3:46 pm #5614
  
  DataFlair Team
  Spectator
  
  During the MapReduce flow, when is the combiner called? Is it called after the mapper or after the partitioner?
  What is the exact order of execution of the combiner in the MapReduce flow?
- September 20, 2018 at 3:46 pm #5615
  DataFlair Team
  Spectator
  If we look at the high-level flow of MapReduce, the order of execution should be:
  1. Mapper
  2. Combiner
  3. Partitioner
  4. Shuffling/Sorting
  5. Reducer
  The combiner runs after the mapper and before the partitioner.
  The job of the combiner is to optimize the mapper's output before it is fed to the reducer, so that less data has to be moved to the reducer.
  
  It cannot run after the partitioner, because the partitioner, after processing the data, sends it on to the reducer. Also, the output at that stage is a key with a collection of values, which is not meant to be given as input to the combiner.
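The order listed above can be sketched as a small, single-process Python simulation of a word count job. This is not Hadoop code: the function names (`mapper`, `combine`, `partition`, `run_job`) are hypothetical, the CRC32-based `partition` is an assumed stand-in for a real partitioner, and the "shuffle" is just a sort inside one process. Because the partition is computed from the key alone and the combiner only merges values that share a key, swapping steps 2 and 3 in this sketch gives the same result. (In Hadoop's own `MapTask`, the partition number is computed as each map output record is collected, and the combiner then runs on each sorted partition when the buffer is spilled to disk; that detail is what the Stack Overflow question linked in the last reply is about.)

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
import zlib

def mapper(line):
    # 1. Mapper: emit (word, 1) for every word in the line.
    for word in line.split():
        yield word, 1

def combine(pairs):
    # 2. Combiner: sum the values per key, locally within one map task.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def partition(key, num_reducers):
    # 3. Partitioner (hypothetical): pick a reducer from the key alone.
    # crc32 is used because Python's built-in str hash changes between runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def run_job(splits, num_reducers=2):
    reducer_inputs = [[] for _ in range(num_reducers)]
    for split in splits:  # one simulated map task per input split
        map_output = [kv for line in split for kv in mapper(line)]
        for key, value in combine(map_output):
            reducer_inputs[partition(key, num_reducers)].append((key, value))
    results = {}
    for pairs in reducer_inputs:
        pairs.sort(key=itemgetter(0))  # 4. Shuffle/sort: bring equal keys together
        for key, group in groupby(pairs, key=itemgetter(0)):
            results[key] = sum(value for _, value in group)  # 5. Reducer
    return results

splits = [["a b a", "c a"], ["b c"]]
print(sorted(run_job(splits).items()))  # [('a', 3), ('b', 2), ('c', 2)]
```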
  
  Follow the link to learn more about data flow in MapReduce.
- September 20, 2018 at 3:47 pm #5617
  
  DataFlair Team
  Spectator
  
  Combiners can be viewed as mini-reducers in the map phase.
  
  Purpose
  
  In the MapReduce framework, the output of the map tasks is usually large, so the volume of data transferred between map and reduce tasks is high. Because data transfer across the network is expensive, a combiner is used to limit the volume of data transferred between map and reduce tasks.
  
  Combiner functions summarize the map output records that share the same key, and the combiner's output is sent over the network to the actual reduce task as its input.
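To make the saving concrete, here is a hypothetical Python sketch of a word count mapper and combiner applied to one map task's input. The names `map_words` and `combine_counts` are illustrative, not Hadoop APIs; only the combined records would need to cross the network to the reducers.

```python
from collections import Counter

def map_words(text):
    # Mapper: one (word, 1) record per word.
    return [(word, 1) for word in text.split()]

def combine_counts(records):
    # Combiner: merge records that share a key into one (word, total) record.
    totals = Counter()
    for word, count in records:
        totals[word] += count
    return sorted(totals.items())

map_output = map_words("to be or not to be")
combined = combine_counts(map_output)
print(len(map_output), "records before combining")  # 6 records before combining
print(len(combined), "records after combining")     # 4 records after combining
print(combined)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```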
  
  The order of execution in map-reduce is,
  
  1. Mapper
  2. Combiner
  3. Partitioner
  4. Shuffling/Sorting
  5. Reducer
  
  The combiner is applied on the same machine as the map task. It is called after the mapper and before the partitioner.
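Since the combiner runs on each map task's machine, it only sees that task's records; the reducer still has to merge the partial results coming from different map tasks. A minimal Python sketch of that split, with hypothetical function names and two simulated map tasks:

```python
from collections import Counter

def local_combine(map_output):
    # Runs next to one map task and sees only that task's records.
    totals = Counter()
    for key, value in map_output:
        totals[key] += value
    return totals

def reduce_all(combined_outputs):
    # The reducer merges the partial sums coming from every map task.
    final = Counter()
    for partial in combined_outputs:
        final.update(partial)
    return dict(final)

task1 = [("x", 1), ("y", 1), ("x", 1)]
task2 = [("x", 1), ("y", 1)]
partials = [local_combine(task1), local_combine(task2)]
print([dict(p) for p in partials])  # [{'x': 2, 'y': 1}, {'x': 1, 'y': 1}]
print(reduce_all(partials))         # {'x': 3, 'y': 2}
```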
  
  Check this for more info about combiners: Combiner in Hadoop
- September 20, 2018 at 3:47 pm #5619
  
  DataFlair Team
  Spectator
  
  The combiner is called after the mapper.
  
  Details: Combiners can be viewed as mini-reducers in the map phase. They perform a local reduce on the mapper results before those results are distributed further. Once the combiner has run, its output is passed on to the reducer for further work.
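Because a combiner is a local reduce, the reducer logic is often reused as the combiner. That only works when the operation can be applied to partial results, since Hadoop does not guarantee how many times (if at all) the combiner runs. A Python sketch with hypothetical function names, assuming two map tasks with unequal output: a sum survives being applied to partial sums, but a mean of means does not.

```python
def reduce_sum(values):
    # Sum is associative and commutative, so partial sums can be summed again.
    return sum(values)

def reduce_mean(values):
    # Mean is NOT safe to reuse as a combiner: a mean of means is wrong
    # when the groups have different sizes.
    return sum(values) / len(values)

values = [1, 2, 3, 4, 5, 6]
part_a, part_b = values[:2], values[2:]  # two map tasks with unequal output

print(reduce_sum(values))                                       # 21
print(reduce_sum([reduce_sum(part_a), reduce_sum(part_b)]))     # 21
print(reduce_mean(values))                                      # 3.5
print(reduce_mean([reduce_mean(part_a), reduce_mean(part_b)]))  # 3.0
```

A common workaround for averages is to have the combiner emit (sum, count) pairs and let the reducer divide only at the end.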
  
  By contrast, the partitioner comes into the picture when we are working with more than one reducer. It decides which reducer is responsible for a particular key: it takes the mapper output (or the combiner output, if a combiner is used) and sends each record to the reducer responsible for its key.
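Hadoop's default `HashPartitioner` chooses the reducer with `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The Python sketch below reproduces that formula, using Java's `String.hashCode()` as the hash for illustration. This is an assumption: Hadoop key types such as `Text` define their own `hashCode()`, so the reducer numbers for real jobs may differ. The function names are hypothetical.

```python
def java_string_hash(s):
    # Reproduces Java's String.hashCode(): s[0]*31^(n-1) + ... + s[n-1],
    # with 32-bit signed overflow (matches Java for Basic Multilingual Plane chars).
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def hash_partition(key, num_reducers):
    # Same formula as HashPartitioner: clear the sign bit, then take the modulus,
    # so every occurrence of a key lands on the same reducer.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers

print(java_string_hash("hello"))                              # 99162322
print([hash_partition(w, 2) for w in ["a", "b", "hello"]])    # [1, 0, 0]
```

Masking with `0x7FFFFFFF` instead of taking an absolute value avoids a negative partition number when the hash is the most negative 32-bit integer.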
  
  References:
  https://developer.yahoo.com/hadoop/tutorial/module4.html
  https://stackoverflow.com/questions/22061210/what-runs-first-the-partitioner-or-the-combiner
  https://community.hortonworks.com/questions/14328/what-is-the-difference-between-partitioner-combine.html