In MapReduce Flow, when is the Combiner called?
September 20, 2018 at 3:46 pm #5614 DataFlair Team (Spectator)
During the MapReduce flow, when is the combiner called? Is it called after the mapper or after the partitioner?
What is the exact order of execution of the combiner in the MapReduce flow?
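September 20, 2018 at 3:46 pm #5615 DataFlair Team (Spectator)
Looking at the high-level flow of MapReduce, the order of execution on the map side is: mapper, then partitioner, then combiner.
The combiner runs after the mapper and, in Hadoop's implementation, after the partitioner as well. Each record the mapper emits is assigned a partition as it is written to the in-memory buffer; when the buffer spills to disk, the records are sorted by partition and key, and the combiner, if one is configured, runs over each partition's sorted records.
The job of the combiner is to shrink the mapper's output before it is fed to the reducer, so that less data is moved across the network. It cannot run after the shuffle, because by then the data has already left the map node. Its input is a key with the collection of values emitted for that key on that map task, which is the same shape as a reducer's input. Hadoop may call the combiner zero, one, or several times for a given map output, so a job must not depend on it running.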
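A small simulation can make the map-side sequence concrete. This is a sketch in plain Python, not Hadoop code. It assumes the behavior of Hadoop's map output buffer, where a record's partition is computed as the record is collected and the combiner runs when the sorted buffer is spilled. The names `map_task`, `partition_of`, and `trace` are hypothetical, the CRC32 hash is only a stand-in for a real partitioner, and the buffer is spilled once at the end of the task rather than at a size threshold.

```python
from itertools import groupby
import zlib

def partition_of(key, num_reducers):
    # Stand-in for a partitioner: a stable hash of the key, modulo the reducer count.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def map_task(lines, num_reducers, trace):
    buffer = []
    for line in lines:
        for word in line.split():
            trace.append("map")
            # The partition is assigned as soon as the mapper emits the record.
            trace.append("partition")
            buffer.append((partition_of(word, num_reducers), word, 1))

    # Spill: sort by (partition, key), then combine each run of equal keys.
    trace.append("sort")
    buffer.sort(key=lambda rec: (rec[0], rec[1]))
    spilled = []
    for (part, word), group in groupby(buffer, key=lambda rec: (rec[0], rec[1])):
        trace.append("combine")
        spilled.append((part, word, sum(count for _, _, count in group)))
    return spilled

trace = []
spill = map_task(["a b a", "b a"], num_reducers=2, trace=trace)
print(spill)   # one (partition, word, count) record per distinct word
print(trace)   # every "partition" and the "sort" appear before any "combine"
```

The trace records the order in which each stage touched the data: all `map`/`partition` pairs come first, then one `sort`, then one `combine` per distinct key.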
September 20, 2018 at 3:47 pm #5617 DataFlair Team (Spectator)
Combiners can be viewed as mini-reducers in the map phase.
Purpose
In the MapReduce framework, the output of the map tasks is usually large, so the volume of data transferred between map and reduce tasks is high. Data transfer across the network is expensive, and the combiner exists to limit the volume of data moved between map and reduce tasks.
Combiner functions summarize the map output records that share a key, and the combiner's output is what is sent over the network to the reduce task as input.
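To see how much a combiner can shrink the map output, here is a minimal word-count sketch in plain Python. The functions `map_words` and `combine` are hypothetical stand-ins for a mapper and a summing combiner; nothing here uses the Hadoop API.

```python
from collections import Counter

def map_words(lines):
    # Mapper: emit (word, 1) for every word.
    return [(word, 1) for line in lines for word in line.split()]

def combine(records):
    # Combiner: sum the counts for each key locally on the map side.
    totals = Counter()
    for key, value in records:
        totals[key] += value
    return sorted(totals.items())

lines = ["to be or not to be", "to do or not to do"]
map_output = map_words(lines)
combined = combine(map_output)

print(len(map_output))  # 12 records without a combiner
print(len(combined))    # 5 records after combining
print(combined)         # [('be', 2), ('do', 2), ('not', 2), ('or', 2), ('to', 4)]
```

In this toy input the combiner cuts the records that would cross the network from 12 to 5; the saving on real data depends on how often keys repeat within a single map task.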
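The order of execution in MapReduce is:
1. Mapper
2. Partitioner
3. Combiner
4. Shuffling/Sorting
5. Reducer
The combiner is applied on the same machine as the map. In Hadoop's implementation it is called after the mapper and the partitioner, when the partitioned and sorted map output is spilled to disk, and before the shuffle.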
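The full flow, from map through reduce, can be sketched end to end. The following is a toy simulation in plain Python under stated assumptions: `run_job` is a hypothetical helper, each inner list in `splits` stands for one map task's input, and a CRC32 hash stands in for the partitioner. It runs the same word count with and without a combiner to show that the combiner changes how many records are shuffled, not the final result.

```python
from collections import defaultdict
import zlib

def run_job(splits, num_reducers, use_combiner):
    # One list of reducer inputs per reducer; each map task appends to them.
    reducer_inputs = [[] for _ in range(num_reducers)]
    shuffled_records = 0

    for split in splits:  # one map task per input split
        # Map, then partition each emitted record.
        partitions = defaultdict(list)
        for line in split:
            for word in line.split():
                part = zlib.crc32(word.encode("utf-8")) % num_reducers
                partitions[part].append((word, 1))

        for part, records in partitions.items():
            records.sort()  # map-side sort within the partition
            if use_combiner:
                local = defaultdict(int)
                for word, count in records:
                    local[word] += count
                records = sorted(local.items())
            # Shuffle: send this partition to its reducer.
            shuffled_records += len(records)
            reducer_inputs[part].extend(records)

    # Reduce: group by key and sum.
    result = {}
    for records in reducer_inputs:
        totals = defaultdict(int)
        for word, count in sorted(records):
            totals[word] += count
        result.update(totals)
    return result, shuffled_records

splits = [["a b a", "c a"], ["b b c"]]
plain, plain_shuffled = run_job(splits, num_reducers=2, use_combiner=False)
combined, combined_shuffled = run_job(splits, num_reducers=2, use_combiner=True)
print(plain == combined, plain_shuffled, combined_shuffled)  # True 8 5
```

Because summing is associative and commutative, applying it locally before the shuffle gives the same totals as applying it only in the reducer.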
September 20, 2018 at 3:47 pm #5619 DataFlair Team (Spectator)
The combiner is called after the mapper.
Details: Combiners can be viewed as mini-reducers in the map phase. They perform a local reduce on the mapper results before those results are distributed further. Once the combiner has run, its output is passed on to the reducer for further work.
The partitioner, on the other hand, comes into the picture when we are working with more than one reducer. It decides which reducer is responsible for a particular key. In Hadoop it assigns each mapper output record to a partition; if a combiner is used, it is applied to the records within each partition, and the combined result is what gets sent to the responsible reducer.
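How a partitioner maps keys to reducers can be sketched in a few lines. Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the Python below mirrors that formula but is only an illustration. The function name `get_partition` is borrowed for readability, and CRC32 is an assumed substitute for Java's `hashCode()`, so the partition numbers will not match what Hadoop would produce.

```python
import zlib

def get_partition(key, num_reduce_tasks):
    # Stable stand-in for key.hashCode(); Python's built-in hash() of a str
    # changes between interpreter runs, so it is avoided here.
    key_hash = zlib.crc32(key.encode("utf-8"))
    # Mask to a non-negative 31-bit value, then take the remainder.
    return (key_hash & 0x7FFFFFFF) % num_reduce_tasks

keys = ["apple", "banana", "apple", "cherry", "banana"]
assignments = [(key, get_partition(key, 3)) for key in keys]
print(assignments)  # equal keys always receive the same partition number
```

With a single reducer every key lands in partition 0, which is why the partitioner only matters once there is more than one reducer.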
Reference:
https://developer.yahoo.com/hadoop/tutorial/module4.html
https://stackoverflow.com/questions/22061210/what-runs-first-the-partitioner-or-the-combiner
https://community.hortonworks.com/questions/14328/what-is-the-difference-between-partitioner-combine.html