What is Combiner in Hadoop MapReduce

This topic has 4 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

Viewing 4 reply threads

Author

Posts
- September 20, 2018 at 4:16 pm #5805
  
  DataFlair Team
  Spectator
  
  What is a Combiner in Hadoop MapReduce?
  Why is a Combiner needed in MapReduce?
  What is the use of a Combiner in Hadoop?
  How is a Combiner used to optimize a MapReduce job?
- September 20, 2018 at 4:16 pm #5807
  
  DataFlair Team
  Spectator
  
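  The map phase of a Hadoop MapReduce job can produce a large volume of intermediate data, all of which has to be sent to the reducers. Transferring it from the mapper nodes to the reducer nodes consumes a lot of network bandwidth. The solution is to use a Combiner with the Mappers. A Combiner acts as a mini-reducer: it processes the output of a Mapper and aggregates it locally before it is passed to the Reducer, which cuts both the data shuffled over the network and the load on the Reducer. When the reduce operation is commutative and associative (such as summing counts), the Combiner can use the same code as the Reducer; the difference is that the Combiner runs on the map side, on the output of each Mapper. Note that Hadoop treats the Combiner as an optional optimization and may run it zero, one, or several times, so the job must produce the same result without it.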
  
  As an example, take this part of the data:
  Mapper1: ((car,1), (bike,1), (car,1))
  Mapper2: ((car,1), (bus,1), (bike,1), (bus,1))
  
  When the Combiner for Mapper1 runs, it produces ((car,2), (bike,1)), and the Combiner for Mapper2 produces ((car,1), (bus,2), (bike,1)). These outputs from the Combiners are passed on to the Reducers.
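  
  The local aggregation in this example can be sketched in plain Java, without any Hadoop dependency. This is only an illustration of what a summing Combiner computes for one mapper's output; the class name LocalCombinerSketch and the method combine are hypothetical and are not part of the Hadoop API.
  
  ```java
  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;
  
  // Hypothetical stand-in for a summing Combiner; not a Hadoop class.
  class LocalCombinerSketch {
      // Sums the counts of identical keys emitted by ONE mapper.
      // Each element of mapperKeys stands for an emitted pair (key, 1).
      static Map<String, Integer> combine(List<String> mapperKeys) {
          Map<String, Integer> combined = new LinkedHashMap<>();
          for (String key : mapperKeys) {
              combined.merge(key, 1, Integer::sum);
          }
          return combined;
      }
  
      public static void main(String[] args) {
          // Mapper1 output from the example above
          System.out.println(combine(List.of("car", "bike", "car")));        // {car=2, bike=1}
          // Mapper2 output from the example above
          System.out.println(combine(List.of("car", "bus", "bike", "bus"))); // {car=1, bus=2, bike=1}
      }
  }
  ```
  
  Mapper1 then sends 2 pairs instead of 3, and Mapper2 sends 3 pairs instead of 4.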
  
  For more detail follow Combiner in Hadoop
- September 20, 2018 at 4:16 pm #5810
  
  DataFlair Team
  Spectator
  
  In Hadoop MapReduce, a class called the Combiner can be placed between the Mapper and the Reducer.
  When a MapReduce (MR) job runs on a large dataset, the Map tasks generate large amounts of intermediate data, which is passed on to the Reduce tasks. During this phase, the Mapper output has to travel over the network to the nodes where the Reducers are running. This data movement may cause network congestion if the data is huge.
  
  To reduce this network congestion, the MR framework provides a function called the ‘Combiner’, also known as a ‘Mini-Reducer’.
  The role of the Combiner is to take the output of a Mapper as its input, process it, and send its output to the Reducer. The Combiner combines all the values that share a key in that Mapper's output and sends the result to the Reducer, which reduces the data movement over the network. A Combiner works along with each Mapper.
  The Combiner can often use the same class as the Reducer, provided the reduce operation is commutative and associative.
  
  Example:
  Output from Mapper:
  <What,1> <do,1> <you,1> <mean,1> <by,1> <Object,1>
  <What,1> <do,1> <you,1> <know,1> <about,1> <Java,1>
  <What,1> <is,1> <Java,1> <Virtual,1> <Machine,1>
  <How,1> <Java,1> <enabled,1> <High,1> <Performance,1>
  The above key-value pairs are taken as input by the Combiner, which produces the output below:
  <What,3> <do,2> <you,2> <mean,1> <by,1> <Object,1>
  <know,1> <about,1> <Java,3>
  <is,1> <Virtual,1> <Machine,1>
  <How,1> <enabled,1> <High,1> <Performance,1>
  The above output from Combiner is sent to the Reducer as its input.
  
- September 20, 2018 at 4:16 pm #5811
  
  DataFlair Team
  Spectator
  
  The Combiner is a “mini-reduce” process that operates only on the data generated by a Mapper.
  It runs after the Mapper and before the Reducer. Use of the Combiner is optional. Combiner classes run on every node that has run map tasks.
  
  In the older org.apache.hadoop.mapred API, the Combiner must implement the Reducer interface and is set on the JobConf (conf below):
  conf.setCombinerClass(Reduce.class);
  
  This is done to decrease network congestion.
  
  o/p of mapper1 | o/p of mapper2
  <India,1> | <India,1>
  <is,1> | <country,1>
  <my,1> | <is,1>
  <India,1> | <is,1>
  <Hind,1> | <Jai,1>
  
  o/p of Combiner1 | o/p of Combiner2
  <Hind,1> | <country,1>
  <India,2> | <India,1>
  <is,1> | <is,2>
  <my,1> | <Jai,1>
  
  o/p of Reducer
  <country,1>
  <Hind,1>
  <India,3>
  <is,3>
  <Jai,1>
  <my,1>
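  
  The flow in the table above (two mappers, one combiner per mapper, one reducer) can be simulated in plain Java as a rough sketch. Nothing here uses the Hadoop API: the class CombineThenReduceSketch and its methods combine and reduce are hypothetical names, and the case-insensitive TreeMap is an assumption made only so that the printed order matches the table.
  
  ```java
  import java.util.List;
  import java.util.Map;
  import java.util.TreeMap;
  
  // Hypothetical simulation of combiner + reducer word counting; not Hadoop code.
  class CombineThenReduceSketch {
      // Combiner step: local sum of the (word, 1) pairs from a single mapper.
      static Map<String, Integer> combine(List<String> words) {
          // Case-insensitive ordering is for display only; it would merge keys
          // that differ only by case, which does not occur in this input.
          Map<String, Integer> local = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
          for (String w : words) {
              local.merge(w, 1, Integer::sum);
          }
          return local;
      }
  
      // Reducer step: sums the partial counts received from all combiners.
      static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
          Map<String, Integer> total = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
          for (Map<String, Integer> partial : partials) {
              partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
          }
          return total;
      }
  
      public static void main(String[] args) {
          Map<String, Integer> c1 = combine(List.of("India", "is", "my", "India", "Hind"));
          Map<String, Integer> c2 = combine(List.of("India", "country", "is", "is", "Jai"));
          System.out.println(c1); // {Hind=1, India=2, is=1, my=1}
          System.out.println(c2); // {country=1, India=1, is=2, Jai=1}
          System.out.println(reduce(List.of(c1, c2)));
          // {country=1, Hind=1, India=3, is=3, Jai=1, my=1}
      }
  }
  ```
  
  Without the combiners the reducer would receive 10 pairs; with them it receives 8, and the final counts are the same because addition is commutative and associative.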
- September 20, 2018 at 4:16 pm #5813
  
  DataFlair Team
  Spectator
  
  A Combiner runs between the Map and Reduce tasks. It is often described as a mini-reducer because it is also used for aggregation.
  A Combiner reduces the number of intermediate key-value pairs that are passed to the Reducer.
  It is used as an optimization of a MapReduce job.
  In the older org.apache.hadoop.mapred API, the Combiner must implement the Reducer interface: conf.setCombinerClass(Reduce.class);
  
  For example, if the output from Mapper 1 is:
  <Car,1> <star,1> <Car,1> <river,1> <star,1>
  
  then the output of its Combiner is:
  <Car,2> <star,2> <river,1>