What is Combiner in Hadoop MapReduce

Viewing 4 reply threads
    • #5805
      DataFlair Team
      Spectator

      What is a Combiner in Hadoop MapReduce?
      Why is a Combiner needed in MapReduce?
      What is the use of a Combiner in Hadoop?
      How is a Combiner used to optimize a MapReduce job?

    • #5807
      DataFlair Team
      Spectator

      Hadoop MapReduce produces a large volume of intermediate data in the map phase, and all of it has to be sent to the reducers. Transferring this data from the mappers to the reducers consumes a lot of network bandwidth. The solution is to use a Combiner alongside each Mapper. The Combiner acts as a mini-reducer: it processes the output of its Mapper and aggregates it locally before the data is passed to the reducer, which reduces both the network traffic and the load on the reducer. A Combiner can often reuse the Reducer code (this is safe when the reduce operation is commutative and associative, such as a sum or a count); the difference is that a Combiner works locally on the output of each mapper.

      As an example, take the following output from two mappers:
      Mapper1 ((car,1), (bike,1), (car,1))
      Mapper2 ((car,1),(bus,1),(bike,1),(bus,1)).

      When the Combiner for Mapper1 runs, it produces ((car,2), (bike,1)), and the Combiner for Mapper2 produces ((car,1), (bus,2), (bike,1)). These Combiner outputs are then passed on to the reducers.
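
The local aggregation described above can be sketched in plain Java. This is only a simulation using the standard library, not the Hadoop API; the class name `LocalCombineSketch` and the method `combine` are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of a combiner; this is NOT the Hadoop API.
// The class and method names are hypothetical.
class LocalCombineSketch {

    // Sums the counts of identical keys emitted by ONE mapper (local aggregation).
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapperOutput) {
        // LinkedHashMap keeps keys in first-seen order, which makes the output easy to read.
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : mapperOutput) {
            combined.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapper1 = List.of(
                Map.entry("car", 1), Map.entry("bike", 1), Map.entry("car", 1));
        List<Map.Entry<String, Integer>> mapper2 = List.of(
                Map.entry("car", 1), Map.entry("bus", 1),
                Map.entry("bike", 1), Map.entry("bus", 1));

        System.out.println(combine(mapper1)); // {car=2, bike=1}
        System.out.println(combine(mapper2)); // {car=1, bus=2, bike=1}
    }
}
```

Each mapper's pairs are combined separately, so the key `car` still appears once per mapper; merging those partial counts into a single total remains the reducer's job.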


    • #5810
      DataFlair Team
      Spectator

      In Hadoop MapReduce, a Combiner is an optional class that sits between the Mapper and the Reducer.
      When a MapReduce (MR) job runs on a large dataset, the Map tasks generate a large amount of intermediate data, which is passed on to the Reduce tasks. During this phase, the Mapper output has to travel over the network to the node where the Reducer is running. This data movement can cause network congestion if the data is large.

      To reduce this network congestion, the MR framework provides a function called the ‘Combiner’, also known as a ‘Mini-Reducer’.
      The role of the Combiner is to take the output of the Mapper as its input, process it, and send its output to the Reducer. The Combiner reads each key-value pair, combines all the values for the same key, and sends the result as input to the Reducer, which reduces the data movement in the network. A Combiner works alongside each Mapper.
      A Combiner often uses the same class as the Reducer, which is valid when the reduce operation is commutative and associative (for example, a sum or a count).
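
Whether the Reducer class can double as the Combiner depends on the operation. As a rough illustration (plain Java, not the Hadoop API; `CombinerReuseSketch`, `partial` and `merge` are hypothetical names), summing can be applied to partial results safely, while averaging cannot: a mean of per-mapper means generally differs from the true mean. A common workaround, shown here under those assumptions, is to have the combiner emit a (sum, count) pair instead.

```java
import java.util.List;

// Plain-Java illustration; NOT the Hadoop API. All names are hypothetical.
class CombinerReuseSketch {

    static double mean(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Partial aggregate that is safe to merge: {sum, count}.
    static double[] partial(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return new double[] {sum, values.size()};
    }

    // Merging two partial aggregates is just element-wise addition.
    static double[] merge(double[] a, double[] b) {
        return new double[] {a[0] + b[0], a[1] + b[1]};
    }

    public static void main(String[] args) {
        List<Double> mapper1 = List.of(1.0, 2.0, 3.0); // values for one key on mapper 1
        List<Double> mapper2 = List.of(10.0);          // values for the same key on mapper 2

        // Reference result with no combiner at all.
        System.out.println(mean(List.of(1.0, 2.0, 3.0, 10.0)));          // 4.0

        // Reusing a "mean" reducer as the combiner gives a mean of means.
        System.out.println(mean(List.of(mean(mapper1), mean(mapper2)))); // 6.0 (wrong)

        // Combining (sum, count) pairs and dividing only at the end is correct.
        double[] merged = merge(partial(mapper1), partial(mapper2));
        System.out.println(merged[0] / merged[1]);                       // 4.0
    }
}
```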

      Example:
      Output from Mapper:
      <What,1> <do,1> <you,1> <mean,1> <by,1> <Object,1>
      <What,1> <do,1> <you,1> <know,1> <about,1> <Java,1>
      <What,1> <is,1> <Java,1> <Virtual,1> <Machine,1>
      <How,1> <Java,1> <enabled,1> <High,1> <Performance,1>
      These key-value pairs are the input to the Combiner, which produces the following output:
      <What,3> <do,2> <you,2> <mean,1> <by,1> <Object,1>
      <know,1> <about,1> <Java,3>
      <is,1> <Virtual,1> <Machine,1>
      <How,1> <enabled,1> <High,1> <Performance,1>
      The above output from Combiner is sent to the Reducer as its input.


    • #5811
      DataFlair Team
      Spectator

      The Combiner is a “mini-reduce” process which operates only on the data generated locally by a Mapper.
      It runs after the Mapper and before the Reducer. Usage of the Combiner is optional. Combiner instances run on every node that has run map tasks.

      The Combiner should be an implementation of the Reducer interface:
      conf.setCombinerClass(Reduce.class);

      This is done to reduce network congestion.

      o/p of mapper1 | o/p of mapper2
      <India,1>      | <India,1>
      <is,1>         | <country,1>
      <my,1>         | <is,1>
      <India,1>      | <is,1>
      <Hind,1>       | <Jai,1>

      o/p Combiner1  | o/p Combiner2
      <Hind,1>       | <country,1>
      <India,2>      | <India,1>
      <is,1>         | <is,2>
      <my,1>         | <Jai,1>

      Reducer
      <country,1>
      <Hind,1>
      <India,3>
      <is,3>
      <Jai,1>
      <my, 1>
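
The mapper, combiner and reducer stages listed above can be traced end to end with a small plain-Java simulation. This is a sketch only, not the Hadoop API; `WordCountPipelineSketch`, `toPairs` and `sumByKey` are hypothetical names, and the case-insensitive key ordering is chosen just to match the listing in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of map -> combine -> reduce; NOT the Hadoop API.
// All names are hypothetical.
class WordCountPipelineSketch {

    // Emits a <word,1> pair for every word, like a word-count mapper.
    static List<Map.Entry<String, Integer>> toPairs(String... words) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : words) {
            pairs.add(Map.entry(w, 1));
        }
        return pairs;
    }

    // Sums values per key. The same logic serves as combiner and reducer here,
    // which works because addition is commutative and associative.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        // Case-insensitive ordering is an assumption made to mirror the listing above.
        Map<String, Integer> totals = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapper1 = toPairs("India", "is", "my", "India", "Hind");
        List<Map.Entry<String, Integer>> mapper2 = toPairs("India", "country", "is", "is", "Jai");

        // Each combiner sees only the output of its own mapper.
        Map<String, Integer> combiner1 = sumByKey(mapper1);
        Map<String, Integer> combiner2 = sumByKey(mapper2);

        // Records that would cross the network with the combiner in place.
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>(combiner1.entrySet());
        shuffled.addAll(combiner2.entrySet());

        System.out.println(mapper1.size() + mapper2.size()); // 10 records without a combiner
        System.out.println(shuffled.size());                 // 8 records with a combiner
        System.out.println(sumByKey(shuffled));
        // {country=1, Hind=1, India=3, is=3, Jai=1, my=1}
    }
}
```

On this tiny input the saving is only two records, but the reducer result is identical with or without the combiner, which is the property that makes it a safe optimization.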

    • #5813
      DataFlair Team
      Spectator

      A Combiner runs between the Map and Reduce tasks. It is often described as a mini-reducer because it is also used for aggregation.
      A Combiner reduces the number of intermediate key-value pairs that are passed to the Reducer.
      It is used as an optimization of a MapReduce job.
      The Combiner should be an implementation of the Reducer interface: conf.setCombinerClass(Reduce.class)

      For example, if the output from Mapper 1 is:
      <Car,1> <star,1> <Car,1> <river,1> <star,1>

      then the output of the Combiner is:
      <Car,2> <star,2> <river,1>
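
Because the Combiner is only an optimization, Hadoop does not guarantee how many times it runs on a given map output (possibly not at all), so the job has to produce the same result either way. The following plain-Java sketch (not the Hadoop API; `OptionalCombinerSketch`, `combine` and `reduce` are hypothetical names) checks that property on the example above by applying the combine step zero, one and two times.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation; NOT the Hadoop API. All names are hypothetical.
class OptionalCombinerSketch {

    // Local aggregation: one output pair per distinct key, in first-seen order.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> pairs) {
        return new ArrayList<>(reduce(pairs).entrySet());
    }

    // Final aggregation on the reducer side: sums all values per key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapperOut = List.of(
                Map.entry("Car", 1), Map.entry("star", 1), Map.entry("Car", 1),
                Map.entry("river", 1), Map.entry("star", 1));

        System.out.println(combine(mapperOut)); // [Car=2, star=2, river=1]

        // The reducer result must not depend on how often the combiner ran.
        Map<String, Integer> zeroTimes = reduce(mapperOut);
        Map<String, Integer> once = reduce(combine(mapperOut));
        Map<String, Integer> twice = reduce(combine(combine(mapperOut)));

        System.out.println(zeroTimes.equals(once) && once.equals(twice)); // true
    }
}
```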
