What is the difference between reducer and combiner?


    • #5096
      DataFlair Team
      Spectator

      Explain the difference between combiner and reducer. Can we replace the reducer with a combiner, or vice versa?

    • #5097
      DataFlair Team
      Spectator

      A combiner in Hadoop is a mini-reducer that performs a local reduce task. Many MapReduce jobs are limited by the network bandwidth available on the cluster, so the combiner minimizes the data transferred between the map and reduce tasks. The combiner function runs on the map output, and the combiner's output is given to the reducers as input. In short, the combiner is used for network optimization.
      If a map task generates a large number of output records, a combiner is worth using (a minimal code sketch follows below), but:

      • Unlike a reducer, the input and output key/value types of a combiner must match the output types of your mapper.
      • Combiners can only be used for functions that are commutative (a.b = b.a) and associative (a.(b.c) = (a.b).c). This is because a combiner may operate only on a subset of your keys and values, or may not execute at all, and the output of the program must remain the same either way.
      • A reducer gets its input data from multiple mappers as part of the partitioning process, whereas a combiner only gets its input from one mapper.

      The combiner is used as the job requires; it does not replace the reducer. The execution of the combiner is not guaranteed: it may be called 0, 1, or more times.
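
      To make this concrete, here is a minimal word-count sketch in the standard Hadoop MapReduce Java API (class and variable names are illustrative, not taken from the post above). Because summing integers is commutative and associative, the same Reducer class can safely be registered as the combiner via job.setCombinerClass():

      import java.io.IOException;
      import java.util.StringTokenizer;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class WordCountWithCombiner {

        // Mapper: emits (word, 1) for every token in the input line.
        public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
          private final static IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          public void map(Object key, Text value, Context context)
              throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
              word.set(itr.nextToken());
              context.write(word, ONE);
            }
          }
        }

        // Used both as the combiner (partial sums over one mapper's output) and as
        // the reducer (final sums across all mappers); the logic is identical
        // because addition is commutative and associative.
        public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
          private final IntWritable result = new IntWritable();

          @Override
          public void reduce(Text key, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
              sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
          }
        }

        public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "word count with combiner");
          job.setJarByClass(WordCountWithCombiner.class);
          job.setMapperClass(TokenizerMapper.class);
          job.setCombinerClass(IntSumReducer.class); // optional local reduce on map output
          job.setReducerClass(IntSumReducer.class);  // final reduce across all mappers
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }

      Whether the framework actually invokes IntSumReducer as a combiner, and how many times, is up to Hadoop; the job produces the same counts either way, which is exactly the property the bullet points above require.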

    • #5098
      DataFlair Team
      Spectator

      Combiner: it operates by accepting key/value pairs from the map class and passing its output on towards the reduce class. Its function is to summarize the map output records that share the same key. The output (key-value pairs) of the combiner is sent to the reducer task as input.

      Reducer: it takes the set of intermediate key-value pairs generated by the mapper and runs a reduce function on each group of values sharing a key to generate the output. The output of the reducer is the final output, which is stored in HDFS.

      Differences:

      If there are multiple mappers, the reducer gets its data from all of them as part of the partitioning process, but a combiner only gets its input from one mapper.

      The input and output key/value types of a combiner must match the output types of the mapper. A reducer's input types must also match the mapper's output types, but its output types may differ.

      If there is no reducer, the job stops at the map phase and the local sorting of map outputs is skipped. This sorting is what the combiner depends on (combiners are, in effect, local reducers).
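
      As a small illustration of the map-only case just mentioned, the driver sketch below (class and job names are made up for the example) sets the number of reduce tasks to zero, so the job ends after the map phase, no shuffle or sort takes place, and a combiner would never run:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class MapOnlyJobDriver {
        public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "map-only pass-through");
          job.setJarByClass(MapOnlyJobDriver.class);
          job.setMapperClass(Mapper.class);          // identity mapper: passes records through
          job.setNumReduceTasks(0);                  // zero reducers: no shuffle, no sort, no combiner
          job.setOutputKeyClass(LongWritable.class); // TextInputFormat keys are byte offsets
          job.setOutputValueClass(Text.class);       // TextInputFormat values are the input lines
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }

      With zero reduce tasks the map output is written directly to HDFS by the output format, one file per map task.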

    • #5099
      DataFlair Team
      Spectator

      The main use of a combiner is to reduce the number of key-value pairs passed from the mapper to the reducer, so that network traffic is reduced.

      A combiner takes the output of a mapper as its input, i.e. the output key and value types of the mapper must be the same as the combiner's input key and value types. There is one combiner for each mapper.

      A reducer's output key and value types, however, need not be the same as the mapper's output types (its input types still must match them), and by default there is one reducer for all the mappers, which takes values from all of them and performs aggregation or summation on the inputs.

      A combiner executes only when the framework decides it is needed; it is not guaranteed to run every time. Reducers, in contrast, run every time unless it is a map-only job.
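
      The type constraint described above can be seen directly in the class signatures. In the sketch below (both class names are hypothetical), the combiner's input and output types are both the mapper's output types (Text, IntWritable), while the reducer accepts the same input but is free to emit a different output type:

      import java.io.IOException;

      import org.apache.hadoop.io.DoubleWritable;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Reducer;

      // Combiner: (Text, IntWritable) in and (Text, IntWritable) out,
      // i.e. exactly the mapper's output types.
      class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();                  // partial sum over one mapper's output
          }
          context.write(key, new IntWritable(sum));
        }
      }

      // Reducer: takes (Text, IntWritable) from the shuffle but emits
      // (Text, DoubleWritable), something a combiner is not allowed to do.
      class SumAsDoubleReducer extends Reducer<Text, IntWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          long total = 0;
          for (IntWritable v : values) {
            total += v.get();                // sums partial sums from combiners, if any ran
          }
          context.write(key, new DoubleWritable(total));
        }
      }

      Registering SumCombiner via job.setCombinerClass() is safe here because summing is commutative and associative; putting it in front of, say, an averaging reducer would change the result, which is why not every reducer can double as a combiner.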
