What is the difference between reducer and combiner?


    • #5096
      DataFlair Team
      Spectator

      Explain the difference between combiner and reducer. Can we replace the reducer with a combiner, or vice versa?

    • #5097
      DataFlair Team
      Spectator

      A combiner in Hadoop is a mini-reducer that performs a local reduce task. Many MapReduce jobs are limited by the network bandwidth available on the cluster, so the combiner minimizes the data transferred between the map and reduce tasks. The combiner function runs on the map output, and the combiner's output is given to the reducers as input. In short, the combiner is used for network optimization.
      If a map task generates a large number of output records, a combiner is worth using (a minimal code sketch follows below), but:

      • Unlike a reducer, the input and output key/value types of a combiner must match the output types of your mapper.
      • Combiners can only be used for functions that are commutative (a.b = b.a) and associative (a.(b.c) = (a.b).c). This is because a combiner may operate only on a subset of your keys and values, or may not execute at all, and the output of the program must remain the same either way.
      • A reducer gets its input data from multiple mappers as part of the partitioning process, whereas a combiner only gets its input from one mapper.

      The combiner is used as the job requires; it does not replace the reducer. The execution of the combiner is not guaranteed: it may be called 0, 1, or more times.
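
      To make this concrete, here is a minimal word-count sketch in the standard Hadoop MapReduce Java API (class and variable names are illustrative, not taken from the post above). Because summing integers is commutative and associative, the same Reducer class can safely be registered as the combiner via job.setCombinerClass():

      import java.io.IOException;
      import java.util.StringTokenizer;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class WordCountWithCombiner {

        // Mapper: emits (word, 1) for every token in the input line.
        public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
          private final static IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          public void map(Object key, Text value, Context context)
              throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
              word.set(itr.nextToken());
              context.write(word, ONE);
            }
          }
        }

        // Used both as the combiner (partial sums over one mapper's output) and as
        // the reducer (final sums across all mappers); the logic is identical
        // because addition is commutative and associative.
        public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
          private final IntWritable result = new IntWritable();

          @Override
          public void reduce(Text key, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
              sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
          }
        }

        public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "word count with combiner");
          job.setJarByClass(WordCountWithCombiner.class);
          job.setMapperClass(TokenizerMapper.class);
          job.setCombinerClass(IntSumReducer.class); // optional local reduce on map output
          job.setReducerClass(IntSumReducer.class);  // final reduce across all mappers
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }

      Whether the framework actually invokes IntSumReducer as a combiner, and how many times, is up to Hadoop; the job produces the same counts either way, which is exactly the property the bullet points above require.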

    • #5098
      DataFlair Team
      Spectator

      Combiner: it operates by accepting key/value pairs from the map class and passing its output on towards the reduce class. Its function is to summarize the map output records that share the same key. The output (key-value pairs) of the combiner is sent to the reducer task as input.

      Reducer: it takes the set of intermediate key-value pairs generated by the mapper and runs a reduce function on each group of values sharing a key to generate the output. The output of the reducer is the final output, which is stored in HDFS.

      Differences:

      If there are multiple mappers, the reducer gets its data from all of them as part of the partitioning process, but a combiner only gets its input from one mapper.

      The input and output key/value types of a combiner must match the output types of the mapper. A reducer's input types must also match the mapper's output types, but its output types may differ.

      If there is no reducer, the job stops at the map phase and the local sorting of map outputs is skipped. This sorting is what the combiner depends on (combiners are, in effect, local reducers).
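
      As a small illustration of the map-only case just mentioned, the driver sketch below (class and job names are made up for the example) sets the number of reduce tasks to zero, so the job ends after the map phase, no shuffle or sort takes place, and a combiner would never run:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class MapOnlyJobDriver {
        public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "map-only pass-through");
          job.setJarByClass(MapOnlyJobDriver.class);
          job.setMapperClass(Mapper.class);          // identity mapper: passes records through
          job.setNumReduceTasks(0);                  // zero reducers: no shuffle, no sort, no combiner
          job.setOutputKeyClass(LongWritable.class); // TextInputFormat keys are byte offsets
          job.setOutputValueClass(Text.class);       // TextInputFormat values are the input lines
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }

      With zero reduce tasks the map output is written directly to HDFS by the output format, one file per map task.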

    • #5099
      DataFlair Team
      Spectator

      The main use of a combiner is to reduce the number of key-value pairs passed from the mapper to the reducer, so that network traffic is reduced.

      A combiner takes the output of a mapper as its input, i.e. the output key and value types of the mapper must be the same as the combiner's input key and value types. There is one combiner for each mapper.

      A reducer's output key and value types, however, need not be the same as the mapper's output types (its input types still must match them), and by default there is one reducer for all the mappers, which takes values from all of them and performs aggregation or summation on the inputs.

      A combiner executes only when the framework decides it is needed; it is not guaranteed to run every time. Reducers, in contrast, run every time unless it is a map-only job.
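
      The type constraint described above can be seen directly in the class signatures. In the sketch below (both class names are hypothetical), the combiner's input and output types are both the mapper's output types (Text, IntWritable), while the reducer accepts the same input but is free to emit a different output type:

      import java.io.IOException;

      import org.apache.hadoop.io.DoubleWritable;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Reducer;

      // Combiner: (Text, IntWritable) in and (Text, IntWritable) out,
      // i.e. exactly the mapper's output types.
      class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();                  // partial sum over one mapper's output
          }
          context.write(key, new IntWritable(sum));
        }
      }

      // Reducer: takes (Text, IntWritable) from the shuffle but emits
      // (Text, DoubleWritable), something a combiner is not allowed to do.
      class SumAsDoubleReducer extends Reducer<Text, IntWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          long total = 0;
          for (IntWritable v : values) {
            total += v.get();                // sums partial sums from combiners, if any ran
          }
          context.write(key, new DoubleWritable(total));
        }
      }

      Registering SumCombiner via job.setCombinerClass() is safe here because summing is commutative and associative; putting it in front of, say, an averaging reducer would change the result, which is why not every reducer can double as a combiner.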
