What is a Combiner in MapReduce?

    • #6293
      DataFlair Team
      Spectator

      What is the need for a Combiner in Hadoop?
      What is the role of the Combiner in Hadoop MapReduce?

    • #6294
      DataFlair Team
      Spectator

      When we run a MapReduce job on a very large data set, the Mapper processes and produces large chunks of intermediate output, which are then sent to the Reducer, causing heavy network congestion.
      To increase efficiency, users can optionally specify a Combiner, via Job.setCombinerClass(Reducer.class), to perform local aggregation of the intermediate outputs. This helps cut down the amount of data transferred from the Mapper to the Reducer.

      The Combiner acts as a mini-reducer: it processes the output of the Mapper and performs local aggregation before passing it on to the Reducer.

      Example:

      Mapper 1 = (Min, 1), (is, 1), (Max, 1), (is, 1), (Min, 1), (is, 1), (Max, 1), (is, 1)
      Mapper 2 = (Temperature, 1), (is, 1), (Temperature, 1), (is, 1)

      Shuffle & Sort 1 = (is, 1,1,1,1), (Min, 1,1), (Max, 1,1)
      Shuffle & Sort 2 = (is, 1,1), (Temperature, 1,1)

      Combiner 1 = (is, 4), (Min, 2), (Max, 2)
      Combiner 2 = (is, 2), (Temperature, 2)

      Reducer = (is, 6), (Min, 2), (Max, 2), (Temperature, 2)
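
      As a hedged illustration of how this trace maps to code, below is a minimal word-count sketch in which the same Reducer class is reused as the Combiner via Job.setCombinerClass. The class names (WordCount, TokenizerMapper, IntSumReducer) are illustrative, not part of the original example:

      import java.io.IOException;
      import java.util.StringTokenizer;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class WordCount {

        // Emits (word, 1) for every token, e.g. (is, 1), (Min, 1), ...
        public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          protected void map(Object key, Text value, Context context)
              throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
              word.set(itr.nextToken());
              context.write(word, ONE);
            }
          }
        }

        // Sums the values for a key. Run as the Combiner, it turns
        // (is, 1,1,1,1) into (is, 4); run as the Reducer, it turns
        // (is, 4), (is, 2) into (is, 6).
        public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
          private final IntWritable result = new IntWritable();

          @Override
          protected void reduce(Text key, Iterable<IntWritable> values,
              Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
              sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
          }
        }

        public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "word count");
          job.setJarByClass(WordCount.class);
          job.setMapperClass(TokenizerMapper.class);
          job.setCombinerClass(IntSumReducer.class); // local aggregation on the map side
          job.setReducerClass(IntSumReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }

      Note that reusing the Reducer as the Combiner only works because summing is commutative and associative, so the final counts are the same whether the Combiner runs or not.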

      Advantages of Combiner:
      1. It reduces the time taken to transfer data between the Mapper and the Reducer.
      2. It decreases the amount of data that needs to be processed by the Reducer.
      3. It improves the overall performance of the Reducer.

      To learn more about the Combiner, follow: Combiner Tutorial

    • #6295
      DataFlair Team
      Spectator

      The Combiner is an optional class, sometimes called a semi-reducer or mini-reducer. This is because the Combiner implements the same contract as the Reducer and is plugged in via Job.setCombinerClass(Reducer.class).

      The significance of the Combiner is to reduce network congestion while processing large datasets.

      1. The intermediate output from the Mappers in Hadoop is sent to the Combiner.
      2. A Reducer-style operation (such as aggregation) is performed on the values with the same key for each Mapper's output.
      3. The output of the Combiner is sent to the Reducer for further processing.

      Since the summarized output is given to the Reducer instead of the complete, large intermediate output, the expensive data transfer over the network is reduced, which improves performance.
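
      As a hedged side note, one way to see this effect is to read the job's built-in task counters after it completes. This sketch assumes Hadoop 2.x, where these counters live in org.apache.hadoop.mapreduce.TaskCounter; the helper class CombinerStats is hypothetical:

      import org.apache.hadoop.mapreduce.Counters;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.TaskCounter;

      public class CombinerStats {
        // Prints how many records entered and left the Combiner.
        // A large input/output ratio means far fewer records were
        // shuffled over the network to the Reducers.
        public static void print(Job completedJob) throws Exception {
          Counters counters = completedJob.getCounters();
          long in = counters.findCounter(TaskCounter.COMBINE_INPUT_RECORDS).getValue();
          long out = counters.findCounter(TaskCounter.COMBINE_OUTPUT_RECORDS).getValue();
          System.out.println("Combine input records:  " + in);
          System.out.println("Combine output records: " + out);
        }
      }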

      Follow the link for more detail: Combiner

    • #6297
      DataFlair Team
      Spectator

      The Combiner is a Reducer for an input split. Let's understand this.
      The job of a Mapper is to break its input split into key-value pairs. Suppose we have 10 input splits and each input split produces 20 key-value pairs; then we have 200 key-value pairs in total that must be copied to the Reducer. The Reducer will reduce these 200 pairs to a much smaller number of key-value pairs, but all 200 pairs still have to be transferred to the Reducer node first. In this case, all of the data that has to be reduced travels to the Reducer.

      If we use a Combiner, the key-value pairs written by a Mapper for an input split are first reduced by the Combiner logic, so far fewer key-value pairs are written to the local disk than without a Combiner. In this case there are two levels of reducing: 1. after processing each input split, and 2. after the processed output of all input splits has been aggregated (shuffled and sorted).

      How to set the Combiner class:
      job.setCombinerClass(Reducer.class)

      Here, job is an instance of org.apache.hadoop.mapreduce.Job.
      Generally we use the Reducer class as the Combiner class, but we can also define a dedicated Combiner class, as sketched below.
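
      For that second case, here is a minimal sketch of a Combiner defined separately from the Reducer (the class name SumCombiner is hypothetical). The key constraints are that its input and output key/value types must both match the Mapper's output types, and its logic must stay correct whether Hadoop runs it zero, one, or several times, so it should be a commutative, associative operation such as a sum:

      import java.io.IOException;

      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Reducer;

      public class SumCombiner
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
          int partial = 0;
          for (IntWritable v : values) {
            partial += v.get();
          }
          // Emits a partial count per map-side group, e.g. (is, 4).
          context.write(key, new IntWritable(partial));
        }
      }

      // Wiring it into the job:
      // job.setCombinerClass(SumCombiner.class);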
