What is a Combiner in MapReduce?

    • #6293
      DataFlair Team
      Spectator

      What is the need for a Combiner in Hadoop?
      What is the role of the Combiner in Hadoop MapReduce?

    • #6294
      DataFlair Team
      Spectator

      When we run a MapReduce job on a very large data set, the Mapper processes and produces large chunks of intermediate output, which are then sent to the Reducer, causing heavy network congestion.
      To increase efficiency, users can optionally specify a Combiner, via Job.setCombinerClass(Reducer.class), to perform local aggregation of the intermediate outputs. This helps cut down the amount of data transferred from the Mapper to the Reducer.

      The Combiner acts as a mini-reducer: it processes the output of the Mapper and performs local aggregation before passing it on to the Reducer.

      Example:

      Mapper 1 = (Min, 1), (is, 1), (Max, 1), (is, 1), (Min, 1), (is, 1), (Max, 1), (is, 1)
      Mapper 2 = (Temperature, 1), (is, 1), (Temperature, 1), (is, 1)

      Shuffle & Sort 1 = (is, 1,1,1,1), (Min, 1,1), (Max, 1,1)
      Shuffle & Sort 2 = (is, 1,1), (Temperature, 1,1)

      Combiner 1 = (is, 4), (Min, 2), (Max, 2)
      Combiner 2 = (is, 2), (Temperature, 2)

      Reducer = (is, 6), (Min, 2), (Max, 2), (Temperature, 2)
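
      As a hedged illustration of how this trace maps to code, below is a minimal word-count sketch in which the same Reducer class is reused as the Combiner via Job.setCombinerClass. The class names (WordCount, TokenizerMapper, IntSumReducer) are illustrative, not part of the original example:

      import java.io.IOException;
      import java.util.StringTokenizer;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class WordCount {

        // Emits (word, 1) for every token, e.g. (is, 1), (Min, 1), ...
        public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          protected void map(Object key, Text value, Context context)
              throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
              word.set(itr.nextToken());
              context.write(word, ONE);
            }
          }
        }

        // Sums the values for a key. Run as the Combiner, it turns
        // (is, 1,1,1,1) into (is, 4); run as the Reducer, it turns
        // (is, 4), (is, 2) into (is, 6).
        public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
          private final IntWritable result = new IntWritable();

          @Override
          protected void reduce(Text key, Iterable<IntWritable> values,
              Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
              sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
          }
        }

        public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "word count");
          job.setJarByClass(WordCount.class);
          job.setMapperClass(TokenizerMapper.class);
          job.setCombinerClass(IntSumReducer.class); // local aggregation on the map side
          job.setReducerClass(IntSumReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }

      Note that reusing the Reducer as the Combiner only works because summing is commutative and associative, so the final counts are the same whether the Combiner runs or not.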

      Advantages of Combiner:
      1. It reduces the time taken to transfer data between the Mapper and the Reducer.
      2. It decreases the amount of data that needs to be processed by the Reducer.
      3. It improves the overall performance of the Reducer.

      To learn more about the Combiner, follow: Combiner Tutorial

    • #6295
      DataFlair Team
      Spectator

      The Combiner is an optional class, sometimes called a semi-reducer or mini-reducer. This is because the Combiner implements the same contract as the Reducer and is plugged in via Job.setCombinerClass(Reducer.class).

      The significance of the Combiner is to reduce network congestion while processing large datasets.

      1. The intermediate output from the Mappers in Hadoop is sent to the Combiner.
      2. A Reducer-style operation (such as aggregation) is performed on the values with the same key for each Mapper's output.
      3. The output of the Combiner is sent to the Reducer for further processing.

      Since the summarized output is given to the Reducer instead of the complete, large intermediate output, the expensive data transfer over the network is reduced, which improves performance.
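
      As a hedged side note, one way to see this effect is to read the job's built-in task counters after it completes. This sketch assumes Hadoop 2.x, where these counters live in org.apache.hadoop.mapreduce.TaskCounter; the helper class CombinerStats is hypothetical:

      import org.apache.hadoop.mapreduce.Counters;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.TaskCounter;

      public class CombinerStats {
        // Prints how many records entered and left the Combiner.
        // A large input/output ratio means far fewer records were
        // shuffled over the network to the Reducers.
        public static void print(Job completedJob) throws Exception {
          Counters counters = completedJob.getCounters();
          long in = counters.findCounter(TaskCounter.COMBINE_INPUT_RECORDS).getValue();
          long out = counters.findCounter(TaskCounter.COMBINE_OUTPUT_RECORDS).getValue();
          System.out.println("Combine input records:  " + in);
          System.out.println("Combine output records: " + out);
        }
      }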

      Follow the link for more detail: Combiner

    • #6297
      DataFlair Team
      Spectator

      The Combiner is a Reducer for an input split. Let's understand this.
      The job of a Mapper is to break its input split into key-value pairs. Suppose we have 10 input splits and each input split produces 20 key-value pairs; then we have 200 key-value pairs in total that must be copied to the Reducer. The Reducer will reduce these 200 pairs to a much smaller number of key-value pairs, but all 200 pairs still have to be transferred to the Reducer node first. In this case, all of the data that has to be reduced travels to the Reducer.

      If we use a Combiner, the key-value pairs written by a Mapper for an input split are first reduced by the Combiner logic, so far fewer key-value pairs are written to the local disk than without a Combiner. In this case there are two levels of reducing: 1. after processing each input split, and 2. after the processed output of all input splits has been aggregated (shuffled and sorted).

      How to set the Combiner class:
      job.setCombinerClass(Reducer.class)

      Here, job is an instance of org.apache.hadoop.mapreduce.Job.
      Generally we use the Reducer class as the Combiner class, but we can also define a dedicated Combiner class, as sketched below.
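
      For that second case, here is a minimal sketch of a Combiner defined separately from the Reducer (the class name SumCombiner is hypothetical). The key constraints are that its input and output key/value types must both match the Mapper's output types, and its logic must stay correct whether Hadoop runs it zero, one, or several times, so it should be a commutative, associative operation such as a sum:

      import java.io.IOException;

      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Reducer;

      public class SumCombiner
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
          int partial = 0;
          for (IntWritable v : values) {
            partial += v.get();
          }
          // Emits a partial count per map-side group, e.g. (is, 4).
          context.write(key, new IntWritable(partial));
        }
      }

      // Wiring it into the job:
      // job.setCombinerClass(SumCombiner.class);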
