How many times is a combiner called on a mapper node?

    • #5780
      DataFlair Team

      How many times is a combiner called on a mapper node for a specific MapReduce job?

      Can we control the execution of a combiner? Can we specify how many combiners should run?

    • #5782
      DataFlair Team

      When a MapReduce job runs on a large dataset, the mappers generate large amounts of intermediate data, and the framework passes this intermediate data to the reducers for further processing. Moving all of it across the network can cause heavy network congestion.

      The MapReduce framework provides an optional function known as the combiner, which plays a key role in reducing this congestion. The combiner is also called a mini-reducer.

      In a MapReduce job, the combiner performs local aggregation on the mapper output. This minimizes the data transferred between mappers and reducers and therefore increases the efficiency of the MapReduce program.
      The execution of the combiner is not guaranteed: Hadoop may or may not run it, and if required it may run it more than once. MapReduce jobs therefore must not depend on whether the combiner runs.

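      A combiner can be executed zero, one, or many times, so a given MapReduce job must not depend on how often it runs and must always produce the same results. How often it runs depends on the size of the particular map task's intermediate output and on the value of the min.num.spills.for.combine property (mapreduce.map.combine.minspills in Hadoop 2 and later).
      If a combiner is specified, it is applied to each spill as the map output is sorted and written to disk; by default, at least 3 spill files are needed for it to be run again when those spills are merged into the final map output file.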
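
      To see why a job must give the same answer whether the combiner runs zero, one, or several times, here is a small plain-Java sketch. It is not Hadoop API code; CombinerSafety and its methods are hypothetical names. It treats one key's map output as values split across two spills and compares a sum combiner, which is safe to reapply, with a naive mean combiner, which is not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical plain-Java sketch: one key's map output is split across two spills,
// and the "reducer" must get the same answer whether the combiner ran 0, 1 or 2 times.
class CombinerSafety {

    // Sum combiner: a sum of partial sums equals the sum of the raw values,
    // so it is safe to apply any number of times.
    static List<Integer> sumCombine(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return List.of(total);
    }

    static int sumReduce(List<Integer> values) {
        return sumCombine(values).get(0);
    }

    // Naive mean used as a combiner: a mean of partial means ignores how many
    // values each partial mean covered, so the result depends on the grouping.
    static List<Double> meanCombine(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return List.of(total / values.size());
    }

    static double meanReduce(List<Double> values) {
        return meanCombine(values).get(0);
    }

    // Runs the combiner once per spill and concatenates the results,
    // which is what the reducer would receive for this key.
    static <V> List<V> combinePerSpill(List<List<V>> spills, Function<List<V>, List<V>> combiner) {
        List<V> out = new ArrayList<>();
        for (List<V> spill : spills) {
            out.addAll(combiner.apply(spill));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> raw = List.of(1, 2, 3, 4, 5);
        List<List<Integer>> spills = List.of(List.of(1, 2, 3), List.of(4, 5));
        List<Integer> perSpill = combinePerSpill(spills, CombinerSafety::sumCombine);

        System.out.println(sumReduce(raw));                   // combiner never ran: 15
        System.out.println(sumReduce(perSpill));              // once per spill: 15
        System.out.println(sumReduce(sumCombine(perSpill)));  // again at merge: 15

        List<Double> rawD = List.of(1.0, 2.0, 3.0, 4.0, 5.0);
        List<List<Double>> spillsD = List.of(List.of(1.0, 2.0, 3.0), List.of(4.0, 5.0));
        System.out.println(meanReduce(rawD));                 // 3.0
        System.out.println(meanReduce(combinePerSpill(spillsD, CombinerSafety::meanCombine))); // 3.25, wrong
    }
}
```

      A mean can still be combined safely if the combiner emits (sum, count) pairs and only the reducer divides; what matters is that the operation gives the same result however the values happen to be grouped into spills.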

    • #5783
      DataFlair Team

      The combiner is applied to each spill as it is written, but it is run again during the final merge of the spill files only if the number of spills is at least minSpillsForCombine. minSpillsForCombine is driven by the property “mapreduce.map.combine.minspills”, whose default value is 3.
      With the default value of 3, the combiner is rerun during the merge only if 3 or more spill files have been written to disk.
      Recall that combiners may be run repeatedly over the input without affecting the final result. If there are only one or two spills, the potential reduction in map output size is not worth the overhead of invoking the combiner, so it is not run again for this map output.
      A combiner is never run for map-only jobs.
      We cannot hard-code or tell the framework how many times the combiner should run for a given MapReduce job; we can only influence it indirectly, for example through mapreduce.map.combine.minspills or the map-side sort buffer size (mapreduce.task.io.sort.mb), which determines how many spills occur.
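
      The rule above can be modelled in a few lines of plain Java. This is a simplified sketch of the spill and merge logic in Hadoop's MapTask, not code from Hadoop itself; MapSideCombinerModel and combinerPasses are hypothetical names. A "pass" here means one combine step over a spill or over the merged output; in Hadoop each pass actually invokes the combiner separately for every reduce partition that has data.

```java
// Hypothetical, simplified model of how often the map-side combiner runs on one map task.
class MapSideCombinerModel {

    // Default value of mapreduce.map.combine.minspills.
    static final int DEFAULT_MIN_SPILLS = 3;

    // One pass over each spill as it is sorted and written, plus one more pass
    // over the merged output when there are at least minSpills spill files.
    // A single spill file is simply renamed, so no merge (and no rerun) happens.
    static int combinerPasses(int numSpills, int minSpills, boolean mapOnly) {
        if (mapOnly || numSpills == 0) {
            return 0; // no sort/spill phase, or the map produced no output
        }
        int passes = numSpills;               // once per spill
        boolean merged = numSpills > 1;
        if (merged && numSpills >= minSpills) {
            passes += 1;                      // rerun while merging the spills
        }
        return passes;
    }

    public static void main(String[] args) {
        for (int spills = 0; spills <= 4; spills++) {
            System.out.println(spills + " spill(s) -> "
                + combinerPasses(spills, DEFAULT_MIN_SPILLS, false) + " combiner pass(es)");
        }
        System.out.println("map-only job -> " + combinerPasses(5, DEFAULT_MIN_SPILLS, true));
    }
}
```

      With the default threshold this prints 0, 1, 2, 4 and 5 passes for 0 to 4 spills, and 0 for a map-only job, which is why the number of combiner runs cannot be fixed in advance: it follows from how many spills the map output happens to produce.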

    • #5785
      DataFlair Team

      Whether and how often the combiner is invoked depends largely on the size of the map output: the larger the input data, the larger the intermediate output from the mapper.
      If this entire output were sent directly to the reducer, it would take more time to transfer and process.
      So the combiner is invoked to reduce the mapper's intermediate output, so that the reducer has less data to process and produces the final output sooner.
      The number of combiner invocations is not predefined. It can be zero or more, depending on the size of the data.
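
      Since the number of spill files is what drives the map-side combiner, a rough estimate of spills per map task helps make "the size of the data" concrete. The sketch below is a back-of-the-envelope model, not Hadoop code: SpillEstimate is a hypothetical name, it uses the documented defaults of mapreduce.task.io.sort.mb (100) and mapreduce.map.sort.spill.percent (0.80), and the 16 bytes of per-record accounting metadata is an assumption about the Hadoop 2 map output buffer. It ignores spills that overlap with ongoing map output, compression, and records too large for the buffer.

```java
// Back-of-the-envelope model of how many spill files one map task writes.
// Hypothetical sketch, not Hadoop code.
class SpillEstimate {

    static final int DEFAULT_SORT_MB = 100;            // mapreduce.task.io.sort.mb
    static final double DEFAULT_SPILL_PERCENT = 0.80;  // mapreduce.map.sort.spill.percent
    static final int METADATA_BYTES_PER_RECORD = 16;   // assumed per-record accounting overhead

    static long estimateSpills(long outputBytes, long records, int sortMb, double spillPercent) {
        if (records == 0) {
            return 0; // nothing to spill
        }
        long used = outputBytes + records * METADATA_BYTES_PER_RECORD;
        long threshold = (long) (sortMb * 1024L * 1024L * spillPercent);
        // Ceiling division: leftover data below the threshold is written
        // by the final flush, which also counts as a spill.
        return (used + threshold - 1) / threshold;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(estimateSpills(50 * mb, 1_000_000L, DEFAULT_SORT_MB, DEFAULT_SPILL_PERCENT));   // 1
        System.out.println(estimateSpills(1024 * mb, 10_000_000L, DEFAULT_SORT_MB, DEFAULT_SPILL_PERCENT)); // 15
        System.out.println(estimateSpills(1024 * mb, 10_000_000L, 200, DEFAULT_SPILL_PERCENT));             // 8
    }
}
```

      Under these assumptions, 50 MB of map output fits in a single spill, while 1 GB in 10 million records needs about 15 spills with the defaults and about 8 with a 200 MB sort buffer. Treat the numbers as ballpark figures only.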

    • #5787
      DataFlair Team

      The combiner, also termed a mini-reducer, processes the intermediate output of the mapper before it is passed to the reducer.
      When a MapReduce job runs on a large dataset, a mapper generates a large amount of intermediate output, which is transferred to the reducers over the network and causes network congestion. The combiner helps reduce this congestion by reducing the amount of data transferred to the reducers: it takes input from the mapper, summarizes the output by key, and passes the resulting key-value pairs on toward the reducer.
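
      A minimal plain-Java illustration of that local aggregation, using word count as the example. It mimics the idea only; LocalAggregation, map and combine are hypothetical names, not Hadoop's Mapper and Reducer classes. It counts how many key-value pairs would leave the map task with and without a sum combiner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of local aggregation on one map task's output.
class LocalAggregation {

    // "Mapper": emits one (word, 1) pair per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // "Combiner": sums the values per key before anything leaves the map task.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>(); // sorted by key, like sorted map output
        for (Map.Entry<String, Integer> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOut = new ArrayList<>();
        for (String line : List.of("the cat sat", "the cat ran", "the end")) {
            mapOut.addAll(map(line));
        }
        List<Map.Entry<String, Integer>> combined = combine(mapOut);
        System.out.println(mapOut.size() + " pairs before combining, " + combined.size() + " after"); // 8 ... 5
        System.out.println(combined); // [cat=2, end=1, ran=1, sat=1, the=3]
    }
}
```

      Here 8 pairs shrink to 5; on real data with many repeated keys per map task the reduction in shuffled data is usually much larger.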

      The following points about the combiner are worth noting:
      1. There is no separate Combiner interface: a combiner is an ordinary reducer class (in the org.apache.hadoop.mapreduce API, a class that extends Reducer and overrides the reduce() method).
      2. The combiner's input and output key-value types must both match the mapper's output types, because its output is fed to the reducer as input.
      3. The execution of the combiner is not guaranteed, so it cannot be predicted how many times a combiner will be invoked.
      4. A map-only job has no sort and shuffle phase, so the combiner never runs in map-only jobs.
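
      Points 1 and 2 can be expressed with generics. In the sketch below, SimpleReducer is a hypothetical plain-Java stand-in for Hadoop's Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>; in a real job you would instead pass a Reducer subclass to Job.setCombinerClass(). The signature of the hypothetical run() method only accepts a combiner whose input and output types are both the map output types, which is what allows the framework to skip it or apply it repeatedly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for Hadoop's Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>:
// reduce() receives one key with all of its values and returns output pairs.
interface SimpleReducer<KI, VI, KO, VO> {
    List<Map.Entry<KO, VO>> reduce(KI key, List<VI> values);
}

class CombinerTypes {

    // Usable as a combiner: (String, Integer) in and (String, Integer) out.
    static final SimpleReducer<String, Integer, String, Integer> SUM =
        (key, values) -> List.of(Map.entry(key, values.stream().mapToInt(Integer::intValue).sum()));

    // Usable only as the final reducer: its output value type (Long) differs from its input.
    static final SimpleReducer<String, Integer, String, Long> TOTAL =
        (key, values) -> List.of(Map.entry(key, values.stream().mapToLong(Integer::longValue).sum()));

    // The combiner must consume and produce the map output types (K, V), so its
    // output can go to the reducer exactly as the raw map output would.
    static <K extends Comparable<K>, V, KO, VO> List<Map.Entry<KO, VO>> run(
            List<Map.Entry<K, V>> mapOutput,
            SimpleReducer<K, V, K, V> combiner,   // null means "combiner not run"
            SimpleReducer<K, V, KO, VO> reducer) {
        List<Map.Entry<K, V>> toReducer = mapOutput;
        if (combiner != null) {
            toReducer = applyGrouped(mapOutput, combiner);
        }
        return applyGrouped(toReducer, reducer);
    }

    // Groups pairs by key (sorted, as in the shuffle) and calls reduce() once per key.
    static <K extends Comparable<K>, V, KO, VO> List<Map.Entry<KO, VO>> applyGrouped(
            List<Map.Entry<K, V>> pairs, SimpleReducer<K, V, KO, VO> r) {
        Map<K, List<V>> groups = new TreeMap<>();
        for (Map.Entry<K, V> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        List<Map.Entry<KO, VO>> out = new ArrayList<>();
        for (Map.Entry<K, List<V>> g : groups.entrySet()) {
            out.addAll(r.reduce(g.getKey(), g.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput =
            List.of(Map.entry("b", 1), Map.entry("a", 1), Map.entry("b", 1));
        System.out.println(run(mapOutput, SUM, TOTAL));   // combiner used:    [a=1, b=2]
        System.out.println(run(mapOutput, null, TOTAL));  // combiner skipped: [a=1, b=2]
        // run(mapOutput, TOTAL, TOTAL) does not compile: TOTAL emits Long values,
        // which cannot be fed on where Integer values are expected.
    }
}
```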
