In Hadoop, when a MapReduce job runs on a large dataset, the mappers generate large volumes of intermediate data, and all of it must be transferred to the reducers for further processing, which can cause enormous network congestion. The MapReduce framework provides a function known as the Combiner that plays a key role in reducing this congestion.
The Combiner is also known as a mini-reducer. It performs local aggregation on each mapper's output, which minimizes the data transferred between the mappers and the reducers and thus increases the efficiency of a MapReduce program.
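The effect of that local aggregation can be sketched without a cluster. The pure-Java simulation below (the class and method names are illustrative, not Hadoop APIs) shows a word-count mapper emitting one (word, 1) pair per word, and a combiner summing those pairs on the mapper's node before the shuffle:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {
    // Hypothetical mapper output for one node: a (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> mapperOutput(String[] words) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : words) out.add(Map.entry(w, 1));
        return out;
    }

    // The combiner: local aggregation over a single mapper's output,
    // applying the same summing logic the reducer would apply.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        return sums;
    }

    public static void main(String[] args) {
        String[] words = {"hadoop", "map", "hadoop", "reduce", "hadoop", "map"};
        List<Map.Entry<String, Integer>> raw = mapperOutput(words);
        Map<String, Integer> combined = combine(raw);
        // Six records would cross the network without the combiner; only
        // three (one per distinct word) cross with it.
        System.out.println("records without combiner: " + raw.size());  // 6
        System.out.println("records with combiner:    " + combined.size());  // 3
    }
}
```

The reducer then merges these pre-summed partial counts exactly as it would merge raw (word, 1) pairs, so the final result is unchanged.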
The execution of the combiner is not guaranteed: Hadoop may run it zero, one, or many times for a given map task. Hence, your MapReduce jobs must not depend on the combiner executing.

Number of Combiners
Unlike the number of reducers, the number of combiner invocations cannot be specified for a MapReduce job; the framework alone decides when and how often the combiner runs.
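This contrast is visible in the job driver API. The sketch below assumes the TokenizerMapper and IntSumReducer classes from the stock Hadoop WordCount example and needs a Hadoop classpath to compile: setNumReduceTasks fixes the reducer count directly, while setCombinerClass only nominates a class the framework may choose to invoke.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);  // mapper from the WordCount example
        job.setReducerClass(IntSumReducer.class);   // reducer from the WordCount example

        // Reducers: we can dictate exactly how many run.
        job.setNumReduceTasks(2);

        // Combiner: we can only nominate the class; the framework decides
        // whether it runs zero, one, or many times per map task.
        job.setCombinerClass(IntSumReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Reusing the reducer class as the combiner, as here, works only because summing is safe to apply repeatedly.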
The combiner runs on each mapper node. The first rule of MapReduce combiners is: do not assume that the combiner will run; treat it purely as an optimization.
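Because the combiner may run any number of times, its operation must give the same final answer whether it is applied zero, one, or many times; in practice this means the operation should be associative and commutative. The self-contained sketch below (plain Java, not Hadoop code) shows that summing satisfies this while averaging does not:

```java
import java.util.Arrays;
import java.util.List;

public class CombinerCorrectness {
    // Sum is associative and commutative, so combining partial results
    // any number of times cannot change the reducer's final answer.
    static int sum(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).sum();
    }

    // Averaging is NOT a safe combiner operation: an average of partial
    // averages is generally not the overall average.
    static double avg(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).average().orElse(0);
    }

    public static void main(String[] args) {
        List<Integer> partA = Arrays.asList(1, 2, 3); // one mapper's values
        List<Integer> partB = Arrays.asList(10);      // another mapper's values

        // Sum: reducing the combined partials equals reducing the raw values.
        int direct = sum(Arrays.asList(1, 2, 3, 10));              // 16
        int viaCombiner = sum(Arrays.asList(sum(partA), sum(partB))); // 6 + 10 = 16
        System.out.println(direct == viaCombiner);                 // true

        // Average: combining first changes the result (4.0 vs 6.0).
        double trueAvg = avg(Arrays.asList(1, 2, 3, 10));          // 4.0
        double badAvg = avg(Arrays.asList((int) avg(partA), (int) avg(partB))); // avg(2, 10) = 6.0
        System.out.println(trueAvg + " vs " + badAvg);
    }
}
```

To average safely with a combiner, the usual approach is to have the combiner emit (sum, count) pairs and let the reducer divide at the end.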
Each map task writes its output to an in-memory buffer, 100 MB by default. When the contents of the buffer reach a threshold, 80% by default, a background thread starts to spill the contents to disk. When the data does not need to be spilled to disk, MapReduce may skip running the combiner entirely. Note also that the combiner may be run multiple times over subsets of the data without affecting the final result.
By default, the combiner also executes during the merge of spill files if there are at least 3 of them. This threshold can be changed through the min.num.spills.for.combine property.
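The spill-and-combine behavior described above can be sketched as a toy model. The simulation below is illustrative plain Java, not Hadoop internals (real spill counts also depend on record metadata and when the task finishes); it only models the rule that each spill holds roughly buffer-size times the threshold, and that the merge-time combiner fires once the spill count reaches the minimum:

```java
public class SpillModel {
    // Defaults described in the text above; sizes in MB.
    static final int BUFFER_MB = 100;            // default map-side buffer
    static final double SPILL_THRESHOLD = 0.8;   // spill at 80% full
    static final int MIN_SPILLS_FOR_COMBINE = 3; // min.num.spills.for.combine default

    // Toy estimate: number of spill files for a given map output size,
    // assuming each spill holds BUFFER_MB * SPILL_THRESHOLD = 80 MB.
    static int spillCount(int outputMb) {
        int spillSize = (int) (BUFFER_MB * SPILL_THRESHOLD);
        return (outputMb + spillSize - 1) / spillSize; // ceiling division
    }

    // The combiner runs again during the final merge only if enough
    // spill files were produced.
    static boolean combinerRunsAtMerge(int outputMb) {
        return spillCount(outputMb) >= MIN_SPILLS_FOR_COMBINE;
    }

    public static void main(String[] args) {
        // 50 MB of output -> 1 spill -> no combiner at merge time.
        System.out.println(spillCount(50) + " spill(s), combiner at merge: "
                + combinerRunsAtMerge(50));
        // 400 MB of output -> 5 spills -> combiner runs at merge time.
        System.out.println(spillCount(400) + " spill(s), combiner at merge: "
                + combinerRunsAtMerge(400));
    }
}
```

The intuition the model captures: merging many spill files is exactly the situation where re-running the combiner pays off, since it shrinks the data before the merged output is shuffled to the reducers.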