The combiner in MapReduce is also known as a ‘Mini-reducer’. The primary job of the Combiner is to process the output of the Mapper before passing it on to the Reducer. It runs after the Mapper and before the Reducer, and its use is optional.
Theoretically speaking, it is harmless to run multiple combiners as long as your combine function is commutative and associative (like the word-count example).
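A minimal sketch of why that property matters, using a hypothetical word-count combine function (plain Python, not Hadoop API): because addition is commutative and associative, partial counts can be merged in any grouping and order and the final total never changes.

```python
from functools import reduce

# Hypothetical combine function for word count: it just sums partial
# counts. Addition is commutative (a + b == b + a) and associative
# ((a + b) + c == a + (b + c)), so partial counts may be merged in any
# order, across any number of combiner passes, without changing the result.
def combine(counts):
    return reduce(lambda a, b: a + b, counts)

partial_counts = [1, 1, 1, 1]  # four mapper emissions of ("hadoop", 1)

# Combining everything in one pass...
total = combine(partial_counts)

# ...gives the same answer as combining arbitrary sub-groups first,
# which is exactly what running several combiner passes amounts to.
regrouped = combine([combine([1, 1]), combine([1, 1])])

print(total, regrouped)  # 4 4
```

This is why word count tolerates any number of combiner invocations; a non-associative function (e.g. computing a mean of raw values) would not.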
But think: if you run multiple combiners on a single mapper node, isn’t the original purpose of the combiner (reducing the volume of data sent to the reducer) defeated, or at least compromised?
Since each combiner receives its data stream separately, the level of compression will not be as good as what a single combiner could have achieved.
Take the WordCount example:
Suppose the InputSplit supplied to the mapper contains the word ”Hadoop” 10 times, and the node runs 3 combiners (assuming that were possible):
then the best any one combiner can emit is a pair like (”hadoop”, 4), and three such pairs (e.g. counts of 4, 3 and 3) still leave the node;
however, had we just one combiner, the output from this mapper node could have been a single pair: (”hadoop”, 10).
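The comparison above can be simulated with a toy sketch (plain Python, not Hadoop code; the 4/3/3 split of the stream is an illustrative assumption): splitting one mapper’s output across several combiners yields more records than a single combiner would.

```python
# Ten mapper emissions for the word "hadoop", as in the example.
mapper_output = [("hadoop", 1)] * 10

def run_combiner(pairs):
    """Aggregate ("word", count) pairs, as a word-count combiner would."""
    totals = {}
    for word, count in pairs:
        totals[word] = totals.get(word, 0) + count
    return list(totals.items())

# Three combiners each see a separate slice of the stream (4 + 3 + 3).
slices = [mapper_output[:4], mapper_output[4:7], mapper_output[7:]]
three_combiner_output = [pair for s in slices for pair in run_combiner(s)]

# One combiner sees the whole stream.
one_combiner_output = run_combiner(mapper_output)

print(three_combiner_output)  # [('hadoop', 4), ('hadoop', 3), ('hadoop', 3)]
print(one_combiner_output)    # [('hadoop', 10)]
```

Both configurations preserve the total count of 10, but three combiners send three records to the reducer where one combiner would send a single record.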
Moreover, even going by the literal meaning of ”combiner”, it is more intuitive to think of a single node as running just one instance.