How many reducers run when we submit a MapReduce job

This topic has 1 reply, 1 voice, and was last updated 7 years, 10 months ago by DataFlair Team.

Viewing 1 reply thread

Author

Posts
- September 20, 2018 at 2:35 pm #5213
  
  DataFlair Team
  Spectator
  
  How many reducers run when we submit a MapReduce job?
  Can we control the number of reducers?
  What are the best practices for choosing the number of reducers to get the best performance?
- September 20, 2018 at 2:35 pm #5216
  DataFlair Team
  Spectator
  In Hadoop MapReduce, the Mapper processes each input record (from the RecordReader) and generates key-value pairs. The Reducer takes the intermediate key-value pairs generated by the Mapper as input and runs a reduce function on each key together with the values grouped under it to generate output. The Reducer output is the final output, which is stored in HDFS. The Reducer typically performs aggregation or summation-style computation.
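
  To make the summation case concrete, here is a minimal sketch in plain Java that imitates what happens on the reduce side. It does not use the Hadoop API: the class `ReduceSketch` and its methods are hypothetical names chosen for illustration, and the in-memory grouping only stands in for the shuffle and sort that the framework performs.

  ```java
  import java.util.AbstractMap;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;
  import java.util.TreeMap;

  // Hypothetical stand-alone illustration; not the Hadoop Reducer class.
  public class ReduceSketch {

      // Stand-in for a reduce function: sums all values that share one key.
      static int reduce(String key, List<Integer> values) {
          int sum = 0;
          for (int v : values) {
              sum += v;
          }
          return sum;
      }

      // Groups the mapper output by key (in Hadoop the framework does this
      // during shuffle and sort), then calls reduce once per key.
      static Map<String, Integer> run(List<Map.Entry<String, Integer>> mapperOutput) {
          Map<String, List<Integer>> grouped = new TreeMap<>();
          for (Map.Entry<String, Integer> kv : mapperOutput) {
              grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
          }
          Map<String, Integer> output = new TreeMap<>();
          for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
              output.put(e.getKey(), reduce(e.getKey(), e.getValue()));
          }
          return output;
      }

      public static void main(String[] args) {
          // Intermediate pairs as a word-count style mapper might emit them.
          List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
          pairs.add(new AbstractMap.SimpleEntry<>("a", 1));
          pairs.add(new AbstractMap.SimpleEntry<>("b", 1));
          pairs.add(new AbstractMap.SimpleEntry<>("a", 1));
          System.out.println(run(pairs)); // prints {a=2, b=1}
      }
  }
  ```

  In a real job the keys are spread across several reducers, so each reducer sees only a subset of the keys.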
  
  The user sets the number of reducers for a job with Job.setNumReduceTasks(int). If it is not set, the job runs with a single reducer, because mapreduce.job.reduces defaults to 1. A good number of reducers is:
  
  0.95 or 1.75 multiplied by (<no. of nodes> * <no. of maximum containers per node>)
  With 0.95, all the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reduces and launch a second wave of reduces, which does a much better job of load balancing.
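
  As a worked example of the rule of thumb above, the following plain Java sketch computes both suggestions for a hypothetical cluster of 10 nodes with 8 containers per node. The class `ReducerCount`, its method names, and the choice to round to the nearest integer are assumptions made for illustration; the guideline itself does not say how to round.

  ```java
  // Hypothetical helper; not part of the Hadoop API.
  public class ReducerCount {

      static final double SINGLE_WAVE = 0.95; // all reduces launch at once
      static final double TWO_WAVES = 1.75;   // faster nodes run a second wave

      // factor * (nodes * max containers per node), rounded to the nearest int.
      static int suggest(int nodes, int maxContainersPerNode, double factor) {
          int slots = nodes * maxContainersPerNode;
          return (int) Math.round(factor * slots);
      }

      // Number of rounds needed if every container runs one reduce at a time.
      static int waves(int reducers, int nodes, int maxContainersPerNode) {
          int slots = nodes * maxContainersPerNode;
          return (reducers + slots - 1) / slots; // integer ceiling
      }

      public static void main(String[] args) {
          int nodes = 10;      // assumed cluster size
          int containers = 8;  // assumed maximum containers per node

          int oneWave = suggest(nodes, containers, SINGLE_WAVE);
          int twoWaves = suggest(nodes, containers, TWO_WAVES);

          System.out.println(oneWave + " reducers, waves: " + waves(oneWave, nodes, containers));
          System.out.println(twoWaves + " reducers, waves: " + waves(twoWaves, nodes, containers));
          // The chosen value would then be passed to Job.setNumReduceTasks(int).
      }
  }
  ```

  With these assumed numbers the cluster has 80 containers, so the 0.95 factor gives 76 reducers that fit in one wave, and the 1.75 factor gives 140 reducers that need two waves.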
  
  Increasing the number of reducers has these effects:
  - Framework overhead increases.
  - Load balancing improves.
  - The cost of failures decreases.
  To study this in detail, see: Reducer in Hadoop