How many Reducers run for a MapReduce job?


  • Author
    Posts
    • #6274
      DataFlair Team
      Spectator

      When we submit a MapReduce job, how many reduce tasks run in Hadoop?
      How is the number of Reducers calculated in Hadoop?
      How do we set the number of Reducers for a MapReduce job?

    • #6275
      DataFlair Team
      Spectator

      1. The Reducer takes the set of intermediate key-value pairs produced by the Mapper as its input and runs a reduce function on each of them. It produces a new set of output (zero or more key-value pairs), which is stored in HDFS.
      2. This (key, value) data can be aggregated, filtered, and combined in a number of ways, so it can require a wide range of processing.
      3. All values for a given key go to the same Reducer, though one Reducer may process many keys. Reducers run in parallel since they are independent of one another. The user decides the number of Reducers; by default it is 1.

      Phases:
      Shuffle phase: The sorted output of the Mappers is the input to the Reducer. In this phase, the framework fetches the relevant partition of every Mapper's output over HTTP.
      Sort phase: The input from the different Mappers is merged and sorted by key. The shuffle and sort phases occur concurrently.
      Reduce phase: After shuffling and sorting, the reduce task aggregates the key-value pairs.
      4. The output of the reduce task is written to the filesystem via OutputCollector.collect(); the Reducer output itself is not re-sorted. A minimal Reducer sketch is shown below.
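
      For illustration, here is a minimal word-count-style Reducer sketch using the old org.apache.hadoop.mapred API, whose reduce() receives a key together with an iterator over all of that key's values and emits results through OutputCollector.collect(). The class name SumReducer is just a placeholder:

        import java.io.IOException;
        import java.util.Iterator;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.MapReduceBase;
        import org.apache.hadoop.mapred.OutputCollector;
        import org.apache.hadoop.mapred.Reducer;
        import org.apache.hadoop.mapred.Reporter;

        public class SumReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

          // Called once per key, with an iterator over all values for that key.
          public void reduce(Text key, Iterator<IntWritable> values,
              OutputCollector<Text, IntWritable> output, Reporter reporter)
              throws IOException {
            int sum = 0;
            while (values.hasNext()) {
              sum += values.next().get();  // aggregate all values for this key
            }
            output.collect(key, new IntWritable(sum));  // written out to HDFS
          }
        }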

      How many Reducers run in Hadoop:
      The user sets the number of reducers for the job with Job.setNumReduceTasks(int).
      The right number of reducers is about 0.95 or 1.75 multiplied by (<no. of nodes> * <no. of maximum containers per node>).
      With 0.95, all reducers launch immediately and can start transferring map outputs as the maps finish.
      With 1.75, the faster nodes finish a first round of reducers and then launch a second wave, which does a much better job of load balancing. A driver sketch applying this heuristic follows.
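
      As a sketch only (the node and container counts below are made-up placeholders, not values Hadoop supplies automatically), the heuristic can be applied in the driver like this:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;

        public class ReducerCountDriver {
          public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "reducer count demo");

            int nodes = 10;                 // hypothetical cluster size
            int maxContainersPerNode = 8;   // hypothetical container capacity

            // 0.95 factor: all reducers can launch in a single wave.
            int numReducers = (int) (0.95 * nodes * maxContainersPerNode);  // 76
            job.setNumReduceTasks(numReducers);
          }
        }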

    • #6278
      DataFlair Team
      Spectator

      The Reducer reduces the set of intermediate values produced by the Mappers that share a key to a smaller set of output values.

      In a MapReduce job, the number of Reducers running is the number of reduce tasks set by the user. Ideally, the number of reducers should be:

      0.95 or 1.75 multiplied by (<no. of nodes> * <no. of maximum containers per node>)

      With the value 0.95, all the reducers can launch immediately (in parallel with the mappers) and start transferring map outputs as the map tasks finish.
      With the value 1.75, the faster nodes will finish their first round of reducers and launch a second set of reducers, thereby doing a much better job of load balancing.
      Increasing the number of reducers increases framework overhead, but it also improves load balancing and lowers the cost of failures.
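
      For instance, with made-up figures, a cluster of 10 nodes with 8 containers per node gives 0.95 * 10 * 8 = 76 reducers for a single wave, or 1.75 * 10 * 8 = 140 reducers for two better-balanced waves.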

      The number of reducers can be set in two ways as below:

      1. Using the command line: While running the MapReduce job, we can set the number of reducers with the property mapred.reduce.tasks (renamed mapreduce.job.reduces in Hadoop 2.x), passed as a generic -D option.
        For example,

        hadoop jar word_count.jar com.home.wc.WordCount -D mapred.reduce.tasks=20 /input /output

        This sets the number of reducers for the job to 20. Note that the -D generic option must appear before the input and output paths.

      2. Using the Job instance: In the driver class of the MapReduce program, we can specify the number of reducers on the job with the call job.setNumReduceTasks(int). For example,

        job.setNumReduceTasks(0);

      We can also set the number of reduce tasks to 0, as above, when we need a map-only job; a minimal driver sketch follows.
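
      A minimal map-only driver sketch (MapOnlyDriver and WordCountMapper are hypothetical placeholder classes; the Mapper is assumed to exist elsewhere):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

        public class MapOnlyDriver {
          public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "map-only job");
            job.setJarByClass(MapOnlyDriver.class);
            job.setMapperClass(WordCountMapper.class);  // hypothetical Mapper
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Zero reducers: map output is written directly to the filesystem,
            // skipping shuffle, sort, and reduce entirely.
            job.setNumReduceTasks(0);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
        }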
