What happens if the number of reducers is 0 in Hadoop?


  • Author
    Posts
    • #5812
      DataFlair Team
      Spectator

      When is the number of reducers set to 0 in MapReduce, and why?

    • #5814
      DataFlair Team
      Spectator

      If we set the number of reducers to 0 (by calling job.setNumReduceTasks(0)), no reducer executes and no aggregation takes place. Such a job is called a “Map-only job” in Hadoop, and it is the preferred choice when the output needs no aggregation.
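      As an illustration, a driver for such a job might look like the sketch below. The class names MapOnlyDriver and UpperCaseMapper, the upper-casing logic, and the use of args[0] and args[1] as input and output paths are assumptions made for this example; only the call to job.setNumReduceTasks(0) comes from the answer above. It needs the Hadoop client libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {

    // Hypothetical mapper: transforms each line independently, so no
    // aggregation across records (and therefore no reducer) is needed.
    public static class UpperCaseMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(line.toString().toUpperCase()), NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only example");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(UpperCaseMapper.class);

        // Zero reducers: the job becomes map-only, with no shuffle or sort.
        job.setNumReduceTasks(0);

        // With no reducer, these describe the mapper's (final) output types.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // assumed input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // assumed output path

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

      In a map-only job the output files in the output directory are named part-m-NNNNN (one per map task) instead of the part-r-NNNNN files that reducers produce.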
      Map-Only job
      In a Map-only job, each mapper does all the work on its InputSplit and no reducer runs, so the mapper output is the final output.

      In a regular job, a shuffle and sort phase sits between the map and reduce phases. It sorts the mapper output by key in ascending order and then groups the values that share the same key. This phase is very expensive, so if the reduce phase is not required we should avoid it. Avoiding the reduce phase eliminates the shuffle and sort phase as well. It also reduces network congestion: during shuffling the mapper output travels across the network to the reducers, and when the data size is huge, a large amount of data has to be transferred.
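      To make the difference concrete, here is a small plain-Java sketch that imitates the two pipelines in memory. It does not use Hadoop at all; the names MapOnlySketch, mapPhase and shuffleSortReduce, and the word-count logic, are hypothetical and chosen only for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapOnlySketch {

    // "Map" step: emit a (word, 1) pair for every word, in input order.
    static List<Map.Entry<String, Integer>> mapPhase(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(new SimpleEntry<>(word, 1));
                }
            }
        }
        return out;
    }

    // "Shuffle, sort and reduce" step: order keys ascending (TreeMap),
    // group pairs with the same key, and sum their values.
    static Map<String, Integer> shuffleSortReduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b a", "a c a");

        // Zero reducers: the raw mapper output is the final output.
        System.out.println(mapPhase(input));                    // [b=1, a=1, a=1, c=1, a=1]

        // With a reducer: keys are sorted and values are aggregated.
        System.out.println(shuffleSortReduce(mapPhase(input))); // {a=3, b=1, c=1}
    }
}
```

      The first line of output is unsorted and ungrouped, which is what a map-only job writes; the second shows the extra sorting and grouping work that is skipped when the number of reducers is 0.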

      In a regular MapReduce job, the mapper output is written to local disk before it is sent to the reducer, but in a map-only job this output is written directly to HDFS. This saves further time and reduces cost as well.


    • #5815
      DataFlair Team
      Spectator

      The number of reducers can be set to 0 in the driver class by calling job.setNumReduceTasks(0). This means the job has no reduce phase and only a map phase. It is called a map-only job.

      Map-only job:
      A map-only job has only a map phase. The mapper output is written directly to HDFS rather than to local disk, and it is the final output. Because there is no reduce phase, no aggregation or sorting is done. In a regular MapReduce job, the mapper output goes to the reducer through shuffling and sorting, which needs good network bandwidth when the data is huge. As there is no shuffling and sorting in a map-only job, there is less network congestion.
