What happens if the number of reducers is 0 in Hadoop?

This topic has 2 replies, 1 voice, and was last updated 7 years, 10 months ago by DataFlair Team.

Author

Posts
- September 20, 2018 at 4:16 pm #5812
  
  DataFlair Team
  Spectator
  
  When is the number of reducers set to 0 in MapReduce, and why?
- September 20, 2018 at 4:17 pm #5814
  
  DataFlair Team
  Spectator
  
  If we set the number of reducers to 0 (by calling job.setNumReduceTasks(0)), no reducer executes and no aggregation takes place. This is called a “Map-only job” in Hadoop, and it is the preferred choice when no aggregation is needed.
  Map-only job:
  In a Map-only job, each map task does all the work on its InputSplit and no reducer runs. The mapper output is the final output. In a regular job, a shuffle and sort phase sits between the map and reduce phases. It sorts the mapper output by key in ascending order and then groups the values that share the same key.
  
  This phase is very expensive, so if the reduce phase is not required we should avoid it. Avoiding the reduce phase eliminates the shuffle and sort phase as well. It also reduces network congestion, because during shuffling the mapper output travels across the network to the reducers, and when the data size is huge, a large amount of data has to be transferred.
  
  In a regular MapReduce job, the mapper output is written to local disk before it is sent to the reducers, but in a map-only job the output is written directly to HDFS. This further saves time and reduces cost.
- September 20, 2018 at 4:17 pm #5815
  
  DataFlair Team
  Spectator
  
  The number of reducers can be set to 0 in the driver class by calling job.setNumReduceTasks(0). This means the job has no reduce phase and only a map phase. Such a job is called a map-only job.
  
  Map-only job:
  A map-only job has only a map phase. The mapper output is written directly to HDFS rather than to the local disk, and it is the final output. As there is no reduce phase, no aggregation or sorting is done either. In a regular MapReduce job, the mapper output goes through shuffling and sorting on its way to the reducers, and when the data is huge this needs good network bandwidth. As there is no shuffling and sorting in a map-only job, there is less network congestion.
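
The difference both replies describe can be sketched in plain Java, without the Hadoop API. This is only an illustrative simulation under stated assumptions: the class `MapOnlySketch`, the word-count style `map` method, and the `run` helper are all hypothetical names, and a `TreeMap` stands in for the shuffle and sort phase. With `numReducers` set to 0, the mapper records come out as they were emitted, unsorted and ungrouped. With a reducer, keys are sorted and their values are grouped and summed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapOnlySketch {

    // Hypothetical mapper: emits one (word, "1") pair per word in the line.
    static List<String[]> map(String line) {
        List<String[]> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new String[] {word, "1"});
            }
        }
        return out;
    }

    // Simulates a job run. numReducers == 0 mimics the effect of
    // job.setNumReduceTasks(0): the mapper output is the final output.
    static List<String> run(List<String> lines, int numReducers) {
        List<String[]> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }

        List<String> result = new ArrayList<>();
        if (numReducers == 0) {
            // Map-only: no sort, no grouping, records stay in emission order.
            for (String[] kv : mapped) {
                result.add(kv[0] + "\t" + kv[1]);
            }
            return result;
        }

        // Stand-in for shuffle and sort: TreeMap keeps keys in ascending
        // order and collects all values that share a key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String[] kv : mapped) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(kv[1]));
        }

        // Stand-in for the reduce phase: sum the grouped values per key.
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.add(e.getKey() + "\t" + sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b a", "a c");
        System.out.println("map-only:     " + run(input, 0));
        System.out.println("with reducer: " + run(input, 1));
    }
}
```

In a real driver the only change needed is the `job.setNumReduceTasks(0)` call mentioned in the replies; the sketch just makes the effect on the output visible. Note that it does not model where the output is stored (local disk versus HDFS).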