Can we set the number of reducers to zero in MapReduce?


    • #5627
      DataFlair Team
      Spectator

      Can we set the number of reducers to 0?
      How do we set the number of reduce tasks to zero?
      What is the minimum number of reducers in MapReduce?

    • #5629
      DataFlair Team
      Spectator

      Yes. We can set the number of reducers to 0 in Hadoop, and it is a valid configuration.
      When the number of reducers is set to 0, no reduce phase is executed; the output of the mappers is treated as the final output and written to HDFS.
      Following are the ways to set the number of reducers to 0:
      By setting the configuration property mapred.reduce.tasks = 0 (mapreduce.job.reduces in newer Hadoop versions)

      job.setNumReduceTasks(0);

      where job is an instance of the JobConf class (or Job in the new MapReduce API), which is used to configure the MapReduce job.

      A job in which the number of reducers is set to 0 is also known as a map-only job.
      In a map-only job, each map task does all the work on its InputSplit and no reducer runs. Between the map and reduce phases there is a shuffle and sort phase, which sorts the map output keys in ascending order and groups the values that share the same key. This phase is very expensive, so if the reduce phase is not required we should avoid it. Avoiding the reduce phase also eliminates the shuffle and sort, which reduces network congestion: during shuffling the mapper output travels across the network to the reducers, and when the data size is huge a large amount of data has to travel.
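
      A minimal sketch of such a map-only job driver, assuming the new org.apache.hadoop.mapreduce API; the class names MapOnlyJob and PassThroughMapper and the input/output paths are placeholders, not part of any standard library:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.NullWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class MapOnlyJob {

          // Hypothetical mapper: passes every input line through unchanged.
          public static class PassThroughMapper
                  extends Mapper<Object, Text, Text, NullWritable> {
              @Override
              protected void map(Object key, Text value, Context context)
                      throws java.io.IOException, InterruptedException {
                  context.write(value, NullWritable.get());
              }
          }

          public static void main(String[] args) throws Exception {
              Job job = Job.getInstance(new Configuration(), "map-only example");
              job.setJarByClass(MapOnlyJob.class);
              job.setMapperClass(PassThroughMapper.class);

              // Zero reducers: the mapper output is written directly to HDFS
              // and the shuffle and sort phase is skipped entirely.
              job.setNumReduceTasks(0);

              job.setOutputKeyClass(Text.class);
              job.setOutputValueClass(NullWritable.class);
              FileInputFormat.addInputPath(job, new Path(args[0]));
              FileOutputFormat.setOutputPath(job, new Path(args[1]));
              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }

      If the driver implements Tool, the same effect can also be achieved from the command line with -D mapreduce.job.reduces=0.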


    • #5632
      DataFlair Team
      Spectator

      The number of reducers can be set to zero if there is no need for a reduce phase, since the reducer is generally used for data consolidation or aggregation rather than heavy computation.

      If no reducer is defined, the output generated by the map tasks is treated as the final output and stored in HDFS.

    • #5633
      DataFlair Team
      Spectator

      Yes, we can set the number of reducers to zero. This makes it a map-only job, and the data is not sorted but is written directly to HDFS.
      job.setNumReduceTasks(0);

      If we want the output from the mapper to be sorted, we can use the identity reducer.
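
      A minimal sketch of that, assuming the new org.apache.hadoop.mapreduce API, where the base Reducer class itself behaves as an identity reducer (the old API provides org.apache.hadoop.mapred.lib.IdentityReducer):

      // With at least one reduce task, the map output passes through the
      // shuffle and sort phase, so keys reach the output in sorted order.
      job.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class);
      job.setNumReduceTasks(1);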
