This topic contains 3 replies, has 1 voice, and was last updated by  dfbdteam3 1 year, 6 months ago.

Viewing 4 posts - 1 through 4 (of 4 total)
  • Author
    Posts
  • #5627

    dfbdteam3
    Moderator

    Can we set the number of reducers to 0?
    How do we set the number of reduce tasks to zero?
    What is the minimum number of reducers in MapReduce?

    #5629

    dfbdteam3
    Moderator

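    Yes. We can set the number of reducers to 0 in Hadoop, and it is a valid configuration. Zero is also the minimum number of reducers.
    When the number of reducers is 0, no reduce phase is executed; the mapper output is treated as the final output and written to HDFS.
    The number of reducers can be set to 0 in either of two ways:
    by setting the property mapred.reduce.tasks = 0 (renamed mapreduce.job.reduces in Hadoop 2 and later), or by calling the following method in the driver: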

    job.setNumReduceTasks(0);

    where job is an instance of the JobConf class (or of the Job class in the newer mapreduce API), which lets the user configure the map/reduce job.

    A job in which the number of reducers is set to 0 is also known as a map-only job.
    In a map-only job, each map task does all the work on its InputSplit and no reduce task runs. Normally a shuffle and sort phase sits between the map and reduce phases. It sorts the map output keys in ascending order and then groups the values that share the same key. This phase is very expensive, so if the reduce phase is not required we should avoid it. Skipping the reduce phase eliminates the shuffle and sort phase as well. It also reduces network congestion, because during the shuffle the mapper output travels across the network to the reducers, and when the data set is huge that transfer is large.
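
The difference described above can be illustrated with a small plain-Java simulation. This is only a sketch and does not use the Hadoop API: the class MapOnlySimulation, its methods, and the word-count style mapper are hypothetical names made up for illustration. It shows that with zero reducers each split's map output is kept as-is in emission order, while a reduce phase requires every record to be collected, sorted by key, and grouped first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation (assumption: NOT the Hadoop API, illustration only). */
final class MapOnlySimulation {

    /** Hypothetical mapper: emits (word, 1) for every whitespace-separated token. */
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    /** Zero reducers: each split's map output is final, kept in emission order. */
    static List<List<Map.Entry<String, Integer>>> runMapOnly(List<String> splits) {
        List<List<Map.Entry<String, Integer>>> parts = new ArrayList<>();
        for (String split : splits) {
            parts.add(map(split)); // one output part per map task, no sorting
        }
        return parts;
    }

    /** With a reduce phase: all records are gathered, sorted by key, and grouped. */
    static TreeMap<String, List<Integer>> shuffleAndSort(List<String> splits) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>(); // keys in ascending order
        for (String split : splits) {
            for (Map.Entry<String, Integer> record : map(split)) {
                grouped.computeIfAbsent(record.getKey(), k -> new ArrayList<>())
                       .add(record.getValue());
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("pear apple", "fig apple");
        System.out.println("map-only:     " + runMapOnly(splits));
        System.out.println("shuffle+sort: " + shuffleAndSort(splits));
    }
}
```

In the map-only path nothing crosses between splits, which is why the real framework can skip the network transfer entirely; in the second path every record has to reach the structure that sorts and groups it.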


    #5632

    dfbdteam3
    Moderator

    The number of reducers can be set to zero when no reduce step is needed. A reducer is generally used for data consolidation or aggregation rather than heavy computation.

    When the number of reducers is zero, the output generated by the map tasks is treated as the final output and stored in HDFS.

    #5633

    dfbdteam3
    Moderator

    Yes, we can set the number of reducers to zero. This makes it a map-only job. The data is not sorted and is stored directly in HDFS.
    job.setNumReduceTasks(0);

    If we want the mapper output to be sorted by key, we can use an identity reducer, which writes each record out unchanged after the shuffle and sort.
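
What an identity reducer contributes can be sketched in plain Java. This is an illustration only and not Hadoop code: IdentityReduceSimulation and its methods are hypothetical names, and the TreeMap stands in for the sort-and-group step the framework performs before reduce. The reduce step itself changes nothing; the sorted order comes entirely from the step before it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch (assumption: NOT the Hadoop API, illustration only). */
final class IdentityReduceSimulation {

    /** Stand-in for shuffle and sort: orders keys ascending and groups their values. */
    static TreeMap<String, List<String>> sortAndGroup(List<Map.Entry<String, String>> mapOutput) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> record : mapOutput) {
            grouped.computeIfAbsent(record.getKey(), k -> new ArrayList<>())
                   .add(record.getValue());
        }
        return grouped;
    }

    /** Identity reduce: emits every (key, value) pair unchanged, one line per pair. */
    static List<String> identityReduce(TreeMap<String, List<String>> grouped) {
        List<String> lines = new ArrayList<>();
        grouped.forEach((key, values) -> {
            for (String value : values) {
                lines.add(key + "\t" + value); // values keep arrival order in this sketch
            }
        });
        return lines;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> mapOutput = List.of(
                Map.entry("b", "2"), Map.entry("a", "1"), Map.entry("b", "1"));
        identityReduce(sortAndGroup(mapOutput)).forEach(System.out::println);
    }
}
```

Note that in a real job each reducer's output is sorted only within that reducer, so a single globally sorted file would need one reduce task or a suitable partitioner.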
