Can we set the number of reducers to zero in MapReduce?
- This topic has 3 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 3:48 pm #5627 DataFlair Team (Spectator)
Can we set the number of reducers to 0?
How do we set the number of reduce tasks to zero?
What is the minimum number of reducers in MapReduce?
September 20, 2018 at 3:49 pm #5629 DataFlair Team (Spectator)
Yes. Setting the number of reducers to 0 is a valid configuration in Hadoop.
When the number of reducers is 0, no reduce phase is executed, and the output of the mappers is the final output, which is written to HDFS.
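The following are the ways to set the number of reducers to 0:
By setting the property mapred.reduce.tasks = 0 (in Hadoop 2 and later the preferred name is mapreduce.job.reduces; the old name is deprecated but still accepted).
By calling job.setNumReduceTasks(0); in the driver, where job is an instance of class JobConf, which lets the user configure the MapReduce job. In the newer org.apache.hadoop.mapreduce API, the Job class provides the same setNumReduceTasks(int) method.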
A job in which the number of reducers is set to 0 is also known as a map-only job.
In a map-only job, each map task processes its own InputSplit and no reduce task runs. Normally, between the map and reduce phases there is a shuffle and sort phase: it sorts the map output by key (in ascending order by default) and then groups the values that share the same key. This phase is expensive, because all map output is sorted, written to the mappers' local disks, and copied to the reducers. If the reduce phase is not required, we should avoid it: skipping the reduce phase also eliminates the shuffle and sort phase. This reduces network congestion as well, since during the shuffle the mapper output travels over the network to the reducers, and when the data size is huge, a large volume of data crosses the network.
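To make the data-flow difference concrete, here is a toy, in-memory sketch in plain Java. It uses no Hadoop classes, and the names MapOnlyVsReduce, KV and runJob are hypothetical. It assumes each map task's output is already available as a list of key/value pairs. With numReduceTasks == 0 each map task's output is written as-is; otherwise every record is partitioned with the same formula as Hadoop's default HashPartitioner, sorted and grouped by key, and summed by a stand-in reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy, in-memory model of a job's data flow (plain Java, no Hadoop classes).
public class MapOnlyVsReduce {

    // Hypothetical key/value pair emitted by a map task.
    static final class KV {
        final String key;
        final int value;

        KV(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Each inner list of mapOutputs is one map task's output, in emission order.
    // Returns one list of "key<TAB>value" lines per output file.
    static List<List<String>> runJob(List<List<KV>> mapOutputs, int numReduceTasks) {
        List<List<String>> files = new ArrayList<>();
        if (numReduceTasks == 0) {
            // Map-only job: no partitioning, no sorting, no shuffle.
            // Each map task's output becomes a final output file as-is.
            for (List<KV> task : mapOutputs) {
                List<String> lines = new ArrayList<>();
                for (KV kv : task) {
                    lines.add(kv.key + "\t" + kv.value);
                }
                files.add(lines);
            }
            return files;
        }
        // Shuffle: every map output record is routed to one reduce partition.
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            partitions.add(new TreeMap<>());
        }
        for (List<KV> task : mapOutputs) {
            for (KV kv : task) {
                // Same formula as Hadoop's default HashPartitioner.
                int r = (kv.key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
                // Sort and group: the TreeMap keeps keys in ascending order
                // and collects the values that share a key.
                partitions.get(r).computeIfAbsent(kv.key, k -> new ArrayList<>()).add(kv.value);
            }
        }
        // Reduce: a stand-in reducer that sums the values for each key.
        for (TreeMap<String, List<Integer>> partition : partitions) {
            List<String> lines = new ArrayList<>();
            for (Map.Entry<String, List<Integer>> e : partition.entrySet()) {
                int sum = 0;
                for (int v : e.getValue()) {
                    sum += v;
                }
                lines.add(e.getKey() + "\t" + sum);
            }
            files.add(lines);
        }
        return files;
    }

    public static void main(String[] args) {
        List<List<KV>> mapOutputs = List.of(
                List.of(new KV("pear", 1), new KV("apple", 1)),
                List.of(new KV("apple", 1), new KV("fig", 1)));
        System.out.println("map-only:    " + runJob(mapOutputs, 0));
        System.out.println("one reducer: " + runJob(mapOutputs, 1));
    }
}
```

With the sample input in main, the map-only run yields two unsorted "files", one per map task, while the single-reducer run yields one file with apple, fig and pear in key order and apple summed to 2. Only the second path touches every record in the shuffle, which is the cost a map-only job avoids.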
September 20, 2018 at 3:49 pm #5632 DataFlair Team (Spectator)
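The number of reducers can be set to zero when the job does not need a reduce step. Reducers are generally used for data consolidation or aggregation rather than heavy computation, so a job that processes each record independently often does not need one.
When the number of reducers is set to zero, the output generated by the map tasks is the final output and is stored in HDFS. (Simply not defining a reducer class is different: Hadoop then runs the default identity reducer with one reduce task, so the shuffle and sort still happen.)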
September 20, 2018 at 3:49 pm #5633 DataFlair Team (Spectator)
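Yes, we can set the number of reducers to zero. This makes the job map-only: the map output is not sorted and is stored directly in HDFS.
job.setNumReduceTasks(0);
If we want the mapper output to be sorted, we can use the identity reducer instead. It writes every key/value pair it receives unchanged, but because the data passes through the shuffle and sort phase first, the output of each reducer is sorted by key.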
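The identity-reducer idea can be sketched in plain Java without Hadoop classes; IdentityReducerDemo, KV, mapOnly and identityReduce are hypothetical names. The sketch assumes a single reduce task, so the whole output ends up sorted by key; with several reducers, each output file would be sorted on its own. In the real APIs, the identity reducer is org.apache.hadoop.mapred.lib.IdentityReducer in the old API, and in the new API the base org.apache.hadoop.mapreduce.Reducer class behaves as one when reduce() is not overridden.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy comparison (plain Java, no Hadoop): map-only output vs. one identity reducer.
public class IdentityReducerDemo {

    // Hypothetical key/value pair emitted by the mapper.
    static final class KV {
        final String key;
        final String value;

        KV(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Map-only job: records are written in the order the mapper emitted them.
    static List<String> mapOnly(List<KV> mapOutput) {
        List<String> out = new ArrayList<>();
        for (KV kv : mapOutput) {
            out.add(kv.key + "\t" + kv.value);
        }
        return out;
    }

    // One reduce task with an identity reducer: the shuffle and sort orders
    // records by key, and the reducer writes every pair through unchanged.
    static List<String> identityReduce(List<KV> mapOutput) {
        TreeMap<String, List<String>> sorted = new TreeMap<>();
        for (KV kv : mapOutput) {
            // Values for the same key keep arrival order here; Hadoop itself
            // does not guarantee value order without a secondary sort.
            sorted.computeIfAbsent(kv.key, k -> new ArrayList<>()).add(kv.value);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : sorted.entrySet()) {
            for (String v : e.getValue()) {
                out.add(e.getKey() + "\t" + v); // identity: no aggregation
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<KV> mapOutput = List.of(
                new KV("u3", "c"), new KV("u1", "a"), new KV("u3", "d"), new KV("u2", "b"));
        System.out.println("map-only:         " + mapOnly(mapOutput));
        System.out.println("identity reducer: " + identityReduce(mapOutput));
    }
}
```

Unlike an aggregating reducer, the identity reducer keeps every record (four in, four out in the sample); what it adds is the sort, paid for with the shuffle that a map-only job skips.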