Forums › Apache Hadoop › Explain map-only job?
This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 4:10 pm #5769 by DataFlair Team (Spectator)
How do we write a 'map-only' job in Hadoop?
When do we need a map-only job in Hadoop?
September 20, 2018 at 4:10 pm #5770 by DataFlair Team (Spectator)
A map-only job is used when there is no Reducer to execute: each Mapper does all the work on its InputSplit, and there is no reduce phase at all.
This is achieved by calling job.setNumReduceTasks(0) in the driver, which sets the number of reduce tasks to zero and turns the Reducer off. The number of output files then equals the number of mappers, and the files are named part-m-00000, part-m-00001, and so on.
The advantage of a map-only job is that it skips the sort and shuffle process, which is an expensive phase of MapReduce. Consequently, once the reduce-task count is set to zero, the result is unsorted.
If this property is not set, the framework allocates one reducer by default, which runs the identity Reducer, and the output file is named part-r-00000.
A map-only job is used in Hadoop when no aggregation is required; the map output is written directly to HDFS.
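As a sketch, a minimal map-only driver might look like the following. The class name, Mapper class, and paths are illustrative placeholders, but the Hadoop API calls (Job.getInstance, setNumReduceTasks, etc.) are the standard ones; the key line is job.setNumReduceTasks(0).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only example");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(MyMapper.class);   // MyMapper is a hypothetical Mapper class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Turn off the reduce phase: map output is written straight to HDFS
        // in files named part-m-00000, part-m-00001, ...
        job.setNumReduceTasks(0);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With setNumReduceTasks(0), there is no sort/shuffle, so the output appears in whatever order the mappers emit it.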
September 20, 2018 at 4:10 pm #5771 by DataFlair Team (Spectator)
Map-only jobs are normally written when we are sure that no reducers are required to aggregate or summarize the data. For example, parsing weblog data to convert it into a structured form needs no Reducer; for such a task, set job.setNumReduceTasks(0) in the driver class.
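To illustrate the weblog example, here is a sketch of the per-record transformation such a map-only job would perform. The log format and field names are assumed for illustration, not taken from the post; inside a real Mapper this logic would live in map(), with the result written out via context.write().

```java
public class WeblogParser {
    // Turns one raw log line, e.g. "10.0.0.1 2018-09-20 /index.html 200",
    // into a tab-separated structured record: ip \t date \t url \t status.
    public static String toStructured(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 4) {
            return null; // skip malformed lines instead of failing the task
        }
        return f[0] + "\t" + f[1] + "\t" + f[2] + "\t" + f[3];
    }

    public static void main(String[] args) {
        System.out.println(toStructured("10.0.0.1 2018-09-20 /index.html 200"));
    }
}
```

Since each line is transformed independently and nothing is grouped or summed, no Reducer is needed and the structured records can go straight to HDFS.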