Explain map-only job in Hadoop?
September 20, 2018 at 4:58 pm · #6026 · DataFlair Team (Spectator)
What is a Map-Only job in Hadoop?
September 20, 2018 at 4:59 pm · #6028 · DataFlair Team (Spectator)
MapReduce is the processing framework of Hadoop. Processing takes place in two phases/tasks:
a) Map task – a set of input data is taken and broken down into chunks of key-value pairs (tuples).
b) Reduce task – the output of the Map task is the input of the Reduce phase; the tuples are combined based on the key, i.e. data is aggregated by key.

A Map-Only job is used when there is no need for aggregation: the map does all the work on its InputSplit and no work is done by a reducer, so the map output is the final output.

The Reduce phase can be avoided by setting job.setNumReduceTasks(0) in the job configuration.

Sort and shuffle take place before the Reduce phase and are expensive; in a Map-Only job they are avoided entirely. Since the output of the Map phase is the final output, it can be written directly to HDFS instead of to intermediate local disk.
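The setting described above can be sketched as a minimal driver class. This is an illustrative sketch assuming the standard org.apache.hadoop.mapreduce API; MyMapper is a hypothetical placeholder for your own Mapper implementation, and the key/value types are assumed to be Text:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal driver sketch for a map-only job.
// MyMapper is a hypothetical Mapper<..., Text, Text> implementation.
public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "map-only example");
        job.setJarByClass(MapOnlyDriver.class);

        job.setMapperClass(MyMapper.class);
        // Zero reducers: sort/shuffle is skipped and the map output
        // is written directly to HDFS as the final output.
        job.setNumReduceTasks(0);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that with zero reducers the output files are named part-m-00000, part-m-00001, etc. (one per map task), rather than the part-r-* files a Reduce phase would produce.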
September 20, 2018 at 4:59 pm · #6030 · DataFlair Team (Spectator)
The name itself tells us what it actually means: generally in MapReduce the output of the mapper goes as input to the reducer, but in some cases there is no need for a reducer (when no aggregation is needed). In those cases the mapper output is considered the actual output; such jobs are called Map-Only jobs in Hadoop.