Explain map-only job in Hadoop?

Viewing 2 reply threads
  • Author
    Posts
    • #6026
DataFlair Team
      Spectator

      What is the Map-Only job in Hadoop?

    • #6028
DataFlair Team
      Spectator

MapReduce is the processing framework of Hadoop. The processing takes place in two phases/tasks:

a) MAP task – a set of data is taken and broken down into chunks, i.e. tuples (key-value pairs).
b) REDUCE task – the output of the Map task is the input for the Reduce phase; the tuples are combined based on the value of the key, i.e. data is aggregated by key.

A Map-only job is used when there is no need for aggregation. In a Map-only job, each map task does all the work on its InputSplit and no work is done by a reducer, so the map output is the final output.

The Reduce phase can be avoided by setting job.setNumReduceTasks(0) in the job configuration.
Sort and shuffle take place between the Map and Reduce phases, which is expensive; in a Map-only job this is avoided. Since the output of the Map phase is the final output, it can be written directly to HDFS instead of to local disk.
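A driver configured this way might look like the following sketch. MyMapper is a hypothetical mapper class, and the input/output paths come from the command line; the essential call is job.setNumReduceTasks(0), which is what makes the job map-only. This assumes the standard Hadoop 2.x+ mapreduce API on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only example");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(MyMapper.class); // hypothetical mapper class
        job.setNumReduceTasks(0);           // zero reducers => map-only job
        // With no reducers, the map output types are the job output types.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```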

Follow the link for more details: Map-Only job

    • #6030
DataFlair Team
      Spectator

The name itself tells us what it means: generally in MapReduce, the output of the mapper goes as input to the reducer, but in some cases there is no need for a reducer job (when no aggregation is needed). In those cases the mapper output is considered the actual output. Such jobs are called Map-only jobs in Hadoop.
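The behaviour described above can be sketched in plain Java, with no Hadoop dependency (the map function here is a hypothetical stand-in for a Hadoop Mapper). Note that the duplicate key "hello" is written out twice: with no reducer there is no shuffle, sort, or aggregation, and each map record's output goes straight to the final output.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapOnlyDemo {
    // Stand-in for a mapper: emits a (word, 1) pair for each word in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] split = { "hello hadoop", "hello world" }; // one InputSplit
        // Map-only: write each map output record directly, in input order --
        // no shuffle, no sort, no reduce-side aggregation of equal keys.
        for (String line : split) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                System.out.println(kv.getKey() + "\t" + kv.getValue());
            }
        }
    }
}
```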
