Explain map-only job?

    • #5769
      DataFlair Team
      Spectator

      How do we write a ‘map only’ job in Hadoop?
      When do we need a map-only job in Hadoop?

    • #5770
      DataFlair Team
      Spectator

      A map-only job is used when there is no Reducer to execute.

      Each mapper does all of its work on its own InputSplit, and there is nothing left for a Reducer to do.

      This can be achieved by setting
      job.setNumReduceTasks(0)
      in the driver. This sets the number of reduce tasks to 0 and turns off the reduce phase.

      So the number of output files will be equal to the number of mappers, and the files will be named part-m-00000, part-m-00001, and so on.

      The advantage of a map-only job is that it skips the sort and shuffle process, which is an expensive phase in MapReduce.

      So, once the number of reduce tasks is set to zero, the result will be unsorted.

      If we do not set this property, the framework allocates one reduce task by default, running the identity Reducer, and the output file will be named part-r-00000.

      When no aggregation is required, a map-only job is used in Hadoop.

      In a map-only job, the map output is written directly to HDFS.
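
      As a minimal sketch (not code from this thread), a map-only driver could look like the following. It uses Hadoop's base Mapper class, which simply passes each (offset, line) record through unchanged; the key line is job.setNumReduceTasks(0):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class MapOnlyDriver {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              Job job = Job.getInstance(conf, "map-only example");
              job.setJarByClass(MapOnlyDriver.class);

              // Identity mapper: emits each (byte offset, line) pair as-is.
              job.setMapperClass(Mapper.class);

              // Turn off the reduce phase: map output is written straight
              // to HDFS as part-m-NNNNN files, skipping sort and shuffle.
              job.setNumReduceTasks(0);

              // TextInputFormat (the default) yields LongWritable/Text pairs.
              job.setOutputKeyClass(LongWritable.class);
              job.setOutputValueClass(Text.class);

              FileInputFormat.addInputPath(job, new Path(args[0]));
              FileOutputFormat.setOutputPath(job, new Path(args[1]));

              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }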


    • #5771
      DataFlair Team
      Spectator

      Map-only jobs are normally written when we are sure that no reducers will be required to aggregate or summarize the data.
      So map-only jobs can be used when we only need to parse data, for example converting raw weblog data into a structured form. To perform this task we do not require a Reducer; this can be set in the Driver class as
      job.setNumReduceTasks(0).
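
      For illustration, a weblog-parsing mapper for such a job could look like the sketch below; the space-delimited field positions are an assumption about a simple log format, not a real specification:

      import java.io.IOException;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.NullWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public class WeblogParseMapper
              extends Mapper<LongWritable, Text, Text, NullWritable> {

          private final Text out = new Text();

          @Override
          protected void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
              // Assumed layout: fields[0] = client IP, fields[3] = timestamp,
              // fields[6] = request path (hypothetical positions).
              String[] fields = value.toString().split(" ");
              if (fields.length < 7) {
                  return; // skip malformed lines
              }
              // Emit one tab-separated structured record; no reducer involved.
              out.set(fields[0] + "\t" + fields[3] + "\t" + fields[6]);
              context.write(out, NullWritable.get());
          }
      }

      Paired with job.setNumReduceTasks(0) in the driver, each mapper's output is written directly to HDFS as a part-m-NNNNN file.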
