Explain map-only job?

    • #5769
      DataFlair Team
      Spectator

      How do we write a ‘map only’ job in Hadoop?
      When do we need a map-only job in Hadoop?

    • #5770
      DataFlair Team
      Spectator

      A map-only job is used when there is no Reducer to execute.

      Each mapper does all of its work on its own InputSplit, and there is nothing left for a Reducer to do.

      This can be achieved by setting
      job.setNumReduceTasks(0)
      in the driver. This sets the number of reduce tasks to 0 and turns off the reduce phase.

      So the number of output files will be equal to the number of mappers, and the files will be named part-m-00000, part-m-00001, and so on.

      The advantage of a map-only job is that it skips the sort and shuffle process, which is an expensive phase in MapReduce.

      So, once the number of reduce tasks is set to zero, the result will be unsorted.

      If we do not set this property, the framework allocates one reduce task by default, running the identity Reducer, and the output file will be named part-r-00000.

      When no aggregation is required, a map-only job is used in Hadoop.

      In a map-only job, the map output is written directly to HDFS.
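
      As a minimal sketch (not code from this thread), a map-only driver could look like the following. It uses Hadoop's base Mapper class, which simply passes each (offset, line) record through unchanged; the key line is job.setNumReduceTasks(0):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class MapOnlyDriver {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              Job job = Job.getInstance(conf, "map-only example");
              job.setJarByClass(MapOnlyDriver.class);

              // Identity mapper: emits each (byte offset, line) pair as-is.
              job.setMapperClass(Mapper.class);

              // Turn off the reduce phase: map output is written straight
              // to HDFS as part-m-NNNNN files, skipping sort and shuffle.
              job.setNumReduceTasks(0);

              // TextInputFormat (the default) yields LongWritable/Text pairs.
              job.setOutputKeyClass(LongWritable.class);
              job.setOutputValueClass(Text.class);

              FileInputFormat.addInputPath(job, new Path(args[0]));
              FileOutputFormat.setOutputPath(job, new Path(args[1]));

              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }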


    • #5771
      DataFlair Team
      Spectator

      Map-only jobs are normally written when we are sure that no reducers will be required to aggregate or summarize the data.
      So map-only jobs can be used when we only need to parse data, for example converting raw weblog data into a structured form. To perform this task we do not require a Reducer; this can be set in the Driver class as
      job.setNumReduceTasks(0).
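
      For illustration, a weblog-parsing mapper for such a job could look like the sketch below; the space-delimited field positions are an assumption about a simple log format, not a real specification:

      import java.io.IOException;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.NullWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public class WeblogParseMapper
              extends Mapper<LongWritable, Text, Text, NullWritable> {

          private final Text out = new Text();

          @Override
          protected void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
              // Assumed layout: fields[0] = client IP, fields[3] = timestamp,
              // fields[6] = request path (hypothetical positions).
              String[] fields = value.toString().split(" ");
              if (fields.length < 7) {
                  return; // skip malformed lines
              }
              // Emit one tab-separated structured record; no reducer involved.
              out.set(fields[0] + "\t" + fields[3] + "\t" + fields[6]);
              context.write(out, NullWritable.get());
          }
      }

      Paired with job.setNumReduceTasks(0) in the driver, each mapper's output is written directly to HDFS as a part-m-NNNNN file.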
