Explain map-only job in Hadoop?

Viewing 2 reply threads
  • Author
    Posts
    • #6026
DataFlair Team
      Spectator

      What is the Map-Only job in Hadoop?

    • #6028
DataFlair Team
      Spectator

MapReduce is the processing framework of Hadoop. The processing takes place in two phases/tasks:

a) MAP task – a set of data is taken and broken down into chunks, i.e. tuples (key-value pairs).
b) REDUCE task – the output of the Map task is the input for the Reduce phase; the tuples are combined based on the value of the key, i.e. data is aggregated by key.

A Map-only job is used when there is no need for aggregation. In a Map-only job, each map task does all the work on its InputSplit and no work is done by a reducer, so the map output is the final output.

The Reduce phase can be avoided by setting job.setNumReduceTasks(0) in the job configuration.
Sort and shuffle take place between the Map and Reduce phases, which is expensive; in a Map-only job this is avoided. Since the output of the Map phase is the final output, it can be written directly to HDFS instead of to local disk.
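A driver configured this way might look like the following sketch. MyMapper is a hypothetical mapper class, and the input/output paths come from the command line; the essential call is job.setNumReduceTasks(0), which is what makes the job map-only. This assumes the standard Hadoop 2.x+ mapreduce API on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only example");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(MyMapper.class); // hypothetical mapper class
        job.setNumReduceTasks(0);           // zero reducers => map-only job
        // With no reducers, the map output types are the job output types.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```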

Follow the link for more details: Map-Only job

    • #6030
DataFlair Team
      Spectator

The name itself tells us what it means: generally in MapReduce, the output of the mapper goes as input to the reducer, but in some cases there is no need for a reducer job (when no aggregation is needed). In those cases the mapper output is considered the actual output. Such jobs are called Map-only jobs in Hadoop.
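The behaviour described above can be sketched in plain Java, with no Hadoop dependency (the map function here is a hypothetical stand-in for a Hadoop Mapper). Note that the duplicate key "hello" is written out twice: with no reducer there is no shuffle, sort, or aggregation, and each map record's output goes straight to the final output.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapOnlyDemo {
    // Stand-in for a mapper: emits a (word, 1) pair for each word in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] split = { "hello hadoop", "hello world" }; // one InputSplit
        // Map-only: write each map output record directly, in input order --
        // no shuffle, no sort, no reduce-side aggregation of equal keys.
        for (String line : split) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                System.out.println(kv.getKey() + "\t" + kv.getValue());
            }
        }
    }
}
```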
