What is the need for a Map-only job in Hadoop?

  • Author
    Posts
    • #6122
      DataFlair Team
      Spectator

      Why do we need a Map-only job?
      Where can we use a Map-only job instead of a MapReduce job?

    • #6124
      DataFlair Team
      Spectator

      When a job has no reduce phase, it is a Map-only job. The mappers do all the work and there is nothing for a reducer to do. In a Map-only job, the number of output files equals the number of mappers.
      We can use a Map-only job when we only need to apply a series of operations to each block of data independently.


    • #6126
      DataFlair Team
      Spectator

      A Map-only job is suitable when we want to operate on individual values and no aggregation is required, as in filtering or reformatting data.
      A mapper cannot aggregate across the whole data set, because that would require mappers to communicate with one another, which is not possible. Because no sort, shuffle, combiner, or partitioner phase is involved, the job is fast.

      In a Map-only job the output is not sorted and is written directly to HDFS.

      The output files are named part-m-00000, part-m-00001, and so on, one per mapper.


    • #6128
      DataFlair Team
      Spectator

      A Map-only job fits scenarios that need no aggregation or grouping, only some operation on individual values. Data cleanup is one example: the mapper reads each row of data, and if the row matches some condition, it is dropped.
      No reducer is required here.
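
      The cleanup scenario above can be sketched in plain Java, without Hadoop, to show why no reducer is needed. This is only a simulation under stated assumptions: the class name, the method names, and the drop rule (blank rows and rows starting with '#') are hypothetical and not taken from any Hadoop API. Each input split is processed on its own, and each produces its own output, in input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation (no Hadoop): each input split is cleaned on its own,
// the way independent map tasks would handle it. All names are hypothetical.
class MapOnlyCleanupSketch {

    // Assumed cleanup rule: drop blank rows and rows starting with '#'.
    static boolean shouldDrop(String row) {
        String trimmed = row.trim();
        return trimmed.isEmpty() || trimmed.startsWith("#");
    }

    // Stands in for one map task: reads its split row by row and keeps
    // the surviving rows in their original order (nothing is sorted).
    static List<String> mapSplit(List<String> split) {
        List<String> out = new ArrayList<>();
        for (String row : split) {
            if (!shouldDrop(row)) {
                out.add(row);
            }
        }
        return out;
    }

    // One output list per split, mirroring one output file per mapper.
    static List<List<String>> runMapOnly(List<List<String>> splits) {
        List<List<String>> outputs = new ArrayList<>();
        for (List<String> split : splits) {
            outputs.add(mapSplit(split));
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("zeta,3", "# comment", "alpha,1"),
                Arrays.asList("", "beta,2"));
        System.out.println(runMapOnly(splits));
    }
}
```

      The decision to keep or drop a row depends only on that row, so no split ever needs data from another split. That independence is what makes the reduce phase unnecessary.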

    • #6130
      DataFlair Team
      Spectator

      A Map-only job is generally faster than a full MapReduce job.
      We use a Map-only job when the problem statement needs no aggregation. We can achieve this by setting
      job.setNumReduceTasks(0)

      in the job configuration in the driver.
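
      A minimal driver sketch showing where that call sits, assuming the org.apache.hadoop.mapreduce API and the Hadoop client libraries on the classpath. The class names MapOnlyDriver and FilterMapper, the job name, and the rule of dropping blank lines are illustrative assumptions, not something from this thread. Input and output paths are taken from the command line.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {

    // Hypothetical mapper: writes every non-blank line through unchanged.
    public static class FilterMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            if (!line.toString().trim().isEmpty()) {
                context.write(NullWritable.get(), line);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only example");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(FilterMapper.class);

        // Zero reducers: no shuffle or sort, and each mapper writes
        // its output straight to the output path.
        job.setNumReduceTasks(0);

        // With no reducer, these describe the mapper's output types.
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

      No reducer class is set at all. With zero reduce tasks, the output directory should contain one part-m-* file per map task.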
