Where is the output of the Mapper written?

    • #5449
      DataFlair Team
      Spectator

      In a Hadoop MapReduce job, where is the output of the Mapper written, and why?
      Is it written to local disk, to HDFS, or to both?

    • #5451
      DataFlair Team
      Spectator

      The output of the mappers is written to local disk rather than to HDFS, for the following reasons:

      A job involves two levels of processing (Map and Reduce) to reach the final result.
      The results generated by the mappers are only intermediate/temporary results, which in turn become the input to the reducers, so writing them to HDFS, where blocks are replicated across nodes, would be costly and inefficient.

      The final result (the output of the reducers) is stored in HDFS.
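
To make the flow concrete, here is a toy word count in plain Python. It is only a sketch and does not use any Hadoop API: the temporary directory is a hypothetical stand-in for a node's local scratch space, the returned dictionary stands in for the final output that would go to HDFS, and the function name `run_toy_job` and the partitioning rule are assumptions made for illustration.

```python
import os
import shutil
import tempfile
from collections import defaultdict


def run_toy_job(lines, num_reducers=2):
    """Toy word count: map output goes to a scratch dir, only reducer results are kept."""
    # Hypothetical stand-in for a node's local scratch directory.
    local_dir = tempfile.mkdtemp(prefix="map_local_")
    try:
        # Map phase: append (word, 1) records to one local file per reducer.
        for line in lines:
            for word in line.split():
                # Deterministic toy partitioner (not Hadoop's real one).
                partition = sum(word.encode()) % num_reducers
                path = os.path.join(local_dir, f"part-{partition}.tmp")
                with open(path, "a") as f:
                    f.write(f"{word}\t1\n")

        # Reduce phase: each reducer reads its own partition and sums the counts.
        final_output = {}
        for partition in range(num_reducers):
            path = os.path.join(local_dir, f"part-{partition}.tmp")
            if not os.path.exists(path):
                continue
            counts = defaultdict(int)
            with open(path) as f:
                for record in f:
                    word, one = record.rstrip("\n").split("\t")
                    counts[word] += int(one)
            final_output.update(counts)
        return final_output, local_dir
    finally:
        # Intermediate data is temporary: discard it once the job has finished.
        shutil.rmtree(local_dir, ignore_errors=True)
```

Calling `run_toy_job(["a b a", "b c"])` returns the word counts together with the path of the scratch directory, which no longer exists by the time the function returns. Only the reducer results survive the job.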

      Follow the link to learn more about Mappers in Hadoop

    • #5453
      DataFlair Team
      Spectator

      The output from the Mappers is spilled to the local disk.

      Input/output (I/O) is the most expensive operation in any MapReduce program, and anything that reduces the data flow over the network gives better throughput.

      Because a mapper produces temporary/intermediate output that is meaningful only to the reducers, not to the end user, storing it in HDFS would be costly and inefficient.
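
A rough back-of-the-envelope version of the cost argument above, as a small Python sketch. It assumes the HDFS default replication factor of 3 and that the first replica is written on the node doing the write, so only the remaining replicas cross the network. The function name `write_cost` is hypothetical, and the shuffle traffic to the reducers is ignored because it is paid in either case.

```python
def write_cost(intermediate_gb, replication=3):
    """Compare GB written to disk and GB sent over the network for map output."""
    # Local disk: one copy, nothing leaves the node at write time.
    local = {"disk_gb": intermediate_gb, "network_gb": 0}
    hdfs = {
        # Every replica is a full copy on some disk.
        "disk_gb": intermediate_gb * replication,
        # Assumption: first replica is local, the others travel over the network.
        "network_gb": intermediate_gb * (replication - 1),
    }
    return local, hdfs
```

Under these assumptions, 10 GB of map output costs 10 GB of disk writes and no network traffic on local disk, against 30 GB of disk writes and 20 GB of network traffic if it were stored in HDFS.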
