Where is the output of Mapper written?

This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.


Author

Posts
- September 20, 2018 at 3:12 pm #5449
  
  DataFlair Team
  Spectator
  
  In a Hadoop MapReduce job, where is the output of the Mapper written, and why?
  Is it written to local disk, to HDFS, or to both?
- September 20, 2018 at 3:13 pm #5451
  
  DataFlair Team
  Spectator
  
  The output of the mappers is written to local disk rather than to HDFS, for the following reasons:
  
  There are two levels of processing (Map and Reduce) involved in producing the final result.
  The output generated by the mappers is only an intermediate, temporary result that in turn becomes the input to the reducers, so writing it to HDFS, where every block is replicated across nodes, would be costly and inefficient.
  
  The final result (the output of the reducers) is stored in HDFS.
  
  Follow the link to learn more about Mappers in Hadoop
- September 20, 2018 at 3:13 pm #5453
  
  DataFlair Team
  Spectator
  
  The output from the Mappers is spilled to the local disk.
  
  Input/output (I/O) is the most expensive operation in any MapReduce program, and anything that reduces the data flow over the network gives better throughput.
  
  Because the mapper produces temporary, intermediate output that is meaningful only to the reducer and not to the end user, storing it in HDFS would be costly and inefficient.
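
To put rough numbers on the "costly and inefficient" point made in the replies above, here is a minimal back-of-the-envelope sketch in Python. It is not Hadoop code: the names `WriteCost`, `local_disk_cost` and `hdfs_cost` are made up for illustration, and it assumes the common HDFS setup of a replication factor of 3 with the first replica placed on the writer's own node. It only models the cost of storing the map output; the shuffle to the reducers moves data over the network in either case.

```python
from dataclasses import dataclass


@dataclass
class WriteCost:
    disk_bytes: int     # total bytes written to disk across the cluster
    network_bytes: int  # bytes sent between nodes just to store the data


def local_disk_cost(intermediate_bytes: int) -> WriteCost:
    # Map output stays on the node that produced it:
    # one copy on disk, no network transfer needed to store it.
    return WriteCost(disk_bytes=intermediate_bytes, network_bytes=0)


def hdfs_cost(intermediate_bytes: int, replication: int = 3) -> WriteCost:
    # Assumption: the first replica lands on the writer's own node and
    # every additional replica is shipped over the network to another node.
    return WriteCost(
        disk_bytes=intermediate_bytes * replication,
        network_bytes=intermediate_bytes * (replication - 1),
    )


GIB = 1024 ** 3
local = local_disk_cost(10 * GIB)
hdfs = hdfs_cost(10 * GIB)

print(f"local disk: {local.disk_bytes // GIB} GiB disk, {local.network_bytes // GIB} GiB network")
print(f"HDFS (x3): {hdfs.disk_bytes // GIB} GiB disk, {hdfs.network_bytes // GIB} GiB network")
```

Under these assumptions, 10 GiB of map output costs 10 GiB of disk writes and no network traffic when kept on local disk, versus 30 GiB of disk writes and 20 GiB of network traffic when stored in HDFS, all for data that is discarded once the job finishes.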