This topic contains 2 replies, has 1 voice, and was last updated by  dfbdteam3 1 year, 6 months ago.

Viewing 3 posts - 1 through 3 (of 3 total)
  • Author
    Posts
  • #5449

    dfbdteam3
    Moderator

    In Hadoop MapReduce job where is the output of Mapper written and why?
    Whether it is written on Local disk or HDFS or Both?

    #5451

    dfbdteam3
    Moderator

    The output of the mappers is written to the local disk rather than to HDFS, for the following reasons:

    There are two levels of processing (Map and Reduce) involved in producing the final desired outcome.
    The result generated by the mappers is only an intermediate/temporary result, which in turn becomes the input to the reducers; writing it to HDFS (where it would be replicated across the cluster) would be a costly and inefficient process.

    Only the final result (the output of the reducers) is stored on HDFS blocks.
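    The two-level flow above can be sketched in plain Python (no Hadoop involved). Here the local temporary file stands in for the mapper's local-disk spill, and only the reducer's aggregated result plays the role of the final output that real Hadoop would commit to HDFS; all names and file handling are illustrative, not Hadoop's actual internals.

    ```python
    import os
    import tempfile
    from collections import defaultdict

    def run_wordcount(lines):
        # Map phase: emit intermediate (word, 1) pairs and spill them to a
        # local temporary file -- intermediate data, never the final store.
        spill = tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".spill")
        for line in lines:
            for word in line.split():
                spill.write(f"{word}\t1\n")
        spill.close()

        # Shuffle + reduce phase: read the local spill back, group by key,
        # and sum the counts. Only this result would go to HDFS in Hadoop.
        counts = defaultdict(int)
        with open(spill.name) as f:
            for record in f:
                word, one = record.rstrip("\n").split("\t")
                counts[word] += int(one)
        os.unlink(spill.name)  # intermediate data is discarded after the job
        return dict(counts)
    ```

    Note that the spill file is deleted once the reduce phase has consumed it, which mirrors why replicating such short-lived data into HDFS would be wasted work.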

    Follow the link to learn more about Mappers in Hadoop

    #5453

    dfbdteam3
    Moderator

    The output from the Mappers is spilled to the local disk.

    Input/output (I/O) is the most expensive operation in any MapReduce program, and anything that reduces the data flow over the network gives better throughput.

    Since the mapper produces only temporary/intermediate output that is meaningful to the reducer, not to the end user, storing this temporary data in HDFS would be costly and inefficient.
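    For reference, the local (non-HDFS) directories where Hadoop writes this intermediate map output are controlled by a configuration property. A sketch of overriding it in mapred-site.xml, assuming Hadoop 2.x/3.x property names and with purely illustrative paths:

    ```xml
    <!-- mapred-site.xml: comma-separated local directories for intermediate
         MapReduce data such as spilled map output. Paths are examples only. -->
    <property>
      <name>mapreduce.cluster.local.dir</name>
      <value>/data1/mapred/local,/data2/mapred/local</value>
    </property>
    ```

    Spreading these directories across several physical disks can reduce spill-related I/O contention during large jobs.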

