Ideal HDFS block size to get optimum performance in MapReduce?

This topic contains 1 reply, has 1 voice, and was last updated by dfbdteam3 1 year, 1 month ago.

  • #6327

    dfbdteam3
    Moderator

    What is the ideal HDFS block size to get the best performance in MapReduce?

    #6329

    dfbdteam3
    Moderator

    The ideal HDFS block size is one that is neither too large (say, 1 GB or so) nor too small (say, 10-20 KB); the size of the input data is really the deciding factor. HDFS is designed to deal with Big Data (terabytes or petabytes), so if we keep the block size small, the number of blocks becomes very large, and managing that many blocks and their metadata on the NameNode creates significant overhead and congestion, which is certainly not desirable.

    On the other hand, if we make the block size too large, we may lose much of the benefit of a distributed file system, where all the blocks are processed in parallel. Fewer, larger blocks mean fewer mappers, so processing slows down and the system may have to wait a very long time for a single mapper to finish processing its data.

    For example, suppose we need to process 1 petabyte of data. In this case a 64 MB block size may not be ideal, since roughly 15 million blocks would be created, which is difficult to manage; a better choice here may be 128 MB or even 256 MB.
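
    For illustration only, here is a quick back-of-the-envelope calculation in plain Java (no Hadoop API needed) that reproduces the block counts above; the 1 PB input size and the candidate block sizes are simply the numbers from this example:

    ```java
    public class BlockCountEstimate {
        public static void main(String[] args) {
            long inputBytes = 1_000_000_000_000_000L;                // 1 PB of input data (decimal)
            long[] blockSizes = {64L << 20, 128L << 20, 256L << 20}; // 64 MB, 128 MB, 256 MB

            for (long blockSize : blockSizes) {
                // Each HDFS block is normally processed by one map task, so the
                // block count is also a rough upper bound on the number of mappers.
                long blocks = (inputBytes + blockSize - 1) / blockSize; // ceiling division
                System.out.printf("block size %4d MB -> ~%,d blocks%n", blockSize >> 20, blocks);
            }
        }
    }
    ```

    At 64 MB that comes to roughly 15 million blocks whose metadata the NameNode must keep in memory; moving to 128 MB or 256 MB cuts the block count to about a half or a quarter while still leaving millions of blocks to process in parallel.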

    Follow the link to learn more about: Data Blocks in Hadoop
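
    If you want to experiment with different sizes, the block size can also be overridden per file when it is written. Below is a minimal sketch, assuming a running HDFS cluster reachable through the default configuration on the classpath; the output path, the replication factor, and the 256 MB value are placeholders chosen for illustration:

    ```java
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteWithCustomBlockSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();  // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            long blockSize = 256L << 20;                           // 256 MB, for this file only
            Path out = new Path("/data/example/large-input.dat");  // placeholder path

            // create(path, overwrite, bufferSize, replication, blockSize)
            try (FSDataOutputStream stream = fs.create(out, true, 4096, (short) 3, blockSize)) {
                stream.writeBytes("sample payload\n");
            }
            fs.close();
        }
    }
    ```

    The cluster-wide default is controlled by the dfs.blocksize property in hdfs-site.xml (128 MB by default in Hadoop 2.x and later), so a per-file override like this is only needed for unusually large or small inputs.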
