Ideal HDFS block size to get optimum performance in MapReduce?

  • Author
    Posts
    • #6327
      DataFlair Team
      Spectator

      What is the ideal HDFS block size to get the best performance in MapReduce?

    • #6329
      DataFlair Team
      Spectator

      The ideal HDFS block size is one that is neither too large (say, 1 GB or so) nor too small (say, 10-20 KB); the nature of the input data is the deciding factor. HDFS is designed to deal with Big Data (terabytes or petabytes), so if we keep the block size small, the number of blocks becomes huge, and managing that many blocks and their metadata creates a large overhead and congestion on the NameNode, which is certainly not desirable.

      On the other hand, by keeping a very large block size we may lose the benefit of a distributed file system, where the blocks are processed in parallel. Fewer, larger blocks mean fewer Mappers running at once, so the job slows down and the system may have to wait a very long time for a single Mapper to finish processing its data.

      For example, let’s say we need to process 1 petabyte of data. A 64 MB block size may not be ideal here, since roughly 15 million blocks would be created, which is difficult to manage; a block size of 128 MB or even 256 MB is likely the better choice in this case.
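
      As a rough sanity check on those numbers, here is a back-of-the-envelope sketch in plain Java (no Hadoop APIs needed). The figure of roughly 150 bytes of NameNode heap per block object is only a commonly quoted estimate, used here for illustration:

      // Estimate how many blocks 1 PB of input produces at different block
      // sizes, and roughly how much NameNode heap those block objects cost.
      public class BlockCountEstimate {
          public static void main(String[] args) {
              long dataSize = 1_000_000_000_000_000L;                   // 1 PB (decimal)
              long[] blockSizes = {64L << 20, 128L << 20, 256L << 20};  // 64, 128, 256 MB
              long bytesPerBlockObject = 150;                           // rough per-block metadata cost

              for (long blockSize : blockSizes) {
                  long blocks = (dataSize + blockSize - 1) / blockSize; // ceiling division
                  long metadataMB = (blocks * bytesPerBlockObject) >> 20;
                  System.out.printf("block size %4d MB -> ~%,d blocks, ~%,d MB of NameNode heap%n",
                          blockSize >> 20, blocks, metadataMB);
              }
          }
      }

      At 64 MB this prints roughly 15 million blocks and around two gigabytes of NameNode heap for block metadata alone, while 256 MB cuts both by a factor of four.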

      Follow the link to learn more about: Data Blocks in Hadoop
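
      To actually apply a chosen size, the block size can be set cluster-wide through the dfs.blocksize property (the Hadoop 2.x+ name; older releases used dfs.block.size), or overridden per file when it is written. Below is a minimal sketch, assuming a Hadoop 2.x+ client on the classpath; the path used is just a placeholder:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class BlockSizeExample {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Default block size for files written through this client: 256 MB.
              conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

              FileSystem fs = FileSystem.get(conf);
              Path path = new Path("/user/dataflair/input.txt");   // placeholder path

              // Or override the block size for just this file:
              // create(path, overwrite, bufferSize, replication, blockSize)
              FSDataOutputStream out = fs.create(path, true, 4096, (short) 3, 256L * 1024 * 1024);
              out.writeUTF("sample record");
              out.close();
              fs.close();
          }
      }

      Since MapReduce input splits normally follow the block size, a larger block size also means fewer, longer-running map tasks.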
