Ideal HDFS block size to get optimum performance in MapReduce?

  • Author
    Posts
    • #6327
      DataFlair Team
      Spectator

      What is the ideal HDFS block size to get the best performance in MapReduce?

    • #6329
      DataFlair Team
      Spectator

      The ideal HDFS block size is one that is neither too large (say, 1 GB or so) nor too small (say, 10-20 KB); the nature of the input data is the deciding factor. HDFS is designed to deal with Big Data (terabytes or petabytes), so if we keep the block size small, the number of blocks becomes huge, and managing that many blocks and their metadata creates a large overhead and congestion on the NameNode, which is certainly not desirable.

      On the other hand, by keeping a very large block size we may lose the benefit of a distributed file system, where the blocks are processed in parallel. Fewer, larger blocks mean fewer Mappers running at once, so the job slows down and the system may have to wait a very long time for a single Mapper to finish processing its data.

      For example, let’s say we need to process 1 petabyte of data. A 64 MB block size may not be ideal here, since roughly 15 million blocks would be created, which is difficult to manage; a block size of 128 MB or even 256 MB is likely the better choice in this case.
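
      As a rough sanity check on those numbers, here is a back-of-the-envelope sketch in plain Java (no Hadoop APIs needed). The figure of roughly 150 bytes of NameNode heap per block object is only a commonly quoted estimate, used here for illustration:

      // Estimate how many blocks 1 PB of input produces at different block
      // sizes, and roughly how much NameNode heap those block objects cost.
      public class BlockCountEstimate {
          public static void main(String[] args) {
              long dataSize = 1_000_000_000_000_000L;                   // 1 PB (decimal)
              long[] blockSizes = {64L << 20, 128L << 20, 256L << 20};  // 64, 128, 256 MB
              long bytesPerBlockObject = 150;                           // rough per-block metadata cost

              for (long blockSize : blockSizes) {
                  long blocks = (dataSize + blockSize - 1) / blockSize; // ceiling division
                  long metadataMB = (blocks * bytesPerBlockObject) >> 20;
                  System.out.printf("block size %4d MB -> ~%,d blocks, ~%,d MB of NameNode heap%n",
                          blockSize >> 20, blocks, metadataMB);
              }
          }
      }

      At 64 MB this prints roughly 15 million blocks and around two gigabytes of NameNode heap for block metadata alone, while 256 MB cuts both by a factor of four.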

      Follow the link to learn more about: Data Blocks in Hadoop
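
      To actually apply a chosen size, the block size can be set cluster-wide through the dfs.blocksize property (the Hadoop 2.x+ name; older releases used dfs.block.size), or overridden per file when it is written. Below is a minimal sketch, assuming a Hadoop 2.x+ client on the classpath; the path used is just a placeholder:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class BlockSizeExample {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Default block size for files written through this client: 256 MB.
              conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

              FileSystem fs = FileSystem.get(conf);
              Path path = new Path("/user/dataflair/input.txt");   // placeholder path

              // Or override the block size for just this file:
              // create(path, overwrite, bufferSize, replication, blockSize)
              FSDataOutputStream out = fs.create(path, true, 4096, (short) 3, 256L * 1024 * 1024);
              out.writeUTF("sample record");
              out.close();
              fs.close();
          }
      }

      Since MapReduce input splits normally follow the block size, a larger block size also means fewer, longer-running map tasks.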
