Why is the block size large (128 MB) in Hadoop HDFS?


Viewing 2 reply threads
  • Author
    Posts
    • #5478
      DataFlair Team
      Spectator

      Why does Hadoop use a 128 MB block size by default, when the default block size of an OS file system is only 4 KB? What are the pros and cons?

    • #5485
      DataFlair Team
      Spectator

      Hadoop is primarily designed for storing large files.
      If we store 128 MB of data with a 4 KB block size, 32,768 blocks are created. The NameNode keeps the metadata of every block as a separate in-memory object, so such a tiny block size would consume a lot of NameNode memory and put unnecessary load on it, which we don't want.
      Also, in that scenario each map task would process very little input, so there would be many more map tasks and the job would take longer to finish. A rough comparison is sketched below.
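
      A minimal back-of-the-envelope sketch, assuming the often-quoted figure of roughly 150 bytes of NameNode heap per block object (an approximation, not an exact number), compares a 1 TB file stored with 4 KB blocks versus 128 MB blocks:

      public class BlockSizeComparison {
          // Rough, often-quoted estimate of NameNode heap consumed per block object.
          private static final long BYTES_PER_BLOCK_OBJECT = 150;

          private static void report(String label, long fileSizeBytes, long blockSizeBytes) {
              // Blocks needed to store the file (round up for a partial last block).
              long blocks = (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
              long namenodeHeapBytes = blocks * BYTES_PER_BLOCK_OBJECT;
              System.out.printf("%-14s blocks: %,12d   est. NameNode heap: %,d bytes%n",
                      label, blocks, namenodeHeapBytes);
          }

          public static void main(String[] args) {
              long oneTerabyte = 1024L * 1024 * 1024 * 1024;
              report("4 KB blocks", oneTerabyte, 4L * 1024);            // OS-style block size
              report("128 MB blocks", oneTerabyte, 128L * 1024 * 1024); // HDFS default
          }
      }

      For a single 1 TB file this works out to about 268 million block objects (tens of GB of NameNode heap) with 4 KB blocks, versus only 8,192 block objects with 128 MB blocks.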

    • #5486
      DataFlair Team
      Spectator

      When creating blocks, Hadoop follows the rule that "a small number of large files is better than a large number of small files". HDFS holds huge data sets, i.e. petabytes of data, so if it used a 4 KB block size like a Linux file system, it would end up with far too many data blocks. Likewise, creating many small files increases the metadata size, which decreases performance.
      On the other hand, the block size can't be so large that the system spends a very long time waiting for one last unit of data processing to finish its work. The block size is also configurable, both cluster-wide and per file, as sketched below.
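
      As a small, hedged illustration (the dfs.blocksize property and the FileSystem.create overload are part of the standard Hadoop 2.x+ API; the path, replication factor, and 256 MB value are made-up examples), the block size can be raised for a client or even for a single file at write time:

      import java.nio.charset.StandardCharsets;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class CustomBlockSizeWrite {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // dfs.blocksize (128 MB by default in Hadoop 2.x+) can be overridden
              // cluster-wide in hdfs-site.xml or, as here, per client.
              conf.setLong("dfs.blocksize", 256L * 1024 * 1024); // 256 MB

              FileSystem fs = FileSystem.get(conf);
              Path out = new Path("/tmp/example.txt"); // hypothetical path for illustration

              // An explicit block size can also be passed for just this file via the
              // create(path, overwrite, bufferSize, replication, blockSize) overload.
              try (FSDataOutputStream stream =
                           fs.create(out, true, 4096, (short) 3, 256L * 1024 * 1024)) {
                  stream.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
              }
          }
      }

      Whether a larger value such as 256 MB actually helps depends on the file sizes and the cluster: fewer blocks mean less NameNode metadata, but also fewer map tasks and therefore less parallelism.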
