Why is the block size large (128 MB) in Hadoop HDFS?


Viewing 2 reply threads
  • Author
    Posts
    • #5478
      DataFlair Team
      Spectator

      Why does Hadoop use a 128 MB block size by default, when the default block size of an OS file system is only 4 KB? What are the pros and cons?

    • #5485
      DataFlair Team
      Spectator

      Hadoop is primarily designed for storing large files.
      If we store 128 MB of data with a 4 KB block size, 32,768 blocks are created. The NameNode keeps the metadata of every block as a separate in-memory object, so such a tiny block size would consume a lot of NameNode memory and put unnecessary load on it, which we don't want.
      Also, in that scenario each map task would process very little input, so there would be many more map tasks and the job would take longer to finish. A rough comparison is sketched below.
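
      A minimal back-of-the-envelope sketch, assuming the often-quoted figure of roughly 150 bytes of NameNode heap per block object (an approximation, not an exact number), compares a 1 TB file stored with 4 KB blocks versus 128 MB blocks:

      public class BlockSizeComparison {
          // Rough, often-quoted estimate of NameNode heap consumed per block object.
          private static final long BYTES_PER_BLOCK_OBJECT = 150;

          private static void report(String label, long fileSizeBytes, long blockSizeBytes) {
              // Blocks needed to store the file (round up for a partial last block).
              long blocks = (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
              long namenodeHeapBytes = blocks * BYTES_PER_BLOCK_OBJECT;
              System.out.printf("%-14s blocks: %,12d   est. NameNode heap: %,d bytes%n",
                      label, blocks, namenodeHeapBytes);
          }

          public static void main(String[] args) {
              long oneTerabyte = 1024L * 1024 * 1024 * 1024;
              report("4 KB blocks", oneTerabyte, 4L * 1024);            // OS-style block size
              report("128 MB blocks", oneTerabyte, 128L * 1024 * 1024); // HDFS default
          }
      }

      For a single 1 TB file this works out to about 268 million block objects (tens of GB of NameNode heap) with 4 KB blocks, versus only 8,192 block objects with 128 MB blocks.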

    • #5486
      DataFlair Team
      Spectator

      When creating blocks, Hadoop follows the rule that "a small number of large files is better than a large number of small files". HDFS holds huge data sets, i.e. petabytes of data, so if it used a 4 KB block size like a Linux file system, it would end up with far too many data blocks. Likewise, creating many small files increases the metadata size, which decreases performance.
      On the other hand, the block size can't be so large that the system spends a very long time waiting for one last unit of data processing to finish its work. The block size is also configurable, both cluster-wide and per file, as sketched below.
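
      As a small, hedged illustration (the dfs.blocksize property and the FileSystem.create overload are part of the standard Hadoop 2.x+ API; the path, replication factor, and 256 MB value are made-up examples), the block size can be raised for a client or even for a single file at write time:

      import java.nio.charset.StandardCharsets;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class CustomBlockSizeWrite {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // dfs.blocksize (128 MB by default in Hadoop 2.x+) can be overridden
              // cluster-wide in hdfs-site.xml or, as here, per client.
              conf.setLong("dfs.blocksize", 256L * 1024 * 1024); // 256 MB

              FileSystem fs = FileSystem.get(conf);
              Path out = new Path("/tmp/example.txt"); // hypothetical path for illustration

              // An explicit block size can also be passed for just this file via the
              // create(path, overwrite, bufferSize, replication, blockSize) overload.
              try (FSDataOutputStream stream =
                           fs.create(out, true, 4096, (short) 3, 256L * 1024 * 1024)) {
                  stream.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
              }
          }
      }

      Whether a larger value such as 256 MB actually helps depends on the file sizes and the cluster: fewer blocks mean less NameNode metadata, but also fewer map tasks and therefore less parallelism.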
