Data node block size in HDFS: why 64 MB?

    • #5758
      DataFlair Team
      Spectator

      The default data block size in HDFS/Hadoop is 64 MB, whereas a typical disk file system block is only 4 KB. Why is the block size so large in HDFS/Hadoop?

    • #5759
      DataFlair Team
      Spectator

      A block is the smallest unit of data that the file system stores, and HDFS stores each file as a set of blocks. In Hadoop 1.x the default block size was 64 MB, and it can be configured as per the requirement. Files were split into 64 MB blocks, which HDFS then distributed across multiple nodes of the cluster.
      All the blocks of a file are of the same size except the last block, which can be the same size or smaller.
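
      For illustration, here is a minimal Java sketch of writing a single file with an explicit block size through the HDFS FileSystem API; the path, replication factor, buffer size and block size are example values chosen for this sketch, not anything prescribed by HDFS.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class BlockSizeExample {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();       // picks up core-site.xml / hdfs-site.xml
              FileSystem fs = FileSystem.get(conf);

              long blockSize = 128L * 1024 * 1024;            // 128 MB, overrides the cluster default for this file
              short replication = 3;                          // example replication factor
              int bufferSize = 4096;                          // example I/O buffer size in bytes

              Path path = new Path("/user/example/data.txt"); // hypothetical path
              // create(path, overwrite, bufferSize, replication, blockSize)
              try (FSDataOutputStream out =
                       fs.create(path, true, bufferSize, replication, blockSize)) {
                  out.writeUTF("hello HDFS");
              }
          }
      }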

      Why is the block size 64 MB?
      HDFS stores huge data sets, i.e. terabytes and petabytes of data. If HDFS used a 4 KB block size like the Linux file system, a single large file would produce an enormous number of blocks and therefore an enormous amount of metadata. Managing that many blocks and their metadata would create huge overhead on the NameNode, which is something we don't want.
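
      As a rough, illustrative calculation (the 1 TB file size and the ~150 bytes of NameNode heap per block are assumed ballpark figures, not HDFS constants):

      public class BlockCountSketch {
          public static void main(String[] args) {
              long fileSize = 1L << 40;                  // 1 TB (binary), example file size
              long smallBlock = 4L * 1024;               // 4 KB block
              long largeBlock = 64L * 1024 * 1024;       // 64 MB block

              long blocksSmall = fileSize / smallBlock;  // 268,435,456 blocks
              long blocksLarge = fileSize / largeBlock;  // 16,384 blocks

              // ~150 bytes of NameNode heap per block is a commonly quoted rough estimate
              System.out.printf("4 KB blocks : %,d (~%,d MB of NameNode metadata)%n",
                      blocksSmall, blocksSmall * 150 / (1024 * 1024));
              System.out.printf("64 MB blocks: %,d (~%,d MB of NameNode metadata)%n",
                      blocksLarge, blocksLarge * 150 / (1024 * 1024));
          }
      }

      With 4 KB blocks a single 1 TB file would need hundreds of millions of block entries in the NameNode's memory; with 64 MB blocks it needs only a few thousand.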

      Hadoop 2.x has a default block size of 128 MB.

      Reasons for the increase in block size from 64 MB to 128 MB are as follows:

      To improve NameNode performance.
      To improve the performance of MapReduce jobs, since the number of mappers depends directly on the block size.
      If we are managing about 1 petabyte of data with a 64 MB block size, well over 15 million blocks are created, which is difficult for the NameNode to manage; increasing the block size to 128 MB halves that number (see the sketch below).
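
      A back-of-the-envelope version of that calculation, using binary units (1 PB taken as 2^50 bytes); with the default FileInputFormat, each block typically becomes one input split and hence one map task, so the same division also approximates the number of mappers:

      public class PetabyteBlocks {
          public static void main(String[] args) {
              long data = 1L << 50;                           // ~1 PB of data (binary units)
              long blk64 = 64L * 1024 * 1024;
              long blk128 = 128L * 1024 * 1024;

              System.out.println("blocks @ 64 MB : " + data / blk64);   // 16,777,216
              System.out.println("blocks @ 128 MB: " + data / blk128);  //  8,388,608
              // Doubling the block size roughly halves both the NameNode's block
              // count and the number of map tasks needed to scan the data.
          }
      }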

    • #5761
      DataFlair Team
      Spectator

      The block size in HDFS can be configured, but the default is 64 MB, and 128 MB in Hadoop version 2.
      If the block size were 4 KB, as in Unix file systems, there would be far more blocks and far too many mappers to process them, which would degrade performance.
      Importantly, Hadoop is built to handle Big Data, so the block size should be large. A larger block size also reduces the load on the NameNode, which holds the metadata: the more blocks, the more overhead on the NameNode.
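
      For a cluster-wide default (rather than the per-file override shown earlier), the block size is typically set through the dfs.blocksize property in hdfs-site.xml; the 128 MB value below is only an example:

      <!-- hdfs-site.xml (example value, not a recommendation) -->
      <configuration>
        <property>
          <name>dfs.blocksize</name>
          <value>134217728</value> <!-- 128 MB in bytes; Hadoop 2.x also accepts suffixes such as 128m -->
        </property>
      </configuration>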

