Forums › Apache Hadoop › Data node block size in HDFS, why 64MB?
This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 4:08 pm #5758, DataFlair Team (Spectator)
The default data block size of HDFS/Hadoop is 64 MB, while the block size of a typical disk file system is 4 KB. Why is the block size so large in HDFS/Hadoop?
September 20, 2018 at 4:08 pm #5759, DataFlair Team (Spectator)
A block is the smallest unit of data that the file system stores. HDFS stores each file as a sequence of blocks. In Hadoop 1.x the default block size was 64 MB, and it can be configured as required. Files were split into 64 MB blocks and then stored in the Hadoop file system, and HDFS distributed the data blocks across multiple nodes.
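As a sketch of how the block size can be configured, the snippet below sets it in hdfs-site.xml. The property name dfs.blocksize is the Hadoop 2.x form; Hadoop 1.x used the (now deprecated) name dfs.block.size:

```xml
<!-- hdfs-site.xml: set the default block size to 128 MB (value in bytes). -->
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
</property>
```

This sets the cluster-wide default; the block size can also be overridden per file when it is written.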
All blocks of a file are the same size except the last block, which can be the same size or smaller.

Why is the block size 64 MB?
HDFS holds huge data sets, i.e. terabytes and petabytes of data. If HDFS used a 4 KB block size like the Linux file system, we would have far too many blocks and therefore far too much metadata. Managing this huge number of blocks and their metadata would create enormous overhead, which is something we don't want. Hadoop 2.x has a default block size of 128 MB.
The reasons for the increase in block size from 64 MB to 128 MB are as follows:
To improve NameNode performance.
To improve the performance of MapReduce jobs, since the number of mappers depends directly on the block size.
If we manage a cluster of 1 petabyte with a 64 MB block size, more than 15 million blocks are created, which is difficult for the NameNode to manage. So the block size was increased from 64 MB to 128 MB.
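The "15+ million blocks" figure can be checked with simple arithmetic, assuming binary units (1 PB = 1024^5 bytes) and fully used blocks:

```python
# Number of HDFS blocks needed to store 1 PB of data,
# assuming binary units and that every block is full.
PB = 1024 ** 5
MB = 1024 ** 2

blocks_64mb = PB // (64 * MB)    # blocks at a 64 MB block size
blocks_128mb = PB // (128 * MB)  # blocks at a 128 MB block size

print(blocks_64mb)   # 16777216 (~16.7 million)
print(blocks_128mb)  # 8388608 (~8.4 million)
```

Doubling the block size halves the number of blocks the NameNode must track, which is exactly the motivation given above.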
September 20, 2018 at 4:08 pm #5761, DataFlair Team (Spectator)
The block size in HDFS is configurable, but the default is 64 MB, and 128 MB in Hadoop version 2.
If the block size were 4 KB, as in a Unix system, files would be split into a far larger number of blocks, and too many mappers would be needed to process them, which would degrade performance.
Importantly, Hadoop is built for handling Big Data, so the block size should be large. A large block size also reduces the load on the NameNode, which holds the metadata: the more blocks there are, the more overhead on the NameNode.
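The NameNode-overhead argument can be made concrete with a back-of-the-envelope estimate. The ~150 bytes of NameNode heap per block used below is a commonly cited rule of thumb, not an exact figure:

```python
# Rough NameNode heap needed just to track block metadata for 1 PB of data.
# Assumes ~150 bytes of heap per block object (a rule of thumb, not exact).
BYTES_PER_BLOCK = 150
PB = 1024 ** 5

def namenode_heap_gb(total_bytes, block_size):
    blocks = total_bytes // block_size   # number of blocks to track
    return blocks * BYTES_PER_BLOCK / 1024 ** 3

print(namenode_heap_gb(PB, 4 * 1024))         # 4 KB blocks: 38400.0 GB
print(namenode_heap_gb(PB, 128 * 1024 ** 2))  # 128 MB blocks: 1.171875 GB
```

With 4 KB blocks the block metadata alone would dwarf any realistic NameNode heap, while with 128 MB blocks it fits comfortably in memory.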