Why is block size set to 128 MB in HDFS?
September 20, 2018 at 1:55 pm · Post #5017 · DataFlair Team (Spectator)
Why is 128 MB chosen as the default block size in Hadoop?
What is the data block size in HDFS?
Why is the HDFS block size 128 MB in Hadoop?
September 20, 2018 at 1:56 pm · Post #5020 · DataFlair Team (Spectator)
In Hadoop, input data files are divided into blocks of a particular size (128 MB by default), and these blocks are then stored on different DataNodes.
Hadoop is designed to process large volumes of data. Each block's metadata (its address, i.e. which DataNode it is stored on) is kept on the NameNode. If the block size is too small, a very large number of blocks has to be stored on the DataNodes, and a correspondingly large amount of metadata has to be held on the NameNode. Also, each block of data is processed by a Mapper task, so a large number of small blocks means a large number of Mapper tasks. A very small block size is therefore not efficient; the sketch below makes the cost concrete.
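To get a feel for the numbers, here is a rough back-of-the-envelope sketch in Java (plain arithmetic, not the Hadoop API). The figure of roughly 150 bytes of NameNode memory per block is a commonly cited rule of thumb and is assumed here only for illustration; the exact value varies by Hadoop version.

```java
// Back-of-the-envelope comparison: how block size drives block count and
// NameNode metadata for a single 1 TB input file.
public class BlockCountSketch {
    public static void main(String[] args) {
        long fileSize  = 1024L * 1024 * 1024 * 1024;  // 1 TB input file
        long smallBlk  = 4L * 1024;                   // 4 KB, like a local Linux FS
        long hdfsBlk   = 128L * 1024 * 1024;          // 128 MB, the HDFS default
        long metaBytes = 150;                         // assumed NameNode memory per block

        long smallCount = fileSize / smallBlk;        // 268,435,456 blocks
        long hdfsCount  = fileSize / hdfsBlk;         // 8,192 blocks

        System.out.printf("4 KB blocks  : %,d blocks, ~%,d MB of NameNode metadata%n",
                smallCount, smallCount * metaBytes / (1024 * 1024));
        System.out.printf("128 MB blocks: %,d blocks, ~%,d KB of NameNode metadata%n",
                hdfsCount, hdfsCount * metaBytes / 1024);
    }
}
```

With 4 KB blocks a single 1 TB file would need hundreds of millions of blocks (and as many Mapper tasks); with 128 MB blocks it needs only a few thousand.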
At the same time, the block size should not be so large that parallelism cannot be achieved; the system should not sit waiting a very long time for a single unit of data processing to finish its work.
A balance has to be struck between these two extremes, which is why the default block size is 128 MB. It can also be changed depending on the size of the input files, as the sketch below shows.
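If a different block size suits the input better, it can be set cluster-wide via dfs.blocksize in hdfs-site.xml or per file at creation time. A minimal sketch using the Hadoop Java client, assuming a reachable HDFS with default client configuration and a hypothetical output path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeConfigSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Client-side default block size: 256 MB instead of 128 MB.
        // The same key (dfs.blocksize) can be set in hdfs-site.xml.
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

        FileSystem fs = FileSystem.get(conf);

        // Per-file override at creation time:
        // create(path, overwrite, bufferSize, replication, blockSize)
        Path out = new Path("/user/dataflair/sample.txt");  // hypothetical path
        try (FSDataOutputStream stream =
                 fs.create(out, true, 4096, (short) 3, 512L * 1024 * 1024)) {
            stream.writeBytes("block size demo\n");
        }

        System.out.println("Default block size: " + fs.getDefaultBlockSize(out));
    }
}
```

Note that dfs.blocksize must remain a multiple of the checksum chunk size (dfs.bytes-per-checksum, 512 bytes by default), which round values such as 64 MB, 128 MB, or 256 MB all satisfy.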
Follow the link for more detail: HDFS Blocks
September 20, 2018 at 1:56 pm · Post #5021 · DataFlair Team (Spectator)
A block is the smallest unit of data in a file system. In HDFS the block size is configurable as per requirements, but the default is 128 MB.
Traditional file systems such as those on Linux have a default block size of 4 KB. Hadoop, however, is designed and developed to process a small number of very large files (terabytes or petabytes).
If TBs or PBs of data were divided into blocks of a size like 4 KB, the drawbacks would be:
1. A huge number of blocks to process
2. A huge amount of metadata to store on the NameNode
3. A huge amount of time needed just to process that metadata
4. An overall increase in total processing time
On the other hand, if a very large block size is configured, the time to read each block from disk grows, which again increases the overall processing time.
To balance these two factors, the block size in HDFS is set to 128 MB. With this block size, users get close to optimum performance for most workloads. The block size a given file actually uses can be checked through the Hadoop Java API, as sketched below.
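To confirm what block size an existing file got, the client API can be asked directly. A minimal sketch, assuming a reachable HDFS with default configuration and a hypothetical file path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockSizeSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical file path, used only for illustration.
        Path file = new Path("/user/dataflair/input/big-file.dat");
        FileStatus status = fs.getFileStatus(file);

        System.out.println("File length: " + status.getLen() + " bytes");
        System.out.println("Block size : " + status.getBlockSize() + " bytes");

        // One entry per block, with the DataNodes holding its replicas.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                    + " length=" + b.getLength()
                    + " hosts=" + String.join(",", b.getHosts()));
        }
    }
}
```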