Why is block size set to 128 MB in HDFS?
September 20, 2018 at 1:55 pm · Post #5017 · DataFlair Team (Spectator)
Why is 128 MB chosen as the default block size in Hadoop?
What is the data block size in HDFS?
Why is the HDFS block size 128 MB in Hadoop?
September 20, 2018 at 1:56 pm · Post #5020 · DataFlair Team (Spectator)
In Hadoop, input data files are divided into blocks of a particular size (128 MB by default), and these blocks are then stored on different DataNodes.
Hadoop is designed to process large volumes of data. Each block's metadata (its address, i.e. which DataNode it is stored on) is kept on the NameNode. If the block size is too small, a very large number of blocks has to be stored on the DataNodes, and a correspondingly large amount of metadata has to be held on the NameNode. Also, each block of data is processed by a Mapper task, so a large number of small blocks means a large number of Mapper tasks. A very small block size is therefore not efficient; the sketch below makes the cost concrete.
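To get a feel for the numbers, here is a rough back-of-the-envelope sketch in Java (plain arithmetic, not the Hadoop API). The figure of roughly 150 bytes of NameNode memory per block is a commonly cited rule of thumb and is assumed here only for illustration; the exact value varies by Hadoop version.

```java
// Back-of-the-envelope comparison: how block size drives block count and
// NameNode metadata for a single 1 TB input file.
public class BlockCountSketch {
    public static void main(String[] args) {
        long fileSize  = 1024L * 1024 * 1024 * 1024;  // 1 TB input file
        long smallBlk  = 4L * 1024;                   // 4 KB, like a local Linux FS
        long hdfsBlk   = 128L * 1024 * 1024;          // 128 MB, the HDFS default
        long metaBytes = 150;                         // assumed NameNode memory per block

        long smallCount = fileSize / smallBlk;        // 268,435,456 blocks
        long hdfsCount  = fileSize / hdfsBlk;         // 8,192 blocks

        System.out.printf("4 KB blocks  : %,d blocks, ~%,d MB of NameNode metadata%n",
                smallCount, smallCount * metaBytes / (1024 * 1024));
        System.out.printf("128 MB blocks: %,d blocks, ~%,d KB of NameNode metadata%n",
                hdfsCount, hdfsCount * metaBytes / 1024);
    }
}
```

With 4 KB blocks a single 1 TB file would need hundreds of millions of blocks (and as many Mapper tasks); with 128 MB blocks it needs only a few thousand.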
At the same time, the block size should not be so large that parallelism cannot be achieved; the system should not sit waiting a very long time for a single unit of data processing to finish its work.
A balance has to be struck between these two extremes, which is why the default block size is 128 MB. It can also be changed depending on the size of the input files, as the sketch below shows.
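If a different block size suits the input better, it can be set cluster-wide via dfs.blocksize in hdfs-site.xml or per file at creation time. A minimal sketch using the Hadoop Java client, assuming a reachable HDFS with default client configuration and a hypothetical output path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeConfigSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Client-side default block size: 256 MB instead of 128 MB.
        // The same key (dfs.blocksize) can be set in hdfs-site.xml.
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

        FileSystem fs = FileSystem.get(conf);

        // Per-file override at creation time:
        // create(path, overwrite, bufferSize, replication, blockSize)
        Path out = new Path("/user/dataflair/sample.txt");  // hypothetical path
        try (FSDataOutputStream stream =
                 fs.create(out, true, 4096, (short) 3, 512L * 1024 * 1024)) {
            stream.writeBytes("block size demo\n");
        }

        System.out.println("Default block size: " + fs.getDefaultBlockSize(out));
    }
}
```

Note that dfs.blocksize must remain a multiple of the checksum chunk size (dfs.bytes-per-checksum, 512 bytes by default), which round values such as 64 MB, 128 MB, or 256 MB all satisfy.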
Follow the link for more detail: HDFS Blocks
September 20, 2018 at 1:56 pm · Post #5021 · DataFlair Team (Spectator)
A block is the smallest unit of data in a file system. In HDFS the block size is configurable as per requirements, but the default is 128 MB.
Traditional file systems such as those on Linux have a default block size of 4 KB. Hadoop, however, is designed and developed to process a small number of very large files (terabytes or petabytes).
If TBs or PBs of data were divided into blocks of a size like 4 KB, the drawbacks would be:
1. A huge number of blocks to process
2. A huge amount of metadata to store on the NameNode
3. A huge amount of time needed just to process that metadata
4. An overall increase in total processing time
On the other hand, if a very large block size is configured, the time to read each block from disk grows, which again increases the overall processing time.
To balance these two factors, the block size in HDFS is set to 128 MB. With this block size, users get close to optimum performance for most workloads. The block size a given file actually uses can be checked through the Hadoop Java API, as sketched below.
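To confirm what block size an existing file got, the client API can be asked directly. A minimal sketch, assuming a reachable HDFS with default configuration and a hypothetical file path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockSizeSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical file path, used only for illustration.
        Path file = new Path("/user/dataflair/input/big-file.dat");
        FileStatus status = fs.getFileStatus(file);

        System.out.println("File length: " + status.getLen() + " bytes");
        System.out.println("Block size : " + status.getBlockSize() + " bytes");

        // One entry per block, with the DataNodes holding its replicas.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                    + " length=" + b.getLength()
                    + " hosts=" + String.join(",", b.getHosts()));
        }
    }
}
```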