Why is the block size set to 128 MB in HDFS?

    • #5017
      DataFlair Team
      Spectator

      Why is 128 MB chosen as the default block size in Hadoop?
      What is the data block size in HDFS?
      Why is the HDFS block size 128 MB in Hadoop?

    • #5020
      DataFlair Team
      Spectator

      In Hadoop, input data files are divided into blocks of a particular size (128 MB by default), and these blocks are then stored on different DataNodes.

      Hadoop is designed to process large volumes of data. Each block’s metadata (its address, i.e. which DataNode it is stored on) is kept on the NameNode. If the block size is too small, there will be a very large number of blocks to store on the DataNodes, and a correspondingly large amount of metadata to hold on the NameNode. In addition, each block of data is processed by one Mapper task, so a large number of small blocks means a large number of Mapper tasks. Having a very small block size is therefore not efficient.
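
      To make the scale concrete, here is a minimal Java sketch (the 1 TB file size is an assumption chosen purely for illustration) comparing how many blocks, and therefore how many NameNode metadata entries and Mapper tasks, a 4 KB versus a 128 MB block size would produce:

        public class BlockCountExample {
            public static void main(String[] args) {
                long fileSize  = 1024L * 1024 * 1024 * 1024;  // assumed 1 TB input file
                long smallBlock = 4L * 1024;                  // 4 KB (typical local file system block)
                long hdfsBlock  = 128L * 1024 * 1024;         // 128 MB (HDFS default)

                // Each block means one metadata entry on the NameNode
                // and, by default, one Mapper task in MapReduce.
                System.out.println("Blocks at 4 KB  : " + (fileSize / smallBlock)); // 268,435,456
                System.out.println("Blocks at 128 MB: " + (fileSize / hdfsBlock));  // 8,192
            }
        }

      At 4 KB the NameNode would have to track hundreds of millions of block entries for that one file, while at 128 MB it tracks only a few thousand.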

      At the same time, the block size should not be so large that parallelism cannot be achieved, or that the system spends a very long time waiting for a single unit of data processing to finish its work.

      A balance needs to be maintained, which is why the default block size is 128 MB. It can also be changed depending on the size of the input files.
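
      As a sketch of how that change can be made per client or per file (assuming the standard HDFS Java client is on the classpath; the path /user/data/big.txt is hypothetical), either the dfs.blocksize property or the blockSize argument of FileSystem.create can be used:

        import java.io.IOException;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class WriteWithCustomBlockSize {
            public static void main(String[] args) throws IOException {
                Configuration conf = new Configuration();
                // The cluster-wide default comes from dfs.blocksize in hdfs-site.xml;
                // this overrides it for files written by this client (256 MB here).
                conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

                FileSystem fs = FileSystem.get(conf);

                // Alternatively, pass the block size explicitly for a single file.
                FSDataOutputStream out = fs.create(
                        new Path("/user/data/big.txt"),  // hypothetical destination path
                        true,                            // overwrite if it exists
                        4096,                            // I/O buffer size in bytes
                        (short) 3,                       // replication factor
                        256L * 1024 * 1024);             // block size in bytes
                out.writeBytes("example record\n");
                out.close();
                fs.close();
            }
        }

      The block size is fixed when a file is written, so changing the cluster default later does not affect files that already exist.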

      Follow the link for more detail: HDFS Blocks

    • #5021
      DataFlair Team
      Spectator

      Block size is the smallest unit of data in a file system. In HDFS the block size is configurable as per requirements, but the default is 128 MB.

      Traditional file systems, such as those used on Linux, have a default block size of 4 KB. Hadoop, however, is designed and developed to process a small number of very large files (terabytes or petabytes).

      If TBs or PBs of data were divided into blocks of a size like 4 KB, the drawbacks would be:
      1. A huge number of blocks to be processed
      2. A huge amount of metadata to be stored on the NameNode
      3. A huge amount of time needed to process that metadata
      4. An overall increase in processing time

      On the other hand, if a very large block size is configured, there are fewer blocks and therefore fewer parallel tasks; each task must read and process more data, so the time to process a single block increases, and with it the overall processing time.

      To balance these two factors, the block size in HDFS is set to 128 MB by default. With this block size, users can achieve good performance for typical workloads.
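
      To check what block size a stored file actually ended up with, one option (a small sketch using the standard HDFS client API; the path shown is hypothetical) is to read it back from the file’s FileStatus:

        import java.io.IOException;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class ShowBlockSize {
            public static void main(String[] args) throws IOException {
                FileSystem fs = FileSystem.get(new Configuration());

                // Hypothetical path; point this at a real file on your cluster.
                FileStatus status = fs.getFileStatus(new Path("/user/data/big.txt"));

                // HDFS records the block size per file, so this reflects whatever
                // value was in effect when the file was written.
                System.out.println("Block size: " + status.getBlockSize() / (1024 * 1024) + " MB");
                System.out.println("File size : " + status.getLen() + " bytes");
                fs.close();
            }
        }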
