Why is the HDFS block size so large?

Viewing 2 reply threads
  • Author
    Posts
    • #5133
      DataFlair Team
      Spectator

      File system blocks are typically a few kilobytes and disk blocks are normally 512 bytes, so why are HDFS blocks so large?

    • #5135
      DataFlair Team
      Spectator

      HDFS blocks are large compared to disk blocks in order to minimize the cost of seeks. If a file were split into many small blocks, a larger share of the read time would go to seeking (positioning the disk head at the start of each block) rather than to transferring data. Many small blocks also burden the NameNode (the master): it stores the metadata for every block, so more blocks mean more block information to keep.
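
      To make the NameNode burden concrete, here is a rough sketch that counts the blocks needed for a 1 GB file at two block sizes. The figure of about 150 bytes of NameNode memory per block is a commonly quoted rule of thumb and is used here only as an assumption; the function names are illustrative.

```python
# Rough estimate of how block size affects the NameNode's metadata load.
# ASSUMPTION: ~150 bytes of NameNode memory per block (rule of thumb only).
BYTES_PER_BLOCK_ENTRY = 150

KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def block_count(file_size_bytes, block_size_bytes):
    """Number of blocks needed to hold one file (ceiling division)."""
    return -(-file_size_bytes // block_size_bytes)


def namenode_bytes(file_size_bytes, block_size_bytes):
    """Approximate NameNode memory used to track the blocks of one file."""
    return block_count(file_size_bytes, block_size_bytes) * BYTES_PER_BLOCK_ENTRY


for label, size in [("4 KB", 4 * KB), ("128 MB", 128 * MB)]:
    n = block_count(1 * GB, size)
    mem = namenode_bytes(1 * GB, size)
    print(f"{label} blocks: {n} blocks, ~{mem} bytes of metadata")
```

      Under these assumptions, the same 1 GB file needs tens of thousands of times more block entries with 4 KB blocks than with 128 MB blocks.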

      If a block is large enough, the time it takes to transfer its data from the disk is significantly longer than the time to seek to the start of the block. Thus, reading a large file made of multiple blocks proceeds at close to the disk transfer rate.
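
      A small sketch of that effect, assuming one seek per block, a 10 ms seek time, and a 100 MB/s transfer rate. These numbers are illustrative assumptions, not measurements of any particular disk.

```python
import math


def read_time_seconds(file_mb, block_mb, seek_ms=10.0, transfer_mb_per_s=100.0):
    """Time to read a file, assuming one seek per block (simplified model)."""
    blocks = math.ceil(file_mb / block_mb)
    seek_time = blocks * seek_ms / 1000.0
    transfer_time = file_mb / transfer_mb_per_s
    return seek_time + transfer_time


def effective_rate(file_mb, block_mb, **kwargs):
    """Achieved throughput in MB/s once seek overhead is included."""
    return file_mb / read_time_seconds(file_mb, block_mb, **kwargs)


for block_mb in (1, 16, 128):
    rate = effective_rate(1024, block_mb)
    print(f"{block_mb:>3} MB blocks: {rate:.1f} MB/s effective")
```

      In this model, 1 MB blocks lose about half the disk's throughput to seeking, while 128 MB blocks stay within about 1% of the raw transfer rate.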

      By default, MapReduce runs one Mapper per block. With small blocks there would be a lot of Mappers, each processing only a small amount of data, and the per-task overhead makes that inefficient.
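
      The following sketch shows how the Mapper count grows as blocks shrink. It assumes one Mapper per block and a fixed 1 second startup cost per task; that cost is a made-up illustrative value, and the function names are hypothetical.

```python
import math


def mapper_count(file_mb, block_mb):
    """Number of Mappers, assuming one Mapper per block."""
    return math.ceil(file_mb / block_mb)


def startup_overhead_seconds(file_mb, block_mb, per_task_overhead_s=1.0):
    """Total task startup cost (per_task_overhead_s is an assumed value)."""
    return mapper_count(file_mb, block_mb) * per_task_overhead_s


for block_mb in (1, 64, 128):
    tasks = mapper_count(10 * 1024, block_mb)
    overhead = startup_overhead_seconds(10 * 1024, block_mb)
    print(f"{block_mb:>3} MB blocks: {tasks} Mappers, {overhead:.0f} s of startup overhead")
```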


    • #5136
      DataFlair Team
      Spectator

      HDFS blocks are large in order to reduce the time spent seeking (the time needed to position the disk head at the start of the data to be read).

      With smaller blocks, a read involves more seeks and less data transferred per seek. We want the reverse: fewer seeks and more data transferred per seek, so that seek time is only about 1% of transfer time (seek time / transfer time = 0.01), which is only possible with larger block sizes. HDFS workloads mostly stream through data rather than jump to arbitrary positions in a file, so time spent seeking is pure overhead.
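
      As a worked example of the 0.01 ratio, the sketch below computes the block size at which seek time is 1% of transfer time. The 10 ms seek time and 100 MB/s transfer rate are assumed illustrative values.

```python
def block_size_for_ratio(seek_ms, transfer_mb_per_s, target_ratio=0.01):
    """Block size in MB at which seek_time / transfer_time == target_ratio.

    transfer_time = block_mb / transfer_mb_per_s, so
    block_mb = seek_time * transfer_mb_per_s / target_ratio.
    """
    seek_s = seek_ms / 1000.0
    return seek_s * transfer_mb_per_s / target_ratio


# Assumed disk characteristics: 10 ms seek, 100 MB/s sustained transfer.
needed_mb = block_size_for_ratio(seek_ms=10.0, transfer_mb_per_s=100.0)
print(f"Block size for a 1% seek overhead: about {needed_mb:.0f} MB")
```

      With these assumed numbers the answer comes out at roughly 100 MB, which is in the same range as the block sizes HDFS uses.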

      HDFS is designed for large files and high-throughput processing, so keeping seek time low helps.
      Also, because of the large block size in HDFS (64 MB by default in Hadoop 1.x, 128 MB in later versions), a single MapReduce task can process a large block of a file in one go.

