What is the difference between an HDFS block and an input split?

    • #5564
      DataFlair Team
      Spectator

      What is the difference between an HDFS block and an input split? Which one is given as input to the Mapper: the block or the split? If it is the split, why?

    • #5565
      DataFlair Team
      Spectator

      An HDFS block is the physical unit of storage on disk: the smallest amount of data that HDFS reads or writes as a unit.
      A MapReduce InputSplit, by contrast, is a logical chunk of data created by the InputFormat specified in the MapReduce job configuration.
      "Logical" means that the split holds no data itself, only information about where the data lives: the addresses or locations of the blocks.
      If the last record (value) in a block is incomplete, the input split also includes the location of the next block and the byte offset of the data needed to complete the record.
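      To make the record-boundary point concrete, here is a minimal Python simulation (not Hadoop code) of how a line-oriented reader could handle a record that straddles two splits. It assumes newline-terminated records and fixed-size splits; the names read_split and next_line_start are made up for this sketch. The convention used, which I believe mirrors what Hadoop's line reader does, is that every split except the first skips its first (possibly partial) line, and every split keeps reading while the line starts at or before its end offset.

```python
def next_line_start(data: bytes, pos: int) -> int:
    """Return the offset just past the next newline at or after pos."""
    newline = data.find(b"\n", pos)
    return len(data) if newline == -1 else newline + 1


def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the records that one split (start, length) is responsible for."""
    end = start + length
    pos = start
    if start != 0:
        # The line in progress at `start` belongs to the previous split,
        # so skip ahead to the beginning of the next line.
        pos = next_line_start(data, pos)
    records = []
    # Read every line that starts at or before `end`, even if the line
    # itself finishes in the next block.
    while pos <= end and pos < len(data):
        nxt = next_line_start(data, pos)
        records.append(data[pos:nxt].rstrip(b"\n"))
        pos = nxt
    return records


# Toy "file" with a toy 10-byte block size, so records cross boundaries.
DATA = b"alpha\nbravo\ncharlie\ndelta\n"
SPLITS = [(0, 10), (10, 10), (20, 6)]

for start, length in SPLITS:
    print((start, length), read_split(DATA, start, length))
```

      Each record comes out of exactly one split, even though "bravo" begins in the first 10-byte chunk and ends in the second.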

      Size:
      Block: The default HDFS block size is 128 MB, and it can be configured to suit our requirements (through the dfs.blocksize property). All blocks of a file are the same size except the last one, which can be the same size or smaller. In Hadoop, a file is divided into 128 MB blocks, which are then stored in the Hadoop filesystem.
      InputSplit: By default, the split size is approximately equal to the block size. A split does not have to line up exactly with one block, because a record can cross a block boundary.
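      The size rules above can be sketched with a little arithmetic. This is a simplified Python model, not Hadoop's code: the split-size formula max(minSize, min(maxSize, blockSize)) and the 1.1 "slop" factor reflect my understanding of how FileInputFormat behaves, so treat both as assumptions, and the function names are invented for this example.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumed tolerance before a separate final split is cut


def block_sizes(file_size: int, block_size: int = 128 * MB) -> list[int]:
    """Sizes of the blocks a file occupies: all full except possibly the last."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def compute_split_size(block_size: int, min_size: int = 1,
                       max_size: int = 2**63 - 1) -> int:
    """With default min/max settings this is simply the block size."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_size: int, split_size: int) -> list[int]:
    """Lengths of the logical splits for one file."""
    lengths = []
    remaining = file_size
    # Cut full-size splits while clearly more than one split's worth remains.
    while remaining / split_size > SPLIT_SLOP:
        lengths.append(split_size)
        remaining -= split_size
    if remaining:
        lengths.append(remaining)  # the tail may be a bit larger than a block
    return lengths


size = compute_split_size(128 * MB)
for file_mb in (300, 260):
    blocks = [b // MB for b in block_sizes(file_mb * MB)]
    splits = [s // MB for s in split_lengths(file_mb * MB, size)]
    print(f"{file_mb} MB file: blocks={blocks} splits={splits}")
```

      Under these assumptions a 300 MB file gets three blocks and three matching splits, while a 260 MB file gets three blocks (128, 128, 4) but only two splits (128, 132). That is why the split size is only approximately equal to the block size.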

      Follow the link to learn more about Difference between HDFS Block and MapReduce InputSplit in Hadoop

    • #5566
      DataFlair Team
      Spectator

      An InputSplit is a logical reference to data, meaning it does not contain any data itself. It is used only during data processing by MapReduce, whereas an HDFS block is the physical location where the actual data is stored. Both sizes are configurable, each through its own settings.
      Moreover, all blocks of a file are the same size except the last one, which can be the same size or smaller. The split size, by default, is approximately equal to the block size, but a split does not have to line up exactly with one block.
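      As a rough illustration of "logical reference", a split can be modeled as nothing more than a file path, a start offset, and a length. The sketch below is plain Python, not the Hadoop API; the Split class, the blocks_touched helper, and the path /data/log.txt are all hypothetical. It shows how such a reference maps back to the physical blocks it covers.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class Split:
    """A logical reference: where the data is, never the data itself."""
    path: str
    start: int   # byte offset into the file
    length: int  # number of bytes covered


def blocks_touched(split: Split, block_size: int = 128 * MB) -> list[int]:
    """Indexes of the physical blocks that a split's byte range overlaps."""
    first = split.start // block_size
    last = (split.start + split.length - 1) // block_size
    return list(range(first, last + 1))


# A 260 MB file cut into two splits, the second slightly larger than a block.
first_split = Split("/data/log.txt", 0, 128 * MB)
second_split = Split("/data/log.txt", 128 * MB, 132 * MB)

print(blocks_touched(first_split))
print(blocks_touched(second_split))
```

      The first split maps to a single block, while the second covers one full block plus the small final block, so one split can refer to more than one physical block.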
