What is the difference between an HDFS block and an input split?

This topic contains 2 replies, has 1 voice, and was last updated by  dfbdteam3 1 year, 6 months ago.

Viewing 3 posts - 1 through 3 (of 3 total)
  • Author
    Posts
  • #5564

    dfbdteam3
    Moderator

    What is the difference between an HDFS block and an input split? Which one is given as input to the Mapper: the block or the split? If it is the split, why?

    #5565

    dfbdteam3
    Moderator

    An HDFS block is the physical unit of storage: the smallest amount of data that HDFS reads or writes.
    A MapReduce InputSplit is a logical chunk of data created by the InputFormat specified in the MapReduce job configuration. The Mapper receives an InputSplit, not a block, because a split follows record boundaries while a block is cut purely by size.
    Logical means the split holds no data itself, only a reference to it: for file input, the file path, a start offset, a length, and the hosts that store the underlying blocks.
    When the last record (value) in a block is incomplete, the split still delivers the whole record: its record reader continues into the next block until the record ends, and the reader of the following split skips that partial record.
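To illustrate how a record that crosses a boundary still reaches exactly one Mapper, here is a minimal Python simulation of line-oriented reading over byte-range splits. It is only a sketch, not Hadoop code: `make_splits` and `read_split` are hypothetical names, and the logic assumes newline-delimited text input where each reader skips a leading partial line and is allowed to read one line past the end of its own range.

```python
def make_splits(file_len, split_size):
    """Return (start, length) byte ranges that cut a file purely by size."""
    return [(start, min(split_size, file_len - start))
            for start in range(0, file_len, split_size)]


def read_split(data, start, length):
    """Return the complete lines that belong to the split [start, start + length).

    Assumed rules (a simplification of line-oriented record reading):
    - a split that does not start at offset 0 discards everything up to and
      including the first newline, because the previous split reads that line;
    - a split keeps reading while the next line starts at or before its end,
      so it may read past its own range to finish a record.
    """
    end = start + length
    pos = start
    if start != 0:
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []          # no line starts inside this split
        pos = newline + 1
    records = []
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        records.append(data[pos:line_end])
        pos = line_end + 1
    return records


data = b"alpha\nbravo\ncharlie\ndelta\n"
for start, length in make_splits(len(data), 8):
    print((start, length), read_split(data, start, length))
```

With an 8-byte range size, "bravo" starts in the first range and ends in the second, yet it is returned only by the first reader; the second reader skips the tail of "bravo" and starts at "charlie".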

    Size:
    Block: The default HDFS block size is 128 MB (in Hadoop 2.x and later) and can be changed to suit the workload through the dfs.blocksize property. All blocks of a file are the same size except the last one, which can be the same size or smaller. Hadoop divides a file into 128 MB blocks and stores those blocks in HDFS.
    InputSplit: By default, the split size is approximately equal to the block size, and it can be tuned through the minimum and maximum split size settings of the job. Because a split follows record boundaries, the data it covers does not have to coincide exactly with one block.
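The size rules above can be worked through with a small Python sketch. It assumes the split size is chosen as max(minSize, min(maxSize, blockSize)) and that the final split is allowed to be up to 10% larger than the split size, which is how file-based input formats in Hadoop are commonly described; the function names and the `SPLIT_SLOP` constant here are illustrative, not Hadoop API calls.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumed tolerance: the last split may be up to 10% larger


def block_lengths(file_len, block_size=128 * MB):
    """Physical layout: equal-sized blocks, with a last block that may be smaller."""
    full, rest = divmod(file_len, block_size)
    return [block_size] * full + ([rest] if rest else [])


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Logical split size: the block size, clamped between min_size and max_size."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_len, split_size):
    """Logical layout: cut full-sized splits while more than SPLIT_SLOP remains."""
    lengths = []
    remaining = file_len
    while remaining / split_size > SPLIT_SLOP:
        lengths.append(split_size)
        remaining -= split_size
    if remaining > 0:
        lengths.append(remaining)
    return lengths


size = compute_split_size(128 * MB)
for file_mb in (300, 130):
    blocks = [b // MB for b in block_lengths(file_mb * MB)]
    splits = [s // MB for s in split_lengths(file_mb * MB, size)]
    print(file_mb, "MB file -> blocks", blocks, "splits", splits)
```

Under these assumptions a 300 MB file gets three blocks and three splits, while a 130 MB file is stored as two blocks (128 MB and 2 MB) but processed as a single 130 MB split, which is one way the split count and block count can differ.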

    Follow the link to learn more about Difference between HDFS Block and MapReduce InputSplit in Hadoop

    #5566

    dfbdteam3
    Moderator

    An InputSplit is a logical reference to data, which means it does not contain any data itself. It is used only while MapReduce processes the data, whereas an HDFS block is the physical location where the actual data is stored. Both sizes are configurable, but through different settings.
    Moreover, all blocks of a file are the same size except the last block, which can be the same size or smaller. The split size, by default, is approximately equal to the block size, but a split does not have to line up exactly with a single block.
