What is the difference between HDFS block and input split
September 20, 2018 at 3:39 pm #5564 DataFlair Team (Spectator)
What is the difference between an HDFS block and an input split? Which one is given as input to the Mapper: the block or the split? If it is the split, then why?
-
September 20, 2018 at 3:39 pm #5565 DataFlair Team (Spectator)
An HDFS block is the physical unit of storage on disk: the minimum amount of data that can be read or written.
A MapReduce InputSplit, by contrast, is the logical chunk of data created by the InputFormat specified in the MapReduce job configuration.
Logical means that the split does not hold the data itself; it holds only information about the block addresses or locations.
When the last record (value) in a block is incomplete, the input split includes the location of the next block and the byte offset of the data needed to complete the record.
Size:
Block- The default size of an HDFS block is 128 MB, and it can be changed as required. All blocks of a file are the same size except the last one, which can be the same size or smaller. In Hadoop, a file is split into 128 MB blocks, which are then stored in the Hadoop filesystem.
InputSplit- By default, the split size is approximately equal to the block size. An entire block of data may not fit into a single input split, because a split follows record boundaries rather than block boundaries.
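To make the size arithmetic concrete, here is a minimal sketch in plain Java (no Hadoop dependency). The class and method names are hypothetical and exist only for illustration. The `splitSize` rule, which clamps the block size between a minimum and maximum split size, is meant to mirror what Hadoop's `FileInputFormat` does as far as I know; treat it as an assumption and check your Hadoop version.

```java
public class BlockAndSplitSizes {
    static final long MB = 1024L * 1024L;

    // Number of blocks needed to store a file (assumes fileLength > 0).
    static long blockCount(long fileLength, long blockSize) {
        return (fileLength + blockSize - 1) / blockSize; // ceiling division
    }

    // Size of the last block: a full block when the file length is an
    // exact multiple of the block size, otherwise the remainder.
    static long lastBlockSize(long fileLength, long blockSize) {
        long remainder = fileLength % blockSize;
        return remainder == 0 ? blockSize : remainder;
    }

    // Assumed split-size rule: the block size, clamped to [minSize, maxSize].
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128 * MB;
        long fileLength = 300 * MB;

        // 300 MB = 128 MB + 128 MB + 44 MB
        System.out.println("blocks: " + blockCount(fileLength, blockSize));
        System.out.println("last block (MB): " + lastBlockSize(fileLength, blockSize) / MB);

        // With no min/max limits in effect, the split size equals the block size.
        System.out.println("split (MB): " + splitSize(blockSize, 1, Long.MAX_VALUE) / MB);
    }
}
```

With a 300 MB file this gives three blocks, the last one 44 MB, which matches the "same size except the last block" rule. Raising the minimum or lowering the maximum in `splitSize` is how the split size would drift away from the block size.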
-
September 20, 2018 at 3:39 pm #5566 DataFlair Team (Spectator)
An InputSplit is a logical reference to data, meaning it doesn't contain any data itself. It is used only during data processing by MapReduce, where each Mapper works on one split. An HDFS block is the physical location where the actual data is stored. Both sizes are configurable, each through its own settings.
Moreover, all blocks of a file are the same size except the last block, which can be the same size or smaller. The split size, by default, is approximately equal to the block size. An entire block of data may not fit into a single input split.
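The idea of a split as a logical reference can be sketched in plain Java. Everything below is hypothetical and simplified: a "file" is a byte array, a `Split` holds only a start offset and a length, and records are newline-terminated lines. The convention assumed here is that a record belongs to the split containing its first byte, so a reader may read past the end of its split to finish a record. This is similar in spirit to how line-oriented readers behave, but it is not Hadoop's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LogicalSplitDemo {
    // A split is only metadata: it points at a byte range and holds no data.
    static final class Split {
        final int start;
        final int length;
        Split(int start, int length) { this.start = start; this.length = length; }
    }

    // Cut a file of the given length into fixed-size logical splits.
    static List<Split> makeSplits(int fileLength, int splitSize) {
        List<Split> splits = new ArrayList<>();
        for (int start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    // Return the records whose first byte lies inside the split.
    static List<String> readRecords(byte[] data, Split split) {
        int pos = split.start;
        int end = split.start + split.length;
        // Skip a partial record at the front; the previous split's reader finishes it.
        if (pos != 0) {
            while (pos < data.length && data[pos - 1] != '\n') pos++;
        }
        List<String> records = new ArrayList<>();
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            // May run past 'end' to complete a record that crosses the boundary.
            while (lineEnd < data.length && data[lineEnd] != '\n') lineEnd++;
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.UTF_8);
        for (Split s : makeSplits(data.length, 8)) {
            System.out.println(s.start + "+" + s.length + " -> " + readRecords(data, s));
        }
    }
}
```

With 8-byte splits over this 20-byte input, "bravo" and "charlie" both cross a split boundary, yet each record is returned exactly once. That is the practical reason a Mapper is fed splits rather than raw blocks: a block boundary can fall in the middle of a record, while a split is always read as whole records.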