What is the fundamental difference between a MapReduce InputSplit and an HDFS block?

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 2:13 pm #5091
  
  DataFlair Team
  Spectator
  
  What is the difference between a split and a block in Hadoop?
  How does a MapReduce InputSplit compare with an HDFS block?
  How does split size compare with block size in Hadoop?
- September 20, 2018 at 2:14 pm #5092
  
  DataFlair Team
  Spectator
  
  InputSplit is the logical division of data.
  Data Block is the physical division of data.
  
  HDFS stores files as blocks. A typical HDFS block size is 128 MB (the default in Hadoop 2 and later), and a large file is broken into block-sized chunks.
  For example, a 1 GB file placed in HDFS occupies 1 GB / 128 MB = 8 blocks, and these blocks are distributed across different DataNodes according to the cluster configuration.
  
  An InputSplit is used during data processing in a MapReduce program. Its size is user-configurable, so you can choose it based on the size of the data and how you are processing it. If you do not set a split size, it defaults to the block size, so each block becomes one input split.
  The number of input splits equals the number of mappers that run to process the data.
  
  For example,
  suppose you have a 200 MB file and the HDFS block size is the default 128 MB. The file is stored as 2 blocks (128 MB and 72 MB). If you have not set a split size, there are 2 input splits (one per block) and 2 mappers are assigned. If you set the split size to 200 MB, both blocks are treated as a single split and one mapper is assigned. If instead you set the split size to 25 MB, there are 8 input splits (200 MB / 25 MB) and 8 mappers are assigned.
  
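
The block arithmetic in the reply above (1 GB / 128 MB = 8 blocks, and 200 MB stored as 128 MB + 72 MB) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not HDFS code: the helper name `hdfs_block_sizes` is made up for this example, and it assumes a single fixed block size for the whole file.

```python
MB = 1024 * 1024

def hdfs_block_sizes(file_size, block_size=128 * MB):
    """Return the size in bytes of each block a file would occupy.

    Hypothetical helper for illustration; assumes one fixed block size.
    """
    if file_size <= 0:
        return []
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only holds the leftover bytes; it is not padded.
        sizes.append(remainder)
    return sizes

# 1 GB file with 128 MB blocks: 8 full blocks.
print(len(hdfs_block_sizes(1024 * MB)))
# 200 MB file with 128 MB blocks: one full block plus a 72 MB block.
print([size // MB for size in hdfs_block_sizes(200 * MB)])
```

Note that the last block is only as large as the data left over, which is why the 200 MB file ends with a 72 MB block rather than a second 128 MB one.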
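
How the split size falls back to the block size can also be sketched. As far as I know, Hadoop's `FileInputFormat` picks the split size as `max(minSize, min(maxSize, blockSize))`, with the bounds coming from the `mapreduce.input.fileinputformat.split.minsize` and `mapreduce.input.fileinputformat.split.maxsize` properties. The Python function below only mirrors that formula; its name and default arguments are assumptions made for this example.

```python
MB = 1024 * 1024

def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Mirror of the max(minSize, min(maxSize, blockSize)) rule.

    Illustrative sketch; the defaults assume no min/max split size is set.
    """
    return max(min_size, min(max_size, block_size))

block = 128 * MB

# Nothing configured: the split size equals the block size.
print(compute_split_size(block) // MB)
# Minimum raised to 200 MB: splits become larger than a block.
print(compute_split_size(block, min_size=200 * MB) // MB)
# Maximum lowered to 25 MB: splits become smaller than a block.
print(compute_split_size(block, max_size=25 * MB) // MB)
```

So raising the minimum is how you get splits bigger than a block, and lowering the maximum is how you get splits smaller than a block.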
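
The 200 MB example above can be checked by cutting a file into splits of a given size. The sketch below assumes a splittable file and follows what I understand to be `FileInputFormat`'s behaviour of letting the last split run up to 10% over the split size (the 1.1 "slop" factor); `plan_splits` is a hypothetical name, and each split is shown as an `(offset, length)` pair in bytes.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumed: last split may be up to 10% larger than split_size

def plan_splits(file_size, split_size):
    """Return (offset, length) pairs for the splits of a splittable file."""
    splits = []
    remaining = file_size
    # Cut full-size splits while clearly more than one split's worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_size - remaining, split_size))
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining > 0:
        splits.append((file_size - remaining, remaining))
    return splits

file_size = 200 * MB
for split_mb in (128, 200, 25):
    splits = plan_splits(file_size, split_mb * MB)
    print(split_mb, "MB split size ->", len(splits), "splits / mappers")
```

With these assumptions the 200 MB file yields 2 splits at 128 MB, 1 split at 200 MB, and 8 splits at 25 MB, which is also the number of mappers in each case.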