What is the fundamental difference between a MapReduce InputSplit and an HDFS block?


    • #5091
      DataFlair Team

      What is the difference between a split and a block in Hadoop?
      How does a MapReduce InputSplit compare with an HDFS block?
      How does the split size compare with the block size in Hadoop?

    • #5092
      DataFlair Team

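      An InputSplit is a logical division of the data: it describes the byte range that a single mapper reads.
      An HDFS block is a physical division of the data: it is the unit in which HDFS actually stores the file on DataNodes.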

      HDFS stores files as blocks. The default block size is 128 MB (in Hadoop 2.x and later), and a large file is broken down into block-sized chunks.
      For example, if we place a 1 GB file in HDFS, it is stored as 1 GB / 128 MB = 8 blocks, and those blocks are distributed across different DataNodes according to the cluster configuration.
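As a rough sketch of the block arithmetic (plain Java with no Hadoop dependency; `BlockLayout` and `blockSizes` are hypothetical names for illustration), a file is cut into full-size blocks, and only the last block may be smaller. Sizes are in MB here for readability, whereas HDFS itself works in bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not the HDFS API): models how a file of a given
// length is cut into fixed-size blocks. Sizes are in MB for readability.
class BlockLayout {
    // Returns the size of each block; every block is full-size except possibly the last.
    static List<Long> blockSizes(long fileSize, long blockSize) {
        List<Long> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            blocks.add(Math.min(blockSize, fileSize - offset));
        }
        return blocks;
    }

    public static void main(String[] args) {
        System.out.println(blockSizes(1024, 128)); // 1 GB file with 128 MB blocks
        System.out.println(blockSizes(200, 128));  // 200 MB file with 128 MB blocks
    }
}
```

For the 1 GB file this prints eight 128 MB blocks; for a 200 MB file it prints `[128, 72]`.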

      An InputSplit is used when a MapReduce program processes the data. The split size is configurable, so the user can choose it based on the size of the data and how it is being processed. If the user does not set a split size, it defaults to the block size, so there is typically one InputSplit per block.
      The number of InputSplits equals the number of map tasks (mappers) launched to process the data.
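The default behaviour can be illustrated with a standalone re-implementation of the rule in Hadoop's `FileInputFormat.computeSplitSize`, namely `max(minSize, min(maxSize, blockSize))`. In Hadoop 2.x and later the minimum and maximum come from `mapreduce.input.fileinputformat.split.minsize` and `mapreduce.input.fileinputformat.split.maxsize`. The class name `SplitSizeRule` is hypothetical, and this sketch only mirrors the formula rather than calling Hadoop.

```java
// Hypothetical standalone sketch mirroring the split-size rule in
// Hadoop's FileInputFormat: splitSize = max(minSize, min(maxSize, blockSize)).
class SplitSizeRule {
    static final long MB = 1024L * 1024L;

    // minSize and maxSize stand in for the configured minimum and maximum split sizes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long block = 128 * MB;
        // Default-like settings (tiny minimum, unbounded maximum): split size == block size.
        System.out.println(computeSplitSize(block, 1, Long.MAX_VALUE) / MB);
        // Raising the minimum above the block size gives splits larger than a block.
        System.out.println(computeSplitSize(block, 200 * MB, Long.MAX_VALUE) / MB);
        // Lowering the maximum below the block size gives splits smaller than a block.
        System.out.println(computeSplitSize(block, 1, 25 * MB) / MB);
    }
}
```

This prints 128, 200 and 25. With the defaults the split size equals the block size, which is why the number of splits normally tracks the number of blocks.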

      For example,
      Suppose you have a 200 MB file and the HDFS block size is the default 128 MB. The file is stored as 2 blocks (128 MB and 72 MB).
      If you have not set a split size, it defaults to the block size, so there are 2 InputSplits (one per block) and 2 mappers are assigned.
      If you set the split size to 200 MB, both blocks are treated as a single InputSplit, and 1 mapper is assigned.
      If you set the split size to 25 MB, there are 200 MB / 25 MB = 8 InputSplits, and 8 mappers are assigned.
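The split counts in this example can be checked with a simplified sketch of the loop `FileInputFormat` uses to cut a splittable file into splits. It assumes a single uncompressed (splittable) file and ignores block locations. `SplitCounter` is a hypothetical name, and the 1.1 factor mirrors Hadoop's `SPLIT_SLOP` constant, which lets the final split be up to 10% larger than the split size instead of leaving a tiny trailing split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not the Hadoop API) mirroring FileInputFormat's
// splitting loop for one splittable file. Sizes are in MB for readability.
class SplitCounter {
    // Mirrors Hadoop's SPLIT_SLOP: the last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Returns the length of each split; one map task is launched per split.
    static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(remaining); // leftover bytes form the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        // The 200 MB file from the example, with three different split sizes.
        System.out.println(splitLengths(200, 128));
        System.out.println(splitLengths(200, 200));
        System.out.println(splitLengths(200, 25).size());
    }
}
```

This prints `[128, 72]`, `[200]` and `8`, matching 2, 1 and 8 mappers. Because of the slop factor, a 130 MB file with a 128 MB split size would produce a single split even though it occupies two blocks.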

