In the HDFS architecture there is the concept of blocks. The default HDFS block size is 128 MB. A large file stored in HDFS is broken down into block-sized chunks.
Suppose we have a 1 GB file and we want to place it in HDFS. It will be stored as 1 GB / 128 MB = 8 blocks, and these blocks are distributed across different DataNodes based on the cluster configuration.
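The block arithmetic above is just a ceiling division. A minimal sketch (class and method names here are illustrative, not part of any HDFS API):

```java
public class BlockMath {
    static final long MB = 1024L * 1024;

    // Number of blocks = ceiling(fileSize / blockSize)
    static long blockCount(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long fileSize = 1024 * MB;   // 1 GB
        long blockSize = 128 * MB;   // default HDFS block size
        System.out.println(blockCount(fileSize, blockSize)); // prints 8
    }
}
```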
An InputSplit is used during data processing in a MapReduce program. Its size is user-definable: you can choose it based on the size of the data and how you are processing it. If the user does not define an input split size, it defaults to the block size, so the number of splits equals the number of blocks.
The number of input splits equals the number of mappers launched to process the data.
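To control the split size explicitly, a MapReduce job can set bounds through `FileInputFormat`. A sketch, assuming Hadoop MapReduce is on the classpath (the job name and input path are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-size-demo"); // placeholder job name

        FileInputFormat.addInputPath(job, new Path("/input")); // placeholder path

        // Cap each split at 25 MB: a 200 MB file then yields 8 splits / 8 mappers.
        FileInputFormat.setMaxInputSplitSize(job, 25L * 1024 * 1024);

        // Equivalent configuration property:
        // conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 25L * 1024 * 1024);
    }
}
```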
Suppose you have a 200 MB file and the HDFS block size is the default 128 MB. The file is chopped into 2 blocks (128 MB and 72 MB). If you have not defined an input split size, the block size is used, giving 2 splits (one per block), so 2 mappers are assigned. If instead you specify a split size of 200 MB, both blocks are covered by a single split and one mapper is assigned. And if you set the split size to 25 MB, there will be 200 MB / 25 MB = 8 input splits and 8 mappers.
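The three scenarios above reduce to the same ceiling division over the chosen split size. A minimal sketch (illustrative names; Hadoop's `FileInputFormat` actually computes the split size as max(minSize, min(maxSize, blockSize)) before this division):

```java
public class SplitMath {
    static final long MB = 1024L * 1024;

    // Number of splits (and thus mappers) = ceiling(fileSize / splitSize)
    static long splitCount(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long fileSize = 200 * MB;
        System.out.println(splitCount(fileSize, 128 * MB)); // default block size: 2 mappers
        System.out.println(splitCount(fileSize, 200 * MB)); // one big split: 1 mapper
        System.out.println(splitCount(fileSize, 25 * MB));  // 25 MB splits: 8 mappers
    }
}
```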