How to handle record boundaries in Text/Sequence files in MapReduce InputSplits
-
-
How are record boundaries handled in Text files or Sequence files when MapReduce creates InputSplits? In Hadoop, what happens when a split boundary falls in the middle of a record in a text file?
-
Hadoop uses your InputFormat and RecordReader to (1) create splits and (2) parse the data within each split into records (key/value objects) that are passed to the mapper. If an InputSplit (which your InputFormat creates) doesn't align exactly with an HDFS block boundary, Hadoop's FileInputFormat (and the formats that extend it) will Do The Right Thing(tm): the reader for one split keeps reading past the end of its split, performing a partial network read of the first few bytes of the next block to complete the record it started, while the reader for the next split skips the partial record at its start. The source of TextInputFormat (which extends FileInputFormat) and its LineRecordReader is where this logic lives.
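To make the rule concrete, here is a small, self-contained model of that behavior (no Hadoop dependency; the class name `SplitLineReader` and the method are hypothetical, not the real LineRecordReader API). The convention it models: a record belongs to the split that contains its first byte, so each reader skips a partial first line and reads past the end of its split to finish the last record it started.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of how a line-oriented RecordReader assigns
// records to splits. Not Hadoop's actual code -- a model of the rule.
public class SplitLineReader {

    /** Return the lines of `data` that belong to the split [start, end). */
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;

        // If the split does not begin at a record boundary, the partial
        // first line belongs to the previous split: skip past its newline.
        if (start != 0 && data[start - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // step over the newline
        }

        // Read whole lines whose first byte lies inside this split,
        // continuing beyond `end` when a line straddles the boundary --
        // this models the partial read into the next HDFS block.
        while (pos < end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') pos++;
            records.add(new String(data, lineStart, pos - lineStart));
            pos++; // step over the newline (or EOF)
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes();
        // Split the file at byte 8, which falls inside "bravo".
        System.out.println(readSplit(data, 0, 8));   // [alpha, bravo]
        System.out.println(readSplit(data, 8, 20));  // [charlie]
    }
}
```

Note how "bravo" is read entirely by the first split's reader even though it extends past byte 8, and the second split's reader skips those same bytes, so every record is consumed exactly once across contiguous splits.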