How to handle record boundaries in Text/Sequence files in MapReduce InputSplits


    • #6177
      DataFlair Team
      Spectator

      How are record boundaries handled in Text files or Sequence files in MapReduce InputSplits in Hadoop?

    • #6180
      DataFlair Team
      Spectator

      Hadoop uses your InputFormat and RecordReader to figure out how to 1. create splits and 2. parse the data within each split into records (or K/V objects) that can be passed to the mapper. An InputSplit (which you get to create in your InputFormat) rarely ends exactly on a record boundary, and it does not have to map exactly to an HDFS block. Hadoop’s FileInputFormat (and the formats that extend it) does the right thing here. For text, the LineRecordReader used by TextInputFormat applies two complementary rules: every split except the first discards its first (possibly partial) line, because that line belongs to the previous split; and every reader keeps reading past its nominal split end until it finishes the last record it started, performing a partial network read of the first few bytes of the next block if necessary. Together these rules guarantee that each record is read exactly once. The source of TextInputFormat (which extends FileInputFormat) and LineRecordReader is where all this logic lives. Sequence files get the same guarantee from the sync markers written into the file at intervals: the reader seeks forward from the split start to the next sync point and reads records from there, continuing past the split end until it crosses a sync marker.
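
      To make those two rules concrete, here is a minimal plain-Java sketch (no Hadoop dependencies; the class and method names are made up for illustration) that models how a LineRecordReader-style reader divides newline-delimited bytes among splits. It is a simplified model of the idea, not the actual Hadoop source:

      import java.nio.charset.StandardCharsets;
      import java.util.ArrayList;
      import java.util.List;

      public class LineSplitSketch {

          // Advance past the next '\n' (or to the end of the file).
          static int skipToNextLine(byte[] file, int pos) {
              while (pos < file.length && file[pos] != '\n') {
                  pos++;
              }
              return Math.min(pos + 1, file.length);
          }

          // Return the lines "owned" by the split [start, start + length).
          static List<String> readSplit(byte[] file, int start, int length) {
              List<String> records = new ArrayList<>();
              int pos = start;
              int end = start + length;

              // Rule 1: a split that does not begin the file discards its
              // first (possibly partial) line; the previous split's reader
              // is responsible for it (see Rule 2).
              if (start != 0) {
                  pos = skipToNextLine(file, pos);
              }

              // Rule 2: keep reading as long as the line *starts* at or
              // before the split end, even if it runs past the end -- on a
              // real cluster this is the partial read into the next block.
              while (pos <= end && pos < file.length) {
                  int lineStart = pos;
                  pos = skipToNextLine(file, pos);
                  int len = pos - lineStart;
                  if (file[pos - 1] == '\n') {
                      len--; // strip the trailing newline from the record
                  }
                  records.add(new String(file, lineStart, len, StandardCharsets.UTF_8));
              }
              return records;
          }

          public static void main(String[] args) {
              byte[] data = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.UTF_8);
              // Split the file at byte 8, in the middle of "bravo".
              System.out.println(readSplit(data, 0, 8));               // [alpha, bravo]
              System.out.println(readSplit(data, 8, data.length - 8)); // [charlie]
          }
      }

      Running main shows that the record straddling byte 8 ("bravo") is returned by the first split's reader and skipped by the second, so nothing is lost or read twice.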
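
      For sequence files, here is a simplified sketch of what SequenceFileRecordReader does, written against the SequenceFile.Reader API; the file name, the split offsets, and the key/value classes (LongWritable/Text) are assumptions made for illustration:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.SequenceFile;
      import org.apache.hadoop.io.Text;

      public class SeqSplitSketch {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              Path path = new Path("part-00000.seq"); // hypothetical file
              long start = 4096, length = 4096;       // hypothetical split
              long end = start + length;

              try (SequenceFile.Reader reader =
                       new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
                  // Seek forward to the next sync marker past the split start;
                  // everything before it belongs to the previous split's reader.
                  if (start > reader.getPosition()) {
                      reader.sync(start);
                  }
                  LongWritable key = new LongWritable();
                  Text value = new Text();
                  boolean more = reader.getPosition() < end;
                  while (more) {
                      long pos = reader.getPosition();
                      if (!reader.next(key, value)) {
                          break; // end of file
                      }
                      // Stop only after passing the split end AND crossing a
                      // sync mark, so the records straddling 'end' are still
                      // handled by exactly one reader.
                      if (pos >= end && reader.syncSeen()) {
                          more = false;
                      } else {
                          System.out.println(key + "\t" + value);
                      }
                  }
              }
          }
      }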
