What is a Record Reader in hadoop?

    • #5964
      DataFlair Team
      Spectator

      Explain RecordReader. Why is RecordReader needed?

    • #5967
      DataFlair Team
      Spectator

      RecordReader communicates with the InputSplit (created by InputFormat) and converts the split into records. Records are key-value pairs suitable for reading by the Mapper. RecordReader keeps reading from the InputSplit until it has read the entire split.

      RecordReader reads only the data within the boundaries defined by the InputSplit: it begins generating key-value pairs at the split's “start” and stops reading records at its “end”.
      The MapReduce framework obtains the RecordReader instance from the InputFormat. By default, it uses TextInputFormat to convert data into key-value pairs.

      Two commonly used RecordReaders are LineRecordReader, which TextInputFormat provides, and SequenceFileRecordReader, which SequenceFileInputFormat provides.

      LineRecordReader- LineRecordReader in Hadoop is the default RecordReader, provided by TextInputFormat. It treats each line of the input file as a value, and the key is the byte offset of that line within the file.

      SequenceFileRecordReader- It reads data specified by the header of a sequence file.
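
      The way a line-oriented reader turns a split into (byte offset, line) pairs can be sketched in plain Java, without any Hadoop dependencies. This is only an illustration: the class `LineSplitReaderSketch` and its methods are hypothetical names, the input is an in-memory byte array rather than an HDFS file, and the boundary rule (a reader that does not start at offset 0 discards its first, possibly partial, line, and every reader finishes the line it is on when it reaches the end of its split) is a simplification of how the real LineRecordReader is understood to behave.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hadoop's LineRecordReader.
class LineSplitReaderSketch {

    // Returns (byte offset, line) records for the split [start, end) of data.
    static List<Map.Entry<Long, String>> readSplit(byte[] data, int start, int end) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Assumption: the previous split's reader emits the line that
            // straddles the boundary, so this reader skips it.
            pos = nextLineStart(data, pos);
        }
        // Any line that begins at or before `end` belongs to this split.
        while (pos <= end && pos < data.length) {
            int next = nextLineStart(data, pos);
            // Exclude the trailing newline from the value, if there is one.
            int lineEnd = (next > pos && data[next - 1] == '\n') ? next - 1 : next;
            String line = new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8);
            records.add(new SimpleEntry<>((long) pos, line)); // key = byte offset
            pos = next;
        }
        return records;
    }

    // Index of the first byte after the next '\n', or data.length if none.
    static int nextLineStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    public static void main(String[] args) {
        byte[] data = "hello world\nfoo bar\nbaz\n".getBytes(StandardCharsets.UTF_8);
        // Two splits whose boundary (byte 10) falls inside the first line.
        for (Map.Entry<Long, String> r : readSplit(data, 0, 10)) {
            System.out.println("split 1: (" + r.getKey() + ", " + r.getValue() + ")");
        }
        for (Map.Entry<Long, String> r : readSplit(data, 10, data.length)) {
            System.out.println("split 2: (" + r.getKey() + ", " + r.getValue() + ")");
        }
    }
}
```

      With this input the first split yields the record (0, hello world) and the second yields (12, foo bar) and (20, baz), so each line is read exactly once even though the split boundary cuts through a line.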


    • #5969
      DataFlair Team
      Spectator

      The Mapper takes its input in the form of key-value pairs. But consider the format of the data we actually give to the mapper: it is a block, which needs to be converted to (key, value) pairs. How does this conversion happen? The answer is the RecordReader.

      The InputFormat defines how the input files are split and read. Its product is the InputSplit, i.e. the logical representation of the block. This InputSplit is fed to the RecordReader, which converts the split into (K,V) pairs, the actual records processed by the Mapper.
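
      The hand-off from RecordReader to Mapper described above can be sketched as a simple pull loop. The interface below is a hypothetical, cut-down stand-in whose method names mirror `nextKeyValue()`, `getCurrentKey()` and `getCurrentValue()` from Hadoop's `org.apache.hadoop.mapreduce.RecordReader`; everything else (the class names, the in-memory "split", using a line index instead of a byte offset as the key) is an assumption made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the reader-to-mapper loop; no Hadoop classes are used.
class RecordReaderLoopSketch {

    // Cut-down stand-in for the RecordReader contract.
    interface SimpleRecordReader<K, V> {
        boolean nextKeyValue();   // advance; false when the split is exhausted
        K getCurrentKey();
        V getCurrentValue();
    }

    // Reader over an in-memory "split" (a list of lines).
    // For simplicity the key is the line index, not a byte offset.
    static class ListRecordReader implements SimpleRecordReader<Integer, String> {
        private final List<String> split;
        private int index = -1;

        ListRecordReader(List<String> split) {
            this.split = split;
        }

        public boolean nextKeyValue() {
            if (index + 1 >= split.size()) {
                return false;
            }
            index++;
            return true;
        }

        public Integer getCurrentKey() {
            return index;
        }

        public String getCurrentValue() {
            return split.get(index);
        }
    }

    // Stand-in for the framework loop: pull records one at a time and
    // apply a "map" step (here: upper-case the value) to each.
    static List<String> runMapper(SimpleRecordReader<Integer, String> reader) {
        List<String> output = new ArrayList<>();
        while (reader.nextKeyValue()) {
            output.add(reader.getCurrentKey() + ":" + reader.getCurrentValue().toUpperCase());
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(runMapper(new ListRecordReader(List.of("a b", "c"))));
    }
}
```

      The point of the design is that the Mapper never sees the block or the split directly; it only sees one (K,V) record at a time, in whatever shape the RecordReader chooses to produce.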


    • #5971
      DataFlair Team
      Spectator

      The RecordReader loads data from its source and converts it into key-value pairs suitable for the Mapper. The RecordReader is defined by the InputFormat; by default, TextInputFormat is used to convert the data into key-value pairs.

      Types of RecordReader:
      1) LineRecordReader: TextInputFormat provides this as its default RecordReader.
      2) SequenceFileRecordReader: Used with SequenceFileInputFormat; it reads data specified by the header of a sequence file.

