This topic contains 3 replies, has 1 voice, and was last updated by  dfbdteam3 1 year, 6 months ago.

Viewing 4 posts - 1 through 4 (of 4 total)
  • Author
    Posts
  • #5964

    dfbdteam3
    Moderator

    Explain RecordReader. Why is RecordReader needed?

    #5967

    dfbdteam3
    Moderator

    RecordReader works on the InputSplit (created by the InputFormat) and converts the split into records. Records are key-value pairs in a form the Mapper can read. The RecordReader keeps reading from the InputSplit until the whole split has been consumed.

    RecordReader uses the data within the boundaries defined by the InputSplit. The split's “start” is where the RecordReader begins generating key-value pairs, and its “end” is where it stops reading records.
    The MapReduce framework obtains the RecordReader instance from the InputFormat. By default, it uses TextInputFormat to convert data into key-value pairs.
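
    The boundary handling can be sketched in plain Java without any Hadoop dependency. The class below, SplitLineReader, is a hypothetical stand-in written for this post, not Hadoop's LineRecordReader. It assumes the whole file fits in a byte array, that lines end with '\n', and it follows the usual convention for line-oriented splits: a split that does not start at byte 0 skips its first (possibly partial) line, and every split finishes the line that crosses its end.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified line reader over one split of an in-memory file.
// It mimics the boundary rules only; it is not Hadoop's LineRecordReader.
class SplitLineReader {
    private final byte[] data; // the whole "file"
    private final long end;    // first byte after this split
    private int pos;           // next byte to read
    private long key;          // byte offset of the current line
    private String value;      // contents of the current line

    SplitLineReader(byte[] data, int start, int length) {
        this.data = data;
        this.end = (long) start + length;
        this.pos = start;
        if (start != 0) {
            // The first line here is read by the previous split, so
            // advance to just past the next newline.
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++;
        }
    }

    // Advances to the next record; returns false when the split is exhausted.
    boolean nextKeyValue() {
        if (pos > end || pos >= data.length) return false;
        int lineStart = pos;
        int i = pos;
        while (i < data.length && data[i] != '\n') i++; // may run past 'end'
        key = lineStart;
        value = new String(data, lineStart, i - lineStart, StandardCharsets.UTF_8);
        pos = i + 1; // step over the newline
        return true;
    }

    long getCurrentKey() { return key; }
    String getCurrentValue() { return value; }

    // Convenience helper: collect every record of a split as "offset<TAB>line".
    static List<String> readSplit(byte[] data, int start, int length) {
        SplitLineReader reader = new SplitLineReader(data, start, length);
        List<String> records = new ArrayList<>();
        while (reader.nextKeyValue()) {
            records.add(reader.getCurrentKey() + "\t" + reader.getCurrentValue());
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] file = "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8);
        // Two splits that cut the file in the middle of "beta".
        System.out.println(readSplit(file, 0, 8));  // records for split 1
        System.out.println(readSplit(file, 8, 15)); // records for split 2
    }
}
```

    With the cut placed inside "beta", the first split still returns the whole "beta" line and the second split starts at "gamma", so each line is produced exactly once.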

    Two commonly used RecordReader implementations are LineRecordReader (provided by TextInputFormat) and SequenceFileRecordReader (provided by SequenceFileInputFormat).

    LineRecordReader- This is the RecordReader that TextInputFormat provides by default. Each line of the input file becomes a value, and the key is the byte offset at which that line starts in the file.

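    SequenceFileRecordReader- This is the RecordReader used with SequenceFileInputFormat. It reads the key-value records of a sequence file, using the information in the file's header (such as the key and value class names) to interpret them.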
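
    To illustrate what "reading data as specified by a header" means, here is a small sketch using only the Java standard library. The container format, the class name ToyHeaderReader and the type labels "long" and "string" are all invented for this example; this is not the real SequenceFile layout, only the general idea of a reader that consults the header before decoding records.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Toy container format invented for this sketch. It is NOT the real
// SequenceFile layout; it only shows a reader driven by a file header.
class ToyHeaderReader {

    // Writes a header naming the key and value types, then the records.
    static byte[] write(long[] keys, String[] values) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF("long");   // header: key type (hypothetical label)
            out.writeUTF("string"); // header: value type (hypothetical label)
            for (int i = 0; i < keys.length; i++) {
                out.writeLong(keys[i]);
                out.writeUTF(values[i]);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the header first, then decodes each record as the header dictates.
    static List<String> read(byte[] file) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            String keyType = in.readUTF();
            String valueType = in.readUTF();
            if (!keyType.equals("long") || !valueType.equals("string")) {
                throw new IOException("unsupported types: " + keyType + "/" + valueType);
            }
            List<String> records = new ArrayList<>();
            while (in.available() > 0) { // reliable for in-memory streams
                long key = in.readLong();
                String value = in.readUTF();
                records.add(key + "=" + value);
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = write(new long[] {1L, 2L}, new String[] {"one", "two"});
        for (String record : read(file)) {
            System.out.println(record);
        }
    }
}
```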

    Follow the link to learn more about RecordReader in Hadoop

    #5969

    dfbdteam3
    Moderator

    The Mapper takes its input as key-value pairs. But consider the format of the data we actually give it: a block of raw data, which must first be converted into (key, value) pairs. How does that conversion happen? The answer is the RecordReader.

    The InputFormat defines how the input files are split and read. Its product is the InputSplit, i.e. the logical representation of the block. This InputSplit is fed to the RecordReader, which converts it into the (K, V) pairs that the Mapper actually processes as records.
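
    The hand-off from RecordReader to Mapper can be pictured as a simple loop: ask the reader for the next pair, then pass the key and value to the map function. The sketch below is plain Java, with no Hadoop classes. The SimpleRecordReader interface, ListLineReader and runMapper are hypothetical names defined only for this example (the method names echo Hadoop's RecordReader), and a "split" is modelled as a list of lines that has already been cut.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

class RecordPipelineSketch {

    // Hypothetical, trimmed-down reader contract defined for this sketch.
    interface SimpleRecordReader<K, V> {
        boolean nextKeyValue();
        K getCurrentKey();
        V getCurrentValue();
    }

    // Reader over one "split", modelled as a list of lines.
    // Keys are offsets within the split, assuming single-byte characters.
    static class ListLineReader implements SimpleRecordReader<Long, String> {
        private final List<String> lines;
        private int index = -1;
        private long offset = 0;
        private long nextOffset = 0;

        ListLineReader(List<String> lines) {
            this.lines = lines;
        }

        public boolean nextKeyValue() {
            if (index + 1 >= lines.size()) return false;
            index++;
            offset = nextOffset;
            nextOffset += lines.get(index).length() + 1; // +1 for the newline
            return true;
        }

        public Long getCurrentKey() { return offset; }
        public String getCurrentValue() { return lines.get(index); }
    }

    // Driver loop: pull each (key, value) from the reader and hand it to map.
    static <K, V, R> List<R> runMapper(SimpleRecordReader<K, V> reader,
                                       BiFunction<K, V, R> map) {
        List<R> output = new ArrayList<>();
        while (reader.nextKeyValue()) {
            output.add(map.apply(reader.getCurrentKey(), reader.getCurrentValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> split = List.of("hello world", "foo");
        List<String> out = runMapper(new ListLineReader(split),
                (key, value) -> key + ":" + value.toUpperCase());
        System.out.println(out); // [0:HELLO WORLD, 12:FOO]
    }
}
```

    The map function never sees the raw block; it only ever receives the pairs the reader produces.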


    #5971

    dfbdteam3
    Moderator

    The RecordReader loads data from the source and converts it into key-value pairs suitable for the Mapper. Which RecordReader is used is defined by the InputFormat; by default, TextInputFormat is used to convert the data into key-value pairs.

    Types of RecordReader:
    1) LineRecordReader: TextInputFormat provides this as its default RecordReader.
    2) SequenceFileRecordReader: Used with SequenceFileInputFormat; it reads the key-value records of a sequence file as described by the file's header.

