What is a RecordReader in Hadoop?


- September 20, 2018 at 4:51 pm #5964
  
  DataFlair Team
  Spectator
  
  Explain RecordReader. What is the need for a RecordReader?
- September 20, 2018 at 4:51 pm #5967
  
  DataFlair Team
  Spectator
  
  The RecordReader communicates with the InputSplit (created by the InputFormat) and converts the split into records. Records are key-value pairs in a form the Mapper can read. The RecordReader keeps reading from the InputSplit until it has consumed the entire split.
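
  The pull-style contract described above can be sketched in plain Java. This is a minimal sketch, not Hadoop code: SimpleRecordReader, ListRecordReader and PullLoopDemo are hypothetical names, the reader serves an in-memory list instead of a real InputSplit, and keys are computed assuming ASCII text. Only the method names nextKeyValue(), getCurrentKey() and getCurrentValue() follow Hadoop's org.apache.hadoop.mapreduce.RecordReader, and the loop mirrors the shape of Mapper.run(), which keeps pulling records (through its Context) until nextKeyValue() returns false.

  ```java
  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical, simplified stand-in for Hadoop's RecordReader. The real class is
  // abstract, also has initialize(), getProgress() and close(), and its methods
  // throw IOException/InterruptedException.
  interface SimpleRecordReader<K, V> {
      boolean nextKeyValue();   // advance to the next record; false once the split is exhausted
      K getCurrentKey();
      V getCurrentValue();
  }

  // Hypothetical reader serving one "split" from an in-memory list of lines.
  // Keys are byte offsets, computed from String.length(), so ASCII text is assumed.
  class ListRecordReader implements SimpleRecordReader<Long, String> {
      private final List<String> lines;
      private int index = -1;
      private long offset = 0;

      ListRecordReader(List<String> lines) { this.lines = lines; }

      public boolean nextKeyValue() {
          if (index >= lines.size()) return false;                  // already exhausted
          if (index >= 0) offset += lines.get(index).length() + 1;  // skip previous line and its '\n'
          index++;
          return index < lines.size();
      }

      public Long getCurrentKey() { return offset; }

      public String getCurrentValue() { return lines.get(index); }
  }

  class PullLoopDemo {
      // Same shape as the loop in Mapper.run(): pull records until the reader reports no more.
      static List<String> runMapper(SimpleRecordReader<Long, String> reader) {
          List<String> mapped = new ArrayList<>();
          while (reader.nextKeyValue()) {
              // the "map" step here just formats each (key, value) pair
              mapped.add(reader.getCurrentKey() + ":" + reader.getCurrentValue());
          }
          return mapped;
      }

      public static void main(String[] args) {
          System.out.println(runMapper(new ListRecordReader(List.of("hello", "world"))));
      }
  }
  ```

  Running main prints [0:hello, 6:world]. The Mapper never touches raw bytes; it only sees the pairs the reader hands it, which is why changing the InputFormat (and therefore the RecordReader) changes what map() receives without changing the Mapper itself.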
  
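  The RecordReader reads only the data within the boundaries defined by the InputSplit: it starts generating key-value pairs at the split's "start" and stops reading records at its "end". For line-oriented input, LineRecordReader finishes the last line that crosses "end" and skips the partial first line after "start", so no line is lost or read twice.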
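
  For line-oriented text, the start/end rule can be illustrated with a simplified reader over an in-memory byte array. This is a sketch of the rule LineRecordReader follows, not its implementation: SplitLineReaderDemo and readSplit are hypothetical names, only '\n' is treated as a line terminator, and records are returned as "offset:line" strings for easy inspection. A split that does not begin at offset 0 skips its first (possibly partial) line, and every split reads each line that starts at or before its end, even if that line runs past it, so every line is emitted by exactly one split.

  ```java
  import java.nio.charset.StandardCharsets;
  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical demo of the split-boundary rule for newline-delimited text.
  class SplitLineReaderDemo {
      // Reads the split [start, end) of data and returns "offset:line" records.
      static List<String> readSplit(byte[] data, int start, int end) {
          List<String> records = new ArrayList<>();
          int pos = start;
          if (start != 0) {
              // Not the first split: skip the first (possibly partial) line;
              // the previous split's reader is responsible for it.
              while (pos < data.length && data[pos] != '\n') pos++;
              pos++; // step past the '\n'
          }
          // Emit every line that starts at or before `end`; the last one may
          // extend past `end`, completing a line cut by the boundary.
          while (pos <= end && pos < data.length) {
              int lineStart = pos;
              while (pos < data.length && data[pos] != '\n') pos++;
              records.add(lineStart + ":" + new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
              pos++; // step past the '\n'
          }
          return records;
      }

      public static void main(String[] args) {
          byte[] data = "hello\nworld\nfoo\n".getBytes(StandardCharsets.UTF_8);
          int splitSize = 7; // deliberately small so boundaries fall mid-line
          for (int start = 0; start < data.length; start += splitSize) {
              int end = Math.min(start + splitSize, data.length);
              System.out.println("[" + start + "," + end + ") -> " + readSplit(data, start, end));
          }
      }
  }
  ```

  With 7-byte splits over "hello\nworld\nfoo\n", main prints [0,7) -> [0:hello, 6:world], then [7,14) -> [12:foo], then [14,16) -> []. Each line appears exactly once even though the boundaries at bytes 7 and 14 fall inside "world" and "foo".
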
  The RecordReader instance is created by the InputFormat (in the org.apache.hadoop.mapreduce API, through InputFormat.createRecordReader()). By default, the MapReduce framework uses TextInputFormat to convert the input into key-value pairs.
  
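  Two commonly used RecordReader implementations are LineRecordReader and SequenceFileRecordReader. Only LineRecordReader comes from TextInputFormat; SequenceFileRecordReader is created by SequenceFileInputFormat.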
  
  LineRecordReader: the default RecordReader, provided by TextInputFormat. Each line of the input file becomes a value, and its key is the byte offset at which that line starts in the file.
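
  To see that the key is a byte offset rather than a line number, here is a minimal sketch that turns a whole file (treated as a single split) into "offset:line" records. ByteOffsetKeysDemo and toRecords are hypothetical names; the sketch follows the terminator rule documented for Hadoop's LineReader ('\n', '\r' or "\r\n" ends a line and is not part of the value), but it is not Hadoop's code.

  ```java
  import java.nio.charset.StandardCharsets;
  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical demo: byte-offset keys for a single-split text file.
  class ByteOffsetKeysDemo {
      static List<String> toRecords(byte[] data) {
          List<String> records = new ArrayList<>();
          int pos = 0;
          while (pos < data.length) {
              int lineStart = pos; // the key: byte position where the line begins
              while (pos < data.length && data[pos] != '\n' && data[pos] != '\r') pos++;
              records.add(lineStart + ":" + new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
              // Consume one terminator; "\r\n" counts as a single terminator.
              if (pos < data.length && data[pos] == '\r') pos++;
              if (pos < data.length && data[pos] == '\n') pos++;
          }
          return records;
      }

      public static void main(String[] args) {
          // "h\u00e9llo" is 6 bytes in UTF-8 because \u00e9 (e-acute) takes two bytes.
          byte[] data = "h\u00e9llo\r\nw\u00f6rld\n".getBytes(StandardCharsets.UTF_8);
          System.out.println(toRecords(data));
      }
  }
  ```

  For this input the second record's key is 8: the first line occupies 6 bytes (the accented character is two bytes in UTF-8) plus 2 bytes for "\r\n". A line number or a character count would give a different answer.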
  
  SequenceFileRecordReader: provided by SequenceFileInputFormat. It reads the binary key-value records of a SequenceFile, using the key and value classes recorded in the file's header.
  
- September 20, 2018 at 4:51 pm #5969
  
  DataFlair Team
  Spectator
  
  The Mapper takes its input as key-value pairs. But consider what the input data handed to the Mapper actually is: a block of raw data, which has to be converted into (key, value) pairs. How does that conversion happen? The answer is the RecordReader.
  
  The InputFormat defines how the input files are split and read. Its product is the InputSplit, i.e. a logical representation of a chunk of the input (by default usually the size of one HDFS block). This InputSplit is fed to the RecordReader, which converts it into (K, V) pairs, the actual records processed by the Mapper.
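
  The point that a raw split is not yet a sequence of records can be made concrete by cutting a small file into fixed-size byte ranges. This is a simplified sketch: RawSplitsDemo and computeSplits are hypothetical names, splits are plain {start, length} pairs, and the split size is a constant, whereas Hadoop's FileInputFormat derives it from the block size and the configured minimum and maximum split sizes and also tracks block locations.

  ```java
  import java.nio.charset.StandardCharsets;
  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical demo: splits are logical byte ranges, blind to record boundaries.
  class RawSplitsDemo {
      // Returns {start, length} pairs covering [0, fileLength) in fixed-size pieces.
      static List<long[]> computeSplits(long fileLength, long splitSize) {
          List<long[]> splits = new ArrayList<>();
          for (long start = 0; start < fileLength; start += splitSize) {
              splits.add(new long[] {start, Math.min(splitSize, fileLength - start)});
          }
          return splits;
      }

      public static void main(String[] args) {
          byte[] data = "hello\nworld\nfoo\n".getBytes(StandardCharsets.UTF_8);
          for (long[] s : computeSplits(data.length, 7)) {
              String raw = new String(data, (int) s[0], (int) s[1], StandardCharsets.UTF_8);
              // Escape newlines so the cut points are visible in the output.
              System.out.println(s[0] + "+" + s[1] + " -> \"" + raw.replace("\n", "\\n") + "\"");
          }
      }
  }
  ```

  With 7-byte splits over "hello\nworld\nfoo\n", the raw slices are "hello\nw", "orld\nfo" and "o\n". Two of the three lines are cut across splits, and reassembling them into whole (offset, line) records is exactly the job left to the RecordReader.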
  
- September 20, 2018 at 4:51 pm #5971
  
  DataFlair Team
  Spectator
  
  The RecordReader loads data from its source and converts it into key-value pairs suitable for the Mapper. Which RecordReader is used is determined by the InputFormat; by default, TextInputFormat is used to convert the input into key-value pairs.
  
  Common RecordReader types:
  1) LineRecordReader: the default RecordReader, provided by TextInputFormat.
  2) SequenceFileRecordReader: provided by SequenceFileInputFormat; it reads the records of a SequenceFile using the key and value classes specified in the file's header.
  