Apache Hadoop › What is a RecordReader in Hadoop?
This topic has 3 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 4:51 pm · #5964 · DataFlair Team (Spectator)
Explain RecordReader. What is the need for a RecordReader?
September 20, 2018 at 4:51 pm · #5967 · DataFlair Team (Spectator)
RecordReader communicates with the InputSplit (created by InputFormat) and converts the split into records. Records are key-value pairs in a form suitable for the Mapper to read. RecordReader keeps reading from the InputSplit until the entire split has been processed.
RecordReader uses the data within the boundaries defined by the InputSplit: at the split's "start" position it begins generating key-value pairs, and at the "end" position it stops reading records.
The MapReduce framework obtains the RecordReader instance from the InputFormat. By default it uses TextInputFormat for converting data into key-value pairs. Two common RecordReader implementations are LineRecordReader and SequenceFileRecordReader.
LineRecordReader is the default RecordReader that TextInputFormat provides. Each line of the input file becomes the value, and the key is the byte offset of that line within the file.
SequenceFileRecordReader reads data as specified by the header of a sequence file.
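The (byte offset, line) pairing that LineRecordReader produces can be sketched in plain Java. This is a simplified stand-in, not the real Hadoop API: the class name `LineRecords` and method `read` are invented for illustration, and the real reader works on HDFS byte streams rather than in-memory strings.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LineRecords {
    // Simplified stand-in for Hadoop's LineRecordReader: emit one
    // (byte offset, line) pair per input line, the same key-value
    // shape the real reader hands to the Mapper.
    public static List<Map.Entry<Long, String>> read(String data) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : data.split("\n")) {
            records.add(new SimpleEntry<>(offset, line));
            offset += line.length() + 1; // +1 for the newline byte
        }
        return records;
    }
}
```

For the input `"hadoop\nrecord reader"`, this yields the records (0, "hadoop") and (7, "record reader"): the second key is 7 because "hadoop" occupies bytes 0 to 5 and the newline occupies byte 6.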
September 20, 2018 at 4:51 pm · #5969 · DataFlair Team (Spectator)
The Mapper takes input in the form of key-value pairs. But what is the format of the input data we give to the Mapper? It is a block, which needs to be converted into (key, value) pairs. How does this conversion happen? By using a RecordReader.
The InputFormat defines how the input files are split and read. Its product is the InputSplit, the logical representation of the block. This InputSplit is fed to the RecordReader, which converts the split into (key, value) pairs: the actual records processed by the Mapper.
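The split-to-records handoff described above can be sketched in plain Java. This is a simplified model of how a line-oriented reader honors split boundaries (the names `SplitReader` and `readSplit` are invented for illustration; the real LineRecordReader reads byte streams, not strings): if a split does not begin at byte 0, the reader skips the partial first line, because the previous split's reader emits it, and it reads through any line that begins at or before the split's end.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitReader {
    // Simplified model: return the lines belonging to the split
    // [start, end) of `file`. A reader only emits lines whose
    // beginning it saw, so no record is ever split in two.
    public static List<String> readSplit(String file, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) { // skip the partial first line
            int nl = file.indexOf('\n', start);
            pos = (nl == -1) ? file.length() : nl + 1;
        }
        // read lines while the line *starts* at or before `end`
        while (pos < file.length() && pos <= end) {
            int nl = file.indexOf('\n', pos);
            if (nl == -1) nl = file.length();
            lines.add(file.substring(pos, nl));
            pos = nl + 1;
        }
        return lines;
    }
}
```

For the file `"aaa\nbbb\nccc"` divided into splits [0, 4) and [4, 11), the first reader emits "aaa" and "bbb" (it reads through the line that begins at its end boundary), and the second reader skips ahead past "bbb" and emits only "ccc". Every line is produced exactly once, even though "bbb" straddles the split boundary.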
September 20, 2018 at 4:51 pm · #5971 · DataFlair Team (Spectator)
The RecordReader loads data from the source and converts it into key-value pairs suitable for the Mapper. The RecordReader is defined by the InputFormat; by default, TextInputFormat is used to convert the data into key-value pairs.
Types of RecordReader:
1) LineRecordReader: the default RecordReader that TextInputFormat provides.
2) SequenceFileRecordReader: reads data as specified by the header of a sequence file.