What is the purpose of RecordReader in Hadoop?

    • #5567
      DataFlair Team
      Spectator

      What is RecordReader in Hadoop MapReduce?
      What is the default RecordReader in Hadoop?
      What is the need for RecordReader in MapReduce?

    • #5569
      DataFlair Team
      Spectator

      Hadoop architecture includes two major components:

      1) HDFS
      2) MapReduce

      HDFS is used for storing the data, and MapReduce is used for processing that data. To process data stored in HDFS, such as a log or text file, the programmer writes a MapReduce program. When data is fetched from an HDFS block, the mapper expects it in the form of an InputSplit: the chunk of data to be processed by one map task. Each split may contain multiple records, and this is where the RecordReader comes into the picture. It breaks the InputSplit into records (row by row), converts each record into a key-value pair, and submits the pair to the mapper, as the sketch below shows.
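      To make that handoff concrete, here is a minimal sketch of a mapper that simply echoes what the RecordReader gives it. The class name LineEchoMapper is an illustrative assumption; the key and value types are what the default TextInputFormat actually delivers.

      import java.io.IOException;

      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      // By the time map() is called, the RecordReader has already turned
      // the InputSplit into records:
      //   key   = byte offset of the line within the file (LongWritable)
      //   value = the content of the line itself (Text)
      public class LineEchoMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          // Emit the record unchanged to make the (offset, line) pairing visible.
          context.write(key, value);
        }
      }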

      By default, Hadoop uses TextInputFormat for converting data into key-value pairs.
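      As a hedged sketch of where that default shows up in a job driver (the class name RecordReaderDemo and the map-only setup are illustrative assumptions, not a fixed recipe):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class RecordReaderDemo {
        public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "record-reader-demo");
          job.setJarByClass(RecordReaderDemo.class);
          // TextInputFormat is already the default; it is set explicitly here
          // only to make the InputFormat (and hence RecordReader) choice visible.
          job.setInputFormatClass(TextInputFormat.class);
          job.setMapperClass(Mapper.class); // identity mapper: passes (offset, line) through
          job.setNumReduceTasks(0);         // map-only job, for illustration
          job.setOutputKeyClass(LongWritable.class);
          job.setOutputValueClass(Text.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }

      Run over a text file, this writes each line back out keyed by its byte offset, which is exactly the pairing the RecordReader produced.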

      Two commonly used RecordReaders are:

      1) LineRecordReader - the default RecordReader in Hadoop, provided by TextInputFormat. Each line of the input file becomes a new value, and the key is the byte offset of that line within the file.
      2) SequenceFileRecordReader - provided by SequenceFileInputFormat; it reads the data as specified by the header of a sequence file.
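      When neither fits, a custom RecordReader can be written. Below is a hedged sketch, not Hadoop's own code: the class name UpperCaseRecordReader and the upper-casing behavior are illustrative assumptions. It delegates the real split-reading work to the stock LineRecordReader and only rewrites each value, which shows exactly where the split-to-record conversion happens.

      import java.io.IOException;

      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.InputSplit;
      import org.apache.hadoop.mapreduce.RecordReader;
      import org.apache.hadoop.mapreduce.TaskAttemptContext;
      import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

      // Illustrative custom RecordReader: wraps LineRecordReader and
      // upper-cases every line before handing it to the mapper.
      public class UpperCaseRecordReader extends RecordReader<LongWritable, Text> {

        private final LineRecordReader delegate = new LineRecordReader();
        private final Text value = new Text();

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
          // The framework passes in the InputSplit this reader must cover.
          delegate.initialize(split, context);
        }

        @Override
        public boolean nextKeyValue() throws IOException, InterruptedException {
          // Advance to the next record in the split; false means the split
          // is exhausted and the mapper stops receiving records.
          if (!delegate.nextKeyValue()) {
            return false;
          }
          value.set(delegate.getCurrentValue().toString().toUpperCase());
          return true;
        }

        @Override
        public LongWritable getCurrentKey() {
          return delegate.getCurrentKey(); // still the byte offset
        }

        @Override
        public Text getCurrentValue() {
          return value; // the rewritten line
        }

        @Override
        public float getProgress() throws IOException {
          return delegate.getProgress();
        }

        @Override
        public void close() throws IOException {
          delegate.close();
        }
      }

      To plug such a reader into a job, it would be returned from the createRecordReader() method of a custom FileInputFormat subclass; that wiring is omitted here.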

