What is the purpose of RecordReader in Hadoop?

    • #5567
      DataFlair Team
      Spectator

      What is RecordReader in Hadoop MapReduce?
      What is the default RecordReader in Hadoop?
      What is the need for RecordReader in MapReduce?

    • #5569
      DataFlair Team
      Spectator

      Hadoop architecture includes two major components:

      1) HDFS
      2) MapReduce

      HDFS is used for storing the data, and MapReduce is used for processing that data. To process data stored in HDFS, such as a log or text file, the programmer writes a MapReduce program. When data is fetched from an HDFS block, the mapper expects it in the form of an InputSplit: the chunk of data to be processed by one map task. Each split may contain multiple records, and this is where the RecordReader comes into the picture. It breaks the InputSplit into records (row by row), converts each record into a key-value pair, and submits the pair to the mapper, as the sketch below shows.
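      To make that handoff concrete, here is a minimal sketch of a mapper that simply echoes what the RecordReader gives it. The class name LineEchoMapper is an illustrative assumption; the key and value types are what the default TextInputFormat actually delivers.

      import java.io.IOException;

      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      // By the time map() is called, the RecordReader has already turned
      // the InputSplit into records:
      //   key   = byte offset of the line within the file (LongWritable)
      //   value = the content of the line itself (Text)
      public class LineEchoMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          // Emit the record unchanged to make the (offset, line) pairing visible.
          context.write(key, value);
        }
      }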

      By default, Hadoop uses TextInputFormat for converting data into key-value pairs.
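      As a hedged sketch of where that default shows up in a job driver (the class name RecordReaderDemo and the map-only setup are illustrative assumptions, not a fixed recipe):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class RecordReaderDemo {
        public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "record-reader-demo");
          job.setJarByClass(RecordReaderDemo.class);
          // TextInputFormat is already the default; it is set explicitly here
          // only to make the InputFormat (and hence RecordReader) choice visible.
          job.setInputFormatClass(TextInputFormat.class);
          job.setMapperClass(Mapper.class); // identity mapper: passes (offset, line) through
          job.setNumReduceTasks(0);         // map-only job, for illustration
          job.setOutputKeyClass(LongWritable.class);
          job.setOutputValueClass(Text.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }

      Run over a text file, this writes each line back out keyed by its byte offset, which is exactly the pairing the RecordReader produced.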

      Two commonly used RecordReaders are:

      1) LineRecordReader - the default RecordReader in Hadoop, provided by TextInputFormat. Each line of the input file becomes a new value, and the key is the byte offset of that line within the file.
      2) SequenceFileRecordReader - provided by SequenceFileInputFormat; it reads the data as specified by the header of a sequence file.
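      When neither fits, a custom RecordReader can be written. Below is a hedged sketch, not Hadoop's own code: the class name UpperCaseRecordReader and the upper-casing behavior are illustrative assumptions. It delegates the real split-reading work to the stock LineRecordReader and only rewrites each value, which shows exactly where the split-to-record conversion happens.

      import java.io.IOException;

      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.InputSplit;
      import org.apache.hadoop.mapreduce.RecordReader;
      import org.apache.hadoop.mapreduce.TaskAttemptContext;
      import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

      // Illustrative custom RecordReader: wraps LineRecordReader and
      // upper-cases every line before handing it to the mapper.
      public class UpperCaseRecordReader extends RecordReader<LongWritable, Text> {

        private final LineRecordReader delegate = new LineRecordReader();
        private final Text value = new Text();

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
          // The framework passes in the InputSplit this reader must cover.
          delegate.initialize(split, context);
        }

        @Override
        public boolean nextKeyValue() throws IOException, InterruptedException {
          // Advance to the next record in the split; false means the split
          // is exhausted and the mapper stops receiving records.
          if (!delegate.nextKeyValue()) {
            return false;
          }
          value.set(delegate.getCurrentValue().toString().toUpperCase());
          return true;
        }

        @Override
        public LongWritable getCurrentKey() {
          return delegate.getCurrentKey(); // still the byte offset
        }

        @Override
        public Text getCurrentValue() {
          return value; // the rewritten line
        }

        @Override
        public float getProgress() throws IOException {
          return delegate.getProgress();
        }

        @Override
        public void close() throws IOException {
          delegate.close();
        }
      }

      To plug such a reader into a job, it would be returned from the createRecordReader() method of a custom FileInputFormat subclass; that wiring is omitted here.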

