What is RecordReader in MapReduce?

  • #6337
    DataFlair Team

      What is the purpose of RecordReader in Hadoop?

  • #6339
    DataFlair Team

      All Mappers and Reducers in Hadoop work only on key-value pairs.

      RecordReader is the interface between the InputSplit and the Mapper.

      By default, RecordReader reads one line at a time from the corresponding InputSplit, converts that line/record into a key-value pair, and passes it on to the Mapper for processing.
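      As a minimal sketch of what the Mapper then sees (assuming the default TextInputFormat, whose LineRecordReader supplies the byte offset of each line as the key and the line itself as the value; the class name below is illustrative):

        import java.io.IOException;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;

        // Illustrative Mapper: emits (line, line length) for every record the
        // RecordReader delivers from this task's InputSplit.
        public class LineLengthMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // key   = byte offset of the line in the file (set by the RecordReader)
                // value = the line itself, already delivered as a key-value pair
                context.write(value, new IntWritable(value.getLength()));
            }
        }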

      An InputSplit is the logical representation of the chunk of data processed by an individual Mapper in Hadoop (i.e., the single unit of work on which one map task runs).

      InputSplits are created by the InputFormat class.

      The split size is user-configurable; a common choice is to match the HDFS block size (64 MB or 128 MB). The number of map tasks equals the number of InputSplits, as the driver sketch below shows.
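      Here is a sketch of a driver that bounds the split size. FileInputFormat.setMaxInputSplitSize is a real Hadoop API; the class name, input path, and the 128 MB figure are assumptions for illustration:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

        public class SplitSizeDriver {
            public static void main(String[] args) throws Exception {
                Job job = Job.getInstance(new Configuration(), "split-size-demo");
                job.setInputFormatClass(TextInputFormat.class);
                FileInputFormat.addInputPath(job, new Path(args[0]));

                // Cap each InputSplit at 128 MB. Splits are logical, so this moves
                // no data; it only changes how many map tasks are launched.
                FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
                // (Mapper/Reducer classes and output settings omitted in this sketch.)
            }
        }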

      Since one map task runs per InputSplit, the framework also creates one RecordReader per InputSplit; that RecordReader then iterates through its split one record/line at a time, producing the key-value pairs the Mapper consumes.

      Input File => InputSplits (created by the InputFormat class) => RecordReader (converts each record into a (key, value) pair) => Mapper
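      To make that per-split contract concrete, here is a minimal RecordReader sketch; it simply delegates to Hadoop's built-in LineRecordReader (the class name is an assumption) and shows the methods the framework calls for each InputSplit:

        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.InputSplit;
        import org.apache.hadoop.mapreduce.RecordReader;
        import org.apache.hadoop.mapreduce.TaskAttemptContext;
        import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

        public class DelegatingRecordReader extends RecordReader<LongWritable, Text> {
            private final LineRecordReader delegate = new LineRecordReader();

            @Override
            public void initialize(InputSplit split, TaskAttemptContext context)
                    throws IOException {
                delegate.initialize(split, context); // open this task's slice of the file
            }

            @Override
            public boolean nextKeyValue() throws IOException {
                return delegate.nextKeyValue(); // advance one record; false at end of split
            }

            @Override
            public LongWritable getCurrentKey() {
                return delegate.getCurrentKey(); // byte offset of the current line
            }

            @Override
            public Text getCurrentValue() {
                return delegate.getCurrentValue(); // the current line's text
            }

            @Override
            public float getProgress() throws IOException {
                return delegate.getProgress(); // fraction of the split consumed so far
            }

            @Override
            public void close() throws IOException {
                delegate.close();
            }
        }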

      Follow the link for more detail: RecordReader in Hadoop
