What is RecordReader in MapReduce?

  • #6337
    DataFlair Team

      What is the purpose of RecordReader in Hadoop?

  • #6339
    DataFlair Team

      All Mappers and Reducers in Hadoop work only on key-value pairs.

      RecordReader is the interface between the InputSplit and the Mapper.

      By default, RecordReader reads one line at a time from the corresponding InputSplit, converts that line/record into a key-value pair, and passes it on to the Mapper for processing.
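      As a minimal sketch of what the Mapper then sees (assuming the default TextInputFormat, whose LineRecordReader supplies the byte offset of each line as the key and the line itself as the value; the class name below is illustrative):

        import java.io.IOException;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;

        // Illustrative Mapper: emits (line, line length) for every record the
        // RecordReader delivers from this task's InputSplit.
        public class LineLengthMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // key   = byte offset of the line in the file (set by the RecordReader)
                // value = the line itself, already delivered as a key-value pair
                context.write(value, new IntWritable(value.getLength()));
            }
        }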

      An InputSplit is the logical representation of the chunk of data processed by an individual Mapper in Hadoop (i.e., the single unit of work on which one map task runs).

      InputSplits are created by the InputFormat class.

      The split size is user-configurable; a common choice is to match the HDFS block size (64 MB or 128 MB). The number of map tasks equals the number of InputSplits, as the driver sketch below shows.
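      Here is a sketch of a driver that bounds the split size. FileInputFormat.setMaxInputSplitSize is a real Hadoop API; the class name, input path, and the 128 MB figure are assumptions for illustration:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

        public class SplitSizeDriver {
            public static void main(String[] args) throws Exception {
                Job job = Job.getInstance(new Configuration(), "split-size-demo");
                job.setInputFormatClass(TextInputFormat.class);
                FileInputFormat.addInputPath(job, new Path(args[0]));

                // Cap each InputSplit at 128 MB. Splits are logical, so this moves
                // no data; it only changes how many map tasks are launched.
                FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
                // (Mapper/Reducer classes and output settings omitted in this sketch.)
            }
        }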

      Since one map task runs per InputSplit, the framework also creates one RecordReader per InputSplit; that RecordReader then iterates through its split one record/line at a time, producing the key-value pairs the Mapper consumes.

      Input File => InputSplits (created by the InputFormat class) => RecordReader (converts each record into a (key, value) pair) => Mapper
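      To make that per-split contract concrete, here is a minimal RecordReader sketch; it simply delegates to Hadoop's built-in LineRecordReader (the class name is an assumption) and shows the methods the framework calls for each InputSplit:

        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.InputSplit;
        import org.apache.hadoop.mapreduce.RecordReader;
        import org.apache.hadoop.mapreduce.TaskAttemptContext;
        import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

        public class DelegatingRecordReader extends RecordReader<LongWritable, Text> {
            private final LineRecordReader delegate = new LineRecordReader();

            @Override
            public void initialize(InputSplit split, TaskAttemptContext context)
                    throws IOException {
                delegate.initialize(split, context); // open this task's slice of the file
            }

            @Override
            public boolean nextKeyValue() throws IOException {
                return delegate.nextKeyValue(); // advance one record; false at end of split
            }

            @Override
            public LongWritable getCurrentKey() {
                return delegate.getCurrentKey(); // byte offset of the current line
            }

            @Override
            public Text getCurrentValue() {
                return delegate.getCurrentValue(); // the current line's text
            }

            @Override
            public float getProgress() throws IOException {
                return delegate.getProgress(); // fraction of the split consumed so far
            }

            @Override
            public void close() throws IOException {
                delegate.close();
            }
        }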

      Follow the link for more detail: RecordReader in Hadoop
