This topic contains 1 reply, has 1 voice, and was last updated by  dfbdteam3 1 year, 8 months ago.

Viewing 2 posts - 1 through 2 (of 2 total)
  • Author
  • #6337


    What is the purpose of RecordReader in Hadoop?



    All Mappers and Reducers in Hadoop work only with key-value pairs.

    RecordReader is an interface between InputSplit and Mapper.

    By default (with TextInputFormat), RecordReader reads one line at a time from its InputSplit, converts that line into a key-value pair (the line's byte offset in the file as the key, the line's text as the value), and passes the pair to the Mapper for processing.
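To make the line-to-pair conversion concrete, here is a minimal sketch in plain Java. It is not Hadoop's own reader: the class `LineRecordSketch` and the method `readRecords` are hypothetical names, and the input is assumed to be newline-terminated UTF-8 text held in memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a line-oriented record reader (not a Hadoop class).
class LineRecordSketch {

    // Turns raw text into (byte offset, line) pairs, one pair per line.
    // Note: String.split drops trailing empty lines, which is fine for a sketch.
    static List<Map.Entry<Long, String>> readRecords(String text) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            records.add(new SimpleEntry<>(offset, line));
            // Advance past the line's bytes plus the one-byte newline terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        // Each printed pair is what a Mapper would receive as (key, value).
        for (Map.Entry<Long, String> record : readRecords("hello world\nfoo bar\n")) {
            System.out.println(record.getKey() + " -> " + record.getValue());
        }
    }
}
```

With the two-line input above, the keys are 0 and 12, because the first line occupies 11 bytes plus its newline.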

    An InputSplit is the logical representation of the chunk of data processed by an individual Mapper, i.e. the single unit of work for one map task.

    InputSplits are created by the InputFormat class.

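    The split size is configurable: the user can tune it to the size of the data, for example through the mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize properties. By default it follows the HDFS block size (typically 64 or 128 MB), which is also the recommended size. The number of map tasks is equal to the number of InputSplits.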
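The relationship between block size, split size and the number of map tasks can be sketched with a little arithmetic. The rule below, max(minSize, min(maxSize, blockSize)), is the one file-based input formats are generally described as using; the class `SplitSizeSketch` and both method names are hypothetical, and the split count is only approximate because Hadoop allows some slack on the last split.

```java
// Hypothetical helper for illustration; not part of the Hadoop API.
class SplitSizeSketch {

    // Split size is the block size, clamped between the configured min and max.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of splits (and therefore map tasks) for one file:
    // the file size divided by the split size, rounded up.
    static long approxSplitCount(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed values: 128 MB blocks, no effective min/max limits, a 1 GB file.
        long splitSize = computeSplitSize(128 * mb, 1L, Long.MAX_VALUE);
        System.out.println("split size (MB): " + splitSize / mb);
        System.out.println("approx. map tasks: " + approxSplitCount(1024 * mb, splitSize));
    }
}
```

Raising the minimum above the block size yields fewer, larger splits; lowering the maximum below it yields more, smaller ones.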

    The number of map tasks equals the number of InputSplits, and one RecordReader is created for each InputSplit. Because the RecordReader returns only one record (line) at a time, the map task calls it repeatedly, once per record, until the split is exhausted.

    Input File => Multiple InputSplits (created by InputFormat class) => RecordReader (converts records into (key, value) pairs) => Mapper
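The flow above can be modelled in a few lines of plain Java. This is only an in-memory sketch under simplifying assumptions: splits are cut by line count rather than by byte range, a plain `Iterator` plays the role of the RecordReader, and `PipelineSketch`, `makeSplits` and `runMapTask` are hypothetical names, not Hadoop classes or methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory model of the flow; none of these types are Hadoop classes.
class PipelineSketch {

    // Stand-in for the InputFormat: cuts the input into splits of a fixed number of lines.
    static List<List<String>> makeSplits(List<String> lines, int linesPerSplit) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerSplit) {
            splits.add(lines.subList(i, Math.min(i + linesPerSplit, lines.size())));
        }
        return splits;
    }

    // Stand-in for one map task: opens one reader for its split and applies
    // the map logic once per record. Returns the number of records processed.
    static int runMapTask(List<String> split, List<String> mapOutput) {
        Iterator<String> reader = split.iterator(); // one reader per split
        int recordNumber = 0;
        while (reader.hasNext()) {
            String value = reader.next();           // one record at a time
            mapOutput.add(recordNumber + ":" + value.toUpperCase());
            recordNumber++;
        }
        return recordNumber;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        List<List<String>> splits = makeSplits(lines, 2);
        System.out.println("map tasks: " + splits.size());
        for (List<String> split : splits) {
            List<String> output = new ArrayList<>();
            int records = runMapTask(split, output);
            System.out.println(records + " record(s): " + output);
        }
    }
}
```

The number of map tasks follows from the number of splits, while the number of reader calls inside each task follows from the number of records in that split.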

    Follow the link for more detail: RecordReader in Hadoop
