Forums › Apache Hadoop › What is RecordReader in MapReduce?
Tagged: Data Structure, Hadoop
This topic has 1 reply, 1 voice, and was last updated 5 years, 6 months ago by DataFlair Team.
September 20, 2018 at 5:46 pm #6337, by DataFlair Team (Spectator)
What is the purpose of RecordReader in Hadoop?
September 20, 2018 at 5:47 pm #6339, by DataFlair Team (Spectator)
All Mappers and Reducers in Hadoop work only on key-value pairs.
RecordReader is the interface between an InputSplit and the Mapper.
By default (with TextInputFormat), the RecordReader reads one line at a time from the corresponding InputSplit, converts that line/record into a key-value pair (the byte offset of the line as the key, the line's contents as the value), and passes it on to the Mapper for processing.
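To illustrate that behavior, here is a toy Python model (not Hadoop's actual Java LineRecordReader, just a sketch of the same idea) of how a line-oriented RecordReader turns the raw bytes of a split into (byte offset, line) key-value pairs. Like Hadoop's implementation, a split that does not begin at byte 0 skips its partial first line, because the previous split reads past its own end to finish that line:

```python
import io

def line_record_reader(data: bytes, start: int, end: int):
    """Toy model of a line-oriented RecordReader: yield (byte_offset, line)
    pairs for every line that *starts* inside the split [start, end)."""
    stream = io.BytesIO(data)
    if start != 0:
        # Skip the partial line at the head of this split; the previous
        # split is responsible for reading it to completion.
        stream.seek(start - 1)
        stream.readline()
        pos = stream.tell()
    else:
        pos = 0
    while pos < end:
        line = stream.readline()
        if not line:
            break
        yield pos, line.rstrip(b"\n").decode()
        pos = stream.tell()

data = b"first line\nsecond line\nthird line\n"
# One split covering the whole file: each record is (offset, line).
print(list(line_record_reader(data, 0, len(data))))
# -> [(0, 'first line'), (11, 'second line'), (23, 'third line')]
```

Note that even if the file were chopped into two splits mid-line, every line would still be produced exactly once, by exactly one reader.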
An InputSplit is the logical representation of the unit of data processed by an individual Mapper in Hadoop, i.e. the single unit of work on which one map task runs.
InputSplits are created by the InputFormat class.
The split size is configurable by the user; it is usually set to match the HDFS block size (typically 64 MB or 128 MB). The number of map tasks equals the number of InputSplits.
For each InputSplit, the framework launches one map task, and within that task the RecordReader is invoked once per record, since it reads only one record/line at a time.
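The relationship between file size, split size, and number of map tasks can be sketched with a simplified Python version of the arithmetic Hadoop's FileInputFormat performs when computing splits (the 1.1 "slop" factor, which lets the last split grow slightly so a tiny remainder does not get its own map task, is modeled here as an assumption about that behavior):

```python
def compute_splits(file_size: int, split_size: int, slop: float = 1.1):
    """Simplified sketch of FileInputFormat-style split computation:
    chop the file into split_size chunks; the final chunk may be up to
    `slop` times split_size so a tiny tail is not a separate map task."""
    splits = []
    offset, remaining = 0, file_size
    while remaining / split_size > slop:
        splits.append((offset, split_size))
        offset += split_size
        remaining -= split_size
    if remaining > 0:
        splits.append((offset, remaining))
    return splits

MB = 1024 * 1024
# A 300 MB file with a 128 MB split size -> 3 splits -> 3 map tasks.
print(len(compute_splits(300 * MB, 128 * MB)))  # -> 3
# A 130 MB file is within the slop factor -> a single split/map task.
print(len(compute_splits(130 * MB, 128 * MB)))  # -> 1
```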
Input File => InputSplits (created by the InputFormat class) => RecordReader (converts each record into a (key, value) pair) => Mapper
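Putting the flow together: within each map task, the framework pulls records from the RecordReader one at a time and hands each (key, value) pair to the Mapper's map method. A toy Python sketch of that driver loop (Hadoop's real framework does this in Java via the Mapper run/context machinery):

```python
def run_map_task(records, map_fn):
    """Toy model of one map task: iterate the RecordReader's
    (key, value) pairs and collect the Mapper's intermediate output."""
    output = []
    for key, value in records:
        # map_fn emits zero or more intermediate (key, value) pairs.
        output.extend(map_fn(key, value))
    return output

# A word-count-style mapper: key = byte offset, value = line of text.
def word_count_map(offset, line):
    return [(word, 1) for word in line.split()]

records = [(0, "hello world"), (12, "hello hadoop")]
print(run_map_task(records, word_count_map))
# -> [('hello', 1), ('world', 1), ('hello', 1), ('hadoop', 1)]
```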
Follow the link for more detail: RecordReader in Hadoop