How many instances of RecordReader will run for a specific MapReduce job?

- September 20, 2018 at 5:19 pm #6195
  
  DataFlair Team
  Spectator
  
  How many instances of RecordReader will run for a specific MapReduce job?
- September 20, 2018 at 5:20 pm #6197
  
  DataFlair Team
  Spectator
  
  The InputFormat defines the input splits, i.e. the logical division of the data. The actual reading of the data, however, is done by the RecordReader.
  
  The RecordReader turns the split into key-value pairs, which are passed as input to the map task.
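To make the key-value generation concrete, here is a hypothetical, simplified Java model of a line-oriented reader in the spirit of Hadoop's LineRecordReader (key = offset of a line, value = the line's text). SimpleLineRecordReader, its constructor and readSplit() are made up for illustration; only the method names nextKeyValue(), getCurrentKey() and getCurrentValue() mirror Hadoop's RecordReader. It assumes a line belongs to the split that contains its first character, which simplifies Hadoop's actual boundary handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified line-oriented reader over one split of an in-memory "file".
// Not Hadoop's LineRecordReader: only the method names mirror Hadoop's RecordReader.
class SimpleLineRecordReader {
    private final String data; // the whole input, standing in for a file
    private final int end;     // exclusive end offset of this split
    private int pos;           // next offset to read from
    private long currentKey;
    private String currentValue;

    SimpleLineRecordReader(String data, int start, int length) {
        this.data = data;
        this.end = start + length;
        int p = start;
        // Assumption: a line belongs to the split holding its first character,
        // so a split that starts mid-line skips ahead to the next line start.
        if (p > 0) {
            while (p < data.length() && data.charAt(p - 1) != '\n') {
                p++;
            }
        }
        this.pos = p;
    }

    // Reads the next line whose first character lies inside this split.
    boolean nextKeyValue() {
        if (pos >= end || pos >= data.length()) {
            return false;
        }
        int newline = data.indexOf('\n', pos);
        int lineEnd = (newline == -1) ? data.length() : newline;
        currentKey = pos;                            // key: offset of the line (byte offset for ASCII)
        currentValue = data.substring(pos, lineEnd); // value: the line's text
        pos = (newline == -1) ? data.length() : newline + 1;
        return true; // the line may run past 'end'; it is still read whole
    }

    long getCurrentKey() { return currentKey; }
    String getCurrentValue() { return currentValue; }

    // Convenience helper for this sketch: drain one split into "key:value" strings.
    static List<String> readSplit(String data, int start, int length) {
        SimpleLineRecordReader reader = new SimpleLineRecordReader(data, start, length);
        List<String> out = new ArrayList<>();
        while (reader.nextKeyValue()) {
            out.add(reader.getCurrentKey() + ":" + reader.getCurrentValue());
        }
        return out;
    }
}
```

For the input "apple\nbanana\ncherry\n" cut into the ranges [0, 10) and [10, 20), the first reader returns (0, apple) and (6, banana), and the second skips the partial line and returns (13, cherry), so every line is read exactly once.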
  
  public abstract RecordReader<K, V>
  createRecordReader(InputSplit split, TaskAttemptContext context)
  throws IOException, InterruptedException;
  
  The splits are computed by getSplits() on the InputFormat. Each map task then passes its split to the InputFormat's createRecordReader() method to obtain a RecordReader, and that reader supplies the key-value pairs that the Mapper processes.
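The flow described above can be modeled with a small hypothetical Java sketch: split the input, create one reader per split, and drive the map loop the way Mapper.run() keeps calling nextKeyValue() until it returns false. ToyReadPath, runMapTask() and runJob() are illustrative stand-ins, not Hadoop classes, and the splits are just in-memory lists of records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the read path of a MapReduce job.
// Names echo Hadoop (getSplits, createRecordReader, nextKeyValue),
// but this is plain Java, not Hadoop code.
class ToyReadPath {
    interface Reader {
        boolean nextKeyValue();
        long getCurrentKey();
        String getCurrentValue();
    }

    private static int readersCreated = 0;

    // Stand-in for InputFormat.getSplits(): fixed-size chunks of records.
    static List<List<String>> getSplits(List<String> input, int recordsPerSplit) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < input.size(); i += recordsPerSplit) {
            splits.add(input.subList(i, Math.min(i + recordsPerSplit, input.size())));
        }
        return splits;
    }

    // Stand-in for InputFormat.createRecordReader(): a NEW reader per call.
    static Reader createRecordReader(List<String> split) {
        readersCreated++;
        return new Reader() {
            private int index = -1; // position of the current record in the split

            public boolean nextKeyValue() { return ++index < split.size(); }
            public long getCurrentKey() { return index; }
            public String getCurrentValue() { return split.get(index); }
        };
    }

    // Stand-in for one map task: obtain a reader, then loop like Mapper.run().
    static int runMapTask(List<String> split) {
        Reader reader = createRecordReader(split);
        int mapCalls = 0;
        while (reader.nextKeyValue()) {
            // A real task would call map(reader.getCurrentKey(), reader.getCurrentValue(), context).
            mapCalls++;
        }
        return mapCalls;
    }

    // Returns {number of splits, readers created, total map() calls}.
    static int[] runJob(List<String> input, int recordsPerSplit) {
        readersCreated = 0;
        List<List<String>> splits = getSplits(input, recordsPerSplit);
        int mapCalls = 0;
        for (List<String> split : splits) {
            mapCalls += runMapTask(split); // one map task per split
        }
        return new int[] {splits.size(), readersCreated, mapCalls};
    }
}
```

With 10 records and 4 records per split, runJob() reports 3 splits and 3 readers created, and all 10 records reach the map loop.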
  
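  So if there are N splits, the job runs N map tasks and uses N RecordReader instances, one per split. (Strictly, a reader is created per task attempt, so a retried or speculative attempt creates its own RecordReader for the same split.)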
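Because the number of RecordReaders follows the number of splits, it helps to know how many splits a file produces. The sketch below re-implements, as an illustration under assumptions, the arithmetic FileInputFormat uses for a single splittable, non-empty file: splitSize = max(minSize, min(maxSize, blockSize)), with a 10% slop so that a small tail is folded into the last split instead of becoming a split of its own. computeSplitSize and SPLIT_SLOP mirror names in FileInputFormat; the SplitCount class and countSplits() are hypothetical. With the default minimum (1 byte) and maximum (Long.MAX_VALUE) split sizes, splitSize equals the block size.

```java
// Hypothetical re-implementation of the split-count arithmetic used by
// Hadoop's FileInputFormat, for ONE splittable, non-empty file only.
class SplitCount {
    static final double SPLIT_SLOP = 1.1; // last split may be up to 10% larger than splitSize

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static int countSplits(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long bytesRemaining = fileLength;
        int splits = 0;
        // Carve off full-size splits while more than 1.1 splits' worth remains.
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        // Whatever is left (up to 1.1 * splitSize) becomes the final split.
        if (bytesRemaining != 0) {
            splits++;
        }
        return splits;
    }
}
```

With a 128 MB block size and default settings, a 300 MB file gives 3 splits (128 + 128 + 44 MB), hence 3 map tasks and 3 RecordReaders, while a 130 MB file gives only 1 because the extra 2 MB is within the slop.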
  
  Follow the link for more detail: RecordReader in Hadoop