How many instances of RecordReader will run for a specific MapReduce job?


  • Author
    Posts
    • #6195
      DataFlair Team
      Spectator

      How many instances of RecordReader will run for a specific MapReduce job?

    • #6197
      DataFlair Team
      Spectator

      The InputFormat defines the input splits, i.e. the logical division of the data, but the actual reading of the data is done by the RecordReader.

      The RecordReader turns a split into key-value pairs, which are given as input to the map task. The InputFormat exposes this method to create one:

      public abstract RecordReader<K, V>
      createRecordReader(InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException;

      The splits are calculated by getSplits(). Each map task passes its split to the createRecordReader() method on the InputFormat and gets back a RecordReader, which produces the key-value pairs that are passed to and processed by the Mapper function.
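
To make that flow concrete, here is a minimal sketch in plain Java with no Hadoop dependency. ToyLineReader and ToyMapTask are hypothetical names made up for this illustration. Only the method names nextKeyValue(), getCurrentKey() and getCurrentValue() mirror the real RecordReader. The sketch assumes a split is just a string of newline-separated lines and uses the character offset of each line as the key, roughly what a line-oriented reader does.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a RecordReader (hypothetical class, not the Hadoop one).
// It walks one split and exposes each line as an (offset, line) pair.
class ToyLineReader {
    private final String split; // text of the single split this reader owns
    private int pos = 0;        // offset of the next unread character
    private long key;           // offset of the current line
    private String value;       // content of the current line

    ToyLineReader(String split) {
        this.split = split;
    }

    // Advances to the next record; returns false once the split is exhausted.
    boolean nextKeyValue() {
        if (pos >= split.length()) {
            return false;
        }
        int end = split.indexOf('\n', pos);
        if (end < 0) {
            end = split.length();
        }
        key = pos;
        value = split.substring(pos, end);
        pos = end + 1; // skip past the newline
        return true;
    }

    long getCurrentKey() {
        return key;
    }

    String getCurrentValue() {
        return value;
    }
}

// Toy stand-in for a map task (hypothetical): it pulls records from the
// reader one by one and hands each pair to a "map" step.
class ToyMapTask {
    static List<String> run(String split) {
        List<String> mapped = new ArrayList<>();
        ToyLineReader reader = new ToyLineReader(split);
        while (reader.nextKeyValue()) {
            // The "map" step here just formats the pair as key:value.
            mapped.add(reader.getCurrentKey() + ":" + reader.getCurrentValue());
        }
        return mapped;
    }

    public static void main(String[] args) {
        System.out.println(run("alpha\nbeta\ngamma"));
    }
}
```

The point of the sketch is the division of labour: the reader knows how to cut a split into records, while the map step only ever sees finished key-value pairs.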

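      If there are N splits, the job uses N map tasks to process them, and each map task creates its own RecordReader instance, so N RecordReader instances run in total (re-executed or speculative task attempts each create a reader of their own on top of that).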
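
The one-reader-per-split relationship can be sketched in plain Java as follows. This is an illustration only, not Hadoop code: SplitReaderDemo, ToyReader and runJob are hypothetical names, and the sketch assumes splits are cut by a fixed number of records, whereas a real InputFormat such as FileInputFormat typically cuts by byte ranges.

```java
import java.util.ArrayList;
import java.util.List;

class SplitReaderDemo {
    // Toy reader bound to exactly one split (hypothetical, not the Hadoop class).
    static final class ToyReader {
        final List<String> split;

        ToyReader(List<String> split) {
            this.split = split;
        }
    }

    // Assumption for this sketch: a split is a chunk of at most splitSize records.
    static List<List<String>> getSplits(List<String> records, int splitSize) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < records.size(); i += splitSize) {
            int end = Math.min(i + splitSize, records.size());
            splits.add(records.subList(i, end));
        }
        return splits;
    }

    // Simulates the job: each split stands for one map task,
    // and each map task creates its own reader.
    static List<ToyReader> runJob(List<String> records, int splitSize) {
        List<ToyReader> readers = new ArrayList<>();
        for (List<String> split : getSplits(records, splitSize)) {
            readers.add(new ToyReader(split));
        }
        return readers;
    }

    public static void main(String[] args) {
        List<String> records = List.of("r1", "r2", "r3", "r4", "r5");
        int splits = getSplits(records, 2).size();
        int readers = runJob(records, 2).size();
        System.out.println(splits + " splits, " + readers + " readers");
    }
}
```

With five records and a split size of two, the sketch produces three splits and therefore three reader instances, the last one covering the single leftover record.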

      Follow the link for more detail: RecordReader in Hadoop
