Input format runs in same JVM as record reader and mapper?

  • Author
    Posts
    • #5623
      DataFlair Team
      Spectator

      Does the InputFormat run in the same JVM as the RecordReader and Mapper?

    • #5625
      DataFlair Team
      Spectator

      InputFormat:

      InputFormat is the class responsible for creating input splits (128 MB by default, matching the HDFS block size) and for dividing them into records.
      The getSplits() method computes the input splits, which are then assigned to map tasks.
      Each map task passes its split to the createRecordReader() method to obtain a RecordReader.
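      The split computation can be sketched in plain Java, with no Hadoop dependencies. The SimpleSplitter class below is a hypothetical stand-in for FileInputFormat.getSplits(), not the real Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for FileInputFormat.getSplits(): chop a file of a given
// length into fixed-size logical splits (offset + length), the way the
// default 128 MB split size chops up a file stored in HDFS.
public class SimpleSplitter {
    static final long SPLIT_SIZE = 128L * 1024 * 1024; // 128 MB default

    // Each long[] is {startOffset, length} — a logical split; no data is copied.
    public static List<long[]> getSplits(long fileLength) {
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += SPLIT_SIZE) {
            long length = Math.min(SPLIT_SIZE, fileLength - offset);
            splits.add(new long[] { offset, length });
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileLength = 300L * 1024 * 1024; // a 300 MB file
        // 300 MB at 128 MB per split => 3 splits, hence 3 map tasks
        System.out.println(getSplits(fileLength).size());
    }
}
```

      A 300 MB file thus yields splits of 128 MB, 128 MB, and 44 MB, one per map task.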

      Record Reader and Mapper:

      A large file is stored in HDFS. To process it with MapReduce, InputSplit logically divides the file into smaller chunks based on its size. Each input split is processed by one map task, so the number of input splits equals the number of map tasks.

      Both the map and reduce tasks use key-value pairs as input and output to process data.

      The RecordReader reads the input split and converts it into key-value pairs, which are sent as input to the mapper. The RecordReader tells the mapper where each record starts (its byte offset) and ends.
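      The key-value conversion can be mimicked in plain Java. The ToyLineRecordReader class below is a hypothetical sketch of what Hadoop's LineRecordReader does for TextInputFormat: each line becomes a (byte offset, line) pair:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of LineRecordReader: turn the bytes of a split into
// (byte offset, line) key-value pairs — the shape of input a
// TextInputFormat mapper receives.
public class ToyLineRecordReader {
    // Returns one {offset, line} pair per line of the content.
    public static List<Object[]> read(String content) {
        List<Object[]> records = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n", -1)) {
            records.add(new Object[] { offset, line });
            offset += line.length() + 1; // +1 for the newline delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        for (Object[] kv : read("hello world\nhadoop rocks")) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```

      Here the second record's key is 12, the byte offset where the second line begins.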

      The mapper processes one key-value pair at a time and writes its output key-value pairs to the local disk; this output is later passed as input to the reduce tasks.
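      A single map call can be sketched with the classic word-count example. The ToyWordCountMapper class below is a hypothetical, Hadoop-free stand-in for a Mapper.map() invocation: it takes one (offset, line) record from the RecordReader and emits (word, 1) pairs:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy word-count map function: one (offset, line) record in,
// a list of (word, 1) pairs out — what a WordCount Mapper.map()
// call would write to its context.
public class ToyWordCountMapper {
    public static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> output = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                output.add(new SimpleEntry<>(word, 1));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // One record in, six (word, 1) pairs out
        System.out.println(map(0L, "to be or not to be"));
    }
}
```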

      Therefore, the InputFormat's createRecordReader(), the RecordReader, and the Mapper all run in the same JVM, the one hosting the map task. The reducer, however, runs in a separate JVM. (Note that getSplits() itself is invoked by the job client before the map tasks launch.)
