Forums › Apache Hadoop › Input format runs in same JVM as record reader and mapper?
September 20, 2018 at 3:48 pm — DataFlair Team
Does the InputFormat run in the same JVM as the RecordReader and the mapper?
-
September 20, 2018 at 3:48 pm — DataFlair Team
InputFormat:
InputFormat is the class responsible for computing the input splits (by default, one split per 128 MB HDFS block) and for defining how each split is read as records.
Its getSplits() method computes the input splits, which are handed out one per map task.
Each map task then calls createRecordReader() to obtain a RecordReader for its split.

RecordReader and Mapper:
A large file is stored in HDFS. To process it with MapReduce, the file is logically divided into smaller chunks called input splits, based on its size. Each input split is processed by one map task, so the number of input splits equals the number of map tasks.
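The split arithmetic above can be sketched in plain Java (a simplified sketch of what FileInputFormat does; real split sizing also honors the configured minimum and maximum split sizes):

```java
// Simplified sketch: number of map tasks = number of input splits
// = ceil(fileSize / splitSize), with the split size defaulting to
// the HDFS block size of 128 MB.
public class SplitMath {
    // Default HDFS block size, which is also the default split size.
    static final long DEFAULT_SPLIT_SIZE = 128L * 1024 * 1024; // 128 MB

    public static long numSplits(long fileSizeBytes, long splitSizeBytes) {
        if (fileSizeBytes == 0) {
            return 0; // an empty file produces no splits
        }
        // Integer ceiling division: round up so the last partial block
        // still gets its own split (and therefore its own map task).
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }
}
```

For example, a 300 MB file with the default 128 MB split size yields 3 splits, and hence 3 map tasks.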
Both the map and reduce tasks use key-value pairs as input and output to process data.
The RecordReader converts an input split into key-value pairs, which are fed as input to the mapper. It also lets the mapper know where each record starts (its byte offset) and ends.
A map task processes one key-value pair at a time and writes its output key-value pairs to local disk, from where they are passed as input to the reduce tasks.
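The RecordReader behavior described above can be sketched in plain Java without Hadoop on the classpath. This mimics what TextInputFormat's LineRecordReader produces for the mapper: for each line, the key is the byte offset where the line starts and the value is the line's text (assuming ASCII input, so character and byte offsets coincide):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of a line-oriented record reader: turns raw text into
// (byteOffset, lineText) records, the key-value pairs a mapper receives.
public class LineRecords {
    public static List<Map.Entry<Long, String>> toRecords(String data) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int start = 0; // offset where the current record begins
        for (int i = 0; i <= data.length(); i++) {
            if (i == data.length() || data.charAt(i) == '\n') {
                if (start < i) {
                    // key = starting offset, value = the record's text
                    records.add(new SimpleEntry<>((long) start,
                                                  data.substring(start, i)));
                }
                start = i + 1; // next record begins after the newline
            }
        }
        return records;
    }
}
```

So for the input "hello\nworld\n" the mapper would be called twice: once with (0, "hello") and once with (6, "world").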
Therefore, the RecordReader and the mapper run in the same JVM (the map task's JVM), and the InputFormat's createRecordReader() is invoked there as well; note that getSplits() runs on the client side when the job is submitted. The reducer runs in a separate JVM.