Input format runs in same JVM as record reader and mapper?

  • Author
    Posts
    • #5623
      DataFlair Team
      Spectator

      Does the InputFormat run in the same JVM as the RecordReader and Mapper?

    • #5625
      DataFlair Team
      Spectator

      InputFormat:

      InputFormat is the class responsible for creating input splits (128 MB by default, matching the HDFS block size) and for dividing them into records.
      The getSplits() method computes the input splits, which are then assigned to map tasks.
      Each map task passes its split to the createRecordReader() method to obtain a RecordReader.
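      The split computation can be sketched in plain Java, with no Hadoop dependencies. The SimpleSplitter class below is a hypothetical stand-in for FileInputFormat.getSplits(), not the real Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for FileInputFormat.getSplits(): chop a file of a given
// length into fixed-size logical splits (offset + length), the way the
// default 128 MB split size chops up a file stored in HDFS.
public class SimpleSplitter {
    static final long SPLIT_SIZE = 128L * 1024 * 1024; // 128 MB default

    // Each long[] is {startOffset, length} — a logical split; no data is copied.
    public static List<long[]> getSplits(long fileLength) {
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += SPLIT_SIZE) {
            long length = Math.min(SPLIT_SIZE, fileLength - offset);
            splits.add(new long[] { offset, length });
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileLength = 300L * 1024 * 1024; // a 300 MB file
        // 300 MB at 128 MB per split => 3 splits, hence 3 map tasks
        System.out.println(getSplits(fileLength).size());
    }
}
```

      A 300 MB file thus yields splits of 128 MB, 128 MB, and 44 MB, one per map task.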

      Record Reader and Mapper:

      A large file is stored in HDFS. To process it with MapReduce, InputSplit logically divides the file into smaller chunks based on its size. Each input split is processed by one map task, so the number of input splits equals the number of map tasks.

      Both the map and reduce tasks use key-value pairs as input and output to process data.

      The RecordReader reads the input split and converts it into key-value pairs, which are sent as input to the mapper. The RecordReader tells the mapper where each record starts (its byte offset) and ends.
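      The key-value conversion can be mimicked in plain Java. The ToyLineRecordReader class below is a hypothetical sketch of what Hadoop's LineRecordReader does for TextInputFormat: each line becomes a (byte offset, line) pair:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of LineRecordReader: turn the bytes of a split into
// (byte offset, line) key-value pairs — the shape of input a
// TextInputFormat mapper receives.
public class ToyLineRecordReader {
    // Returns one {offset, line} pair per line of the content.
    public static List<Object[]> read(String content) {
        List<Object[]> records = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n", -1)) {
            records.add(new Object[] { offset, line });
            offset += line.length() + 1; // +1 for the newline delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        for (Object[] kv : read("hello world\nhadoop rocks")) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```

      Here the second record's key is 12, the byte offset where the second line begins.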

      The mapper processes one key-value pair at a time and writes its output key-value pairs to the local disk; this output is later passed as input to the reduce tasks.
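      A single map call can be sketched with the classic word-count example. The ToyWordCountMapper class below is a hypothetical, Hadoop-free stand-in for a Mapper.map() invocation: it takes one (offset, line) record from the RecordReader and emits (word, 1) pairs:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy word-count map function: one (offset, line) record in,
// a list of (word, 1) pairs out — what a WordCount Mapper.map()
// call would write to its context.
public class ToyWordCountMapper {
    public static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> output = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                output.add(new SimpleEntry<>(word, 1));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // One record in, six (word, 1) pairs out
        System.out.println(map(0L, "to be or not to be"));
    }
}
```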

      Therefore, the InputFormat's createRecordReader(), the RecordReader, and the Mapper all run in the same JVM, the one hosting the map task. The reducer, however, runs in a separate JVM. (Note that getSplits() itself is invoked by the job client before the map tasks launch.)
