What is Mapper in Hadoop?

Viewing 2 reply threads
  • Author
    • #5558
DataFlair Team

      What is the purpose of Mapper in Hadoop?
      What is Mapper in MapReduce?
How does Mapper work in Hadoop MapReduce?

    • #5561
DataFlair Team

Mapper in Hadoop takes each record generated by the RecordReader as input. It then processes each record and generates key-value pairs. These key-value pairs can be completely different from the input pair. The mapper output is known as intermediate output and is stored on the local disk. The mapper does not store its output on HDFS, as it is temporary data and storing it on HDFS would create unnecessary replicated copies.

Before the mapper output is stored on the local disk, it is partitioned on the basis of the key and then sorted. Partitioning ensures that all the values for a given key are grouped together and sent to the same reducer. The mapper in Hadoop understands only key-value pairs, so data must be converted into key-value pairs before being passed to the mapper. This conversion is done by InputSplit and RecordReader.

1) InputSplit- InputFormat generates InputSplits, which are logical representations of the data. The MapReduce framework creates one map task for each InputSplit.

2) RecordReader- It communicates with the InputSplit and converts the data into key-value pairs.
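As an illustration, the hand-off described above can be simulated in plain Java (a real job would subclass Hadoop's `Mapper` instead; the class and method names here are hypothetical): the RecordReader gives the mapper a (byte offset, line) record, and a word-count style map step emits intermediate (word, 1) pairs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapSimulation {
    // Simulates what a word-count map() does with one record from the
    // RecordReader: the input key is the byte offset of the line, the
    // input value is the line text, and the output is (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1)); // intermediate key-value pair
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One record from the RecordReader: offset 0, line "deer bear river"
        List<Map.Entry<String, Integer>> pairs = map(0L, "deer bear river");
        System.out.println(pairs); // [deer=1, bear=1, river=1]
    }
}
```

Note how the output pairs (word, 1) have nothing in common with the input pair (offset, line), matching the point above that the intermediate output can be completely different from the input.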

How many mappers?
The number of mappers depends on the total size of the input, i.e. the total number of input splits of the input files (by default, one split per HDFS block).

Mappers = (total data size) / (input split size)
If data size = 1 TB and input split size = 100 MB:
Mappers = (1,000,000 MB) / (100 MB) = 10,000
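The arithmetic above can be checked with a small sketch (plain Java, assuming decimal units and rounding up for a final partial split):

```java
public class MapperCount {
    // Number of map tasks ~= ceil(total input size / input split size).
    static long mappers(long totalBytes, long splitBytes) {
        return (totalBytes + splitBytes - 1) / splitBytes; // round up
    }

    public static void main(String[] args) {
        long oneTB = 1_000_000_000_000L; // 1 TB, decimal units
        long split = 100_000_000L;       // 100 MB input split
        System.out.println(mappers(oneTB, split)); // 10000
    }
}
```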
      Follow the link to learn more about Mapper in Hadoop

    • #5563
DataFlair Team

1) What is the purpose of Mapper in Hadoop?

In Hadoop, the mapper converts each input split into key-value pairs. There is one mapper for each input split, which by default corresponds to one data block on HDFS.

2) What is Mapper in MapReduce?

Mapper is a user-defined program that transforms the input split into (key, value) pairs according to the code design. Typically, Mapper is the base class that the programmer extends to write their own logic as per the requirement. While extending Mapper, the programmer needs to specify the input and output types as the Mapper class's generic arguments.


public class MyMapper extends Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>

3) How does Mapper work in Hadoop MapReduce?

Mapper is the first code responsible for transforming the data stored in HDFS blocks into key-value pairs. Hadoop assigns one map task to each input split; for example, if the data spans 20 blocks, then 20 map tasks run in parallel, and their output is stored on the local disk. After the mappers have parsed all the data block contents into key-value pairs, Hadoop serves this output to the reducers as input. After the reduce operation, the final output is stored in HDFS again.
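To make the mapper-to-reducer hand-off concrete, here is a plain-Java sketch of how an intermediate key is assigned to a reducer partition. It mirrors the behaviour of Hadoop's default HashPartitioner (this is a standalone simulation, not Hadoop code):

```java
public class PartitionSketch {
    // Mirrors Hadoop's default HashPartitioner: the partition (i.e. which
    // reducer receives the pair) is derived from the key's hash code.
    // Masking with Integer.MAX_VALUE keeps the value non-negative.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 4;
        // Every occurrence of the same key lands in the same partition,
        // so a single reducer sees all the values for that key.
        System.out.println(partition("river", reducers) == partition("river", reducers)); // true
    }
}
```

This is why the partitioning step described in the answers above guarantees that all values for a given key are grouped together at one reducer.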

For more detail, please follow Mapper in Hadoop
