What is Mapper in Hadoop?

This topic has 2 replies, 1 voice, and was last updated 7 years, 10 months ago by DataFlair Team.

- September 20, 2018 at 3:37 pm #5558
  
  DataFlair Team
  Spectator
  
  What is the purpose of Mapper in Hadoop?
  What is Mapper in MapReduce?
  How does Mapper work in Hadoop MapReduce?
- September 20, 2018 at 3:37 pm #5561
  
  DataFlair Team
  Spectator
  
  Mapper in Hadoop takes each record generated by the RecordReader as input, processes it, and generates zero or more key-value pairs. The output pairs can differ from the input pair in both type and content. The mapper output is known as intermediate output and is stored on the local disk of the node that runs the map task. Mapper does not store its output on HDFS because it is temporary data, and writing it to HDFS would create multiple replicated copies unnecessarily.
  
  Before the mapper output is written to the local disk, it is partitioned on the basis of the key and then sorted by key within each partition. Partitioning ensures that all the values for a given key go to the same reducer. Mapper in Hadoop only understands key-value pairs, so data must be converted into key-value pairs before it is passed to the mapper. This conversion is done by InputSplit and RecordReader.
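  
  The partition-then-sort step described above can be sketched in plain Java. This is only an illustration, not Hadoop code: PartitionSketch, partitionFor, and partitionAndSort are hypothetical names, and the map output is kept in memory rather than spilled to disk. The partition formula mirrors the one in Hadoop's default HashPartitioner; collecting the values of a key into a list is a simplification for readability.
  
  ```java
  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;
  import java.util.TreeMap;
  
  final class PartitionSketch {
      // Same formula as Hadoop's default HashPartitioner:
      // clear the sign bit, then take the modulus by the number of reducers.
      static int partitionFor(String key, int numReducers) {
          return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
      }
  
      // Assigns every (key, value) pair to a partition.
      // A TreeMap keeps the keys of each partition in sorted order.
      static List<TreeMap<String, List<Integer>>> partitionAndSort(
              List<Map.Entry<String, Integer>> mapOutput, int numReducers) {
          List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
          for (int i = 0; i < numReducers; i++) {
              partitions.add(new TreeMap<>());
          }
          for (Map.Entry<String, Integer> pair : mapOutput) {
              int p = partitionFor(pair.getKey(), numReducers);
              partitions.get(p)
                        .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                        .add(pair.getValue());
          }
          return partitions;
      }
  
      public static void main(String[] args) {
          List<Map.Entry<String, Integer>> mapOutput = List.of(
                  Map.entry("hadoop", 1), Map.entry("mapper", 1), Map.entry("hadoop", 1));
          // Both "hadoop" pairs end up in the same partition, whichever one that is.
          System.out.println(partitionAndSort(mapOutput, 2));
      }
  }
  ```
  
  Because the partition depends only on the key, every value for a given key is routed to the same reducer, no matter which mapper produced it.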
  
  1) InputSplit: InputFormat generates InputSplits, each of which is a logical representation of a chunk of the input data. The MapReduce framework generates one map task for each InputSplit.
  
  2) RecordReader: It reads the data referenced by an InputSplit and converts it into key-value pairs.
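  
  As a rough illustration of what a RecordReader does, the plain-Java sketch below turns the text of one split into (key, value) records in the style of Hadoop's TextInputFormat, where the key is the byte offset of the line and the value is the line itself. LineRecordSketch, Record, and readRecords are hypothetical names, and the split is assumed to be a small in-memory string with "\n" line endings.
  
  ```java
  import java.nio.charset.StandardCharsets;
  import java.util.ArrayList;
  import java.util.List;
  
  final class LineRecordSketch {
      // One record: byte offset of the line start (key) and the line text (value).
      static final class Record {
          final long offset;
          final String line;
  
          Record(long offset, String line) {
              this.offset = offset;
              this.line = line;
          }
      }
  
      // Converts the raw text of a split into (offset, line) records.
      static List<Record> readRecords(String splitText) {
          List<Record> records = new ArrayList<>();
          if (splitText.isEmpty()) {
              return records; // nothing to read
          }
          long offset = 0;
          for (String line : splitText.split("\n")) {
              records.add(new Record(offset, line));
              // Advance past the line's bytes plus the newline character.
              offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
          }
          return records;
      }
  
      public static void main(String[] args) {
          for (Record r : readRecords("hadoop mapper\nmap reduce\n")) {
              System.out.println(r.offset + " -> " + r.line);
          }
          // prints:
          // 0 -> hadoop mapper
          // 14 -> map reduce
      }
  }
  ```
  
  Each record produced this way is what a single call to the mapper receives as its input pair.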
  
  How many mappers?
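  It depends on the total size of the input and on the input split size. By default the split size equals the HDFS block size, so the number of mappers usually equals the total number of blocks of the input files.
  
  Number of mappers = (total data size) / (input split size)
  If data size = 1 TB and input split size = 100 MB
  Number of mappers = (1000 * 1000) / 100 = 10,000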
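  
  The same arithmetic as a small plain-Java sketch. MapperCountSketch and estimateMappers are hypothetical names, and the result is only an estimate under the assumption of a single large input: it rounds up so that a partial last split still counts as one mapper, while in a real job the InputFormat computes splits per file, so many small files give more mappers than this formula suggests.
  
  ```java
  final class MapperCountSketch {
      // Estimated number of map tasks: total size divided by split size, rounded up.
      static long estimateMappers(long totalBytes, long splitBytes) {
          if (totalBytes < 0 || splitBytes <= 0) {
              throw new IllegalArgumentException("sizes must be non-negative and split size positive");
          }
          return (totalBytes + splitBytes - 1) / splitBytes; // ceiling division
      }
  
      public static void main(String[] args) {
          long mb = 1_000_000L;      // decimal units, as in the calculation above
          long tb = 1_000_000L * mb; // 1 TB = 1000 * 1000 MB
          System.out.println(estimateMappers(tb, 100 * mb)); // prints 10000
      }
  }
  ```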
  Follow the link to learn more about Mapper in Hadoop
- September 20, 2018 at 3:37 pm #5563
  
  DataFlair Team
  Spectator
  
  1) What is the purpose of Mapper in Hadoop?
  
  In Hadoop, the mapper processes the records of an input split as key-value pairs and emits new key-value pairs. There is one mapper for each input split, which by default corresponds to one data block on HDFS.
  
  2) What is Mapper in MapReduce?
  
  Mapper is the user-defined program that processes the (key, value) pairs of an input split according to its code. Mapper is the base class that the programmer extends to write their own logic as per the requirement. While extending Mapper, the programmer needs to specify the input and output key and value types as the type arguments of the Mapper class.
  
  Example:
  
  class MyMapper extends Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
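  
  To show how the four type arguments are filled in, here is a self-contained plain-Java sketch. It does not use the Hadoop library: SketchMapper is a hypothetical stand-in for Hadoop's Mapper base class, and the emit callback stands in for the Context object whose write method a real mapper calls. In an actual job the types would be Hadoop Writable types such as LongWritable, Text, and IntWritable instead of Long, String, and Integer.
  
  ```java
  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;
  import java.util.function.BiConsumer;
  
  final class MapperShapeSketch {
      // Stand-in for Hadoop's Mapper: four type parameters and one map method.
      abstract static class SketchMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
          abstract void map(KEYIN key, VALUEIN value, BiConsumer<KEYOUT, VALUEOUT> emit);
      }
  
      // Input pair: (byte offset, line). Output pairs: (word, 1).
      static final class WordCountMapper extends SketchMapper<Long, String, String, Integer> {
          @Override
          void map(Long offset, String line, BiConsumer<String, Integer> emit) {
              for (String word : line.trim().split("\\s+")) {
                  if (!word.isEmpty()) {
                      emit.accept(word, 1); // one output pair per word
                  }
              }
          }
      }
  
      public static void main(String[] args) {
          List<Map.Entry<String, Integer>> output = new ArrayList<>();
          new WordCountMapper().map(0L, "hadoop mapper hadoop",
                  (k, v) -> output.add(Map.entry(k, v)));
          System.out.println(output); // prints [hadoop=1, mapper=1, hadoop=1]
      }
  }
  ```
  
  The input key (an offset) is ignored here, and the output key and value types are unrelated to the input types, which is common for mappers.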
  
  3) How does Mapper work in Hadoop MapReduce?
  
  Mapper is the first user code to run in a job; it is responsible for turning the data stored in HDFS blocks into key-value pairs. Hadoop assigns one map task to each block, i.e. if the data is on 20 blocks, then 20 map tasks run in parallel (as far as cluster capacity allows) and the mapper output is stored on local disk. After the mappers have parsed the content of all data blocks into key-value pairs, Hadoop serves this output to the reducer as input. After the reducer operation, the final output is stored in HDFS again.
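  
  The flow from blocks to final output can be imitated on a single machine with Java streams. This is a toy sketch under loud assumptions: MapReduceFlowSketch, mapBlock, and run are hypothetical names, each string in the list plays the role of one HDFS block, a parallel stream replaces the cluster, and the grouping collector stands in for the shuffle and the reducer. Each emitted word represents the pair (word, 1).
  
  ```java
  import java.util.Arrays;
  import java.util.List;
  import java.util.Map;
  import java.util.TreeMap;
  import java.util.stream.Collectors;
  
  final class MapReduceFlowSketch {
      // "Map task" for one block: split the block's text into words.
      static List<String> mapBlock(String block) {
          return Arrays.stream(block.trim().split("\\s+"))
                  .filter(word -> !word.isEmpty())
                  .collect(Collectors.toList());
      }
  
      // Runs one map per block, groups the output by key, and counts per key.
      static Map<String, Long> run(List<String> blocks) {
          return blocks.parallelStream()                        // blocks are processed in parallel
                  .flatMap(block -> mapBlock(block).stream())   // intermediate map output
                  .collect(Collectors.groupingBy(
                          word -> word,                         // group by key (shuffle)
                          TreeMap::new,                         // keys kept in sorted order
                          Collectors.counting()));              // reduce: count per key
      }
  
      public static void main(String[] args) {
          List<String> blocks = List.of("hadoop mapper", "mapper reducer", "hadoop hadoop");
          System.out.println(run(blocks)); // prints {hadoop=3, mapper=2, reducer=1}
      }
  }
  ```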
  
  For more detail, please follow Mapper in Hadoop