What is Chain Mapper?

    • #5984
      DataFlair Team
      Spectator

      Can you explain ChainMapper in MapReduce?

    • #5986
      DataFlair Team
      Spectator

      ChainMapper is a class in the org.apache.hadoop.mapreduce.lib.chain package. It lets you run multiple Mapper classes within a single map task. The mappers run as a chain: the output of the first mapper becomes the input of the second, the output of the second becomes the input of the third, and so on until the last mapper. The output of the last mapper becomes the output of the map task: it is collected in an in-memory buffer and written to intermediate files on local disk (with default settings the buffer is 100 MB and spilling to disk starts when it is 80% full).
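      To make the data flow concrete, here is a minimal plain-Java sketch of chaining. It is not the Hadoop API: SimpleMapper, TOKENIZE, UPPERCASE, push() and demo() are hypothetical names invented for illustration, and the sketch only models how each record emitted by one stage is fed to the next stage, not how Hadoop actually schedules or buffers the mappers.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java model of the chaining data flow only; this is NOT the Hadoop API.
class ChainSketch {

    /** Hypothetical stand-in for a Mapper: one input record, zero or more output records. */
    interface SimpleMapper {
        List<Map.Entry<String, String>> map(String key, String value);
    }

    /** Stage 1: split a line into words and emit (word, "1") for each word. */
    static final SimpleMapper TOKENIZE = (key, value) -> {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String word : value.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, "1"));
            }
        }
        return out;
    };

    /** Stage 2: upper-case the key and pass the value through unchanged. */
    static final SimpleMapper UPPERCASE = (key, value) -> {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        out.add(new SimpleEntry<>(key.toUpperCase(Locale.ROOT), value));
        return out;
    };

    /** Pushes one record through the stages from index 'stage' onward and collects the final output. */
    static void push(List<SimpleMapper> chain, int stage, String key, String value,
                     List<Map.Entry<String, String>> sink) {
        if (stage == chain.size()) {
            // Past the last stage: this record is the output of the last mapper.
            sink.add(new SimpleEntry<>(key, value));
            return;
        }
        // Every record emitted by this stage becomes an input record of the next stage.
        for (Map.Entry<String, String> rec : chain.get(stage).map(key, value)) {
            push(chain, stage + 1, rec.getKey(), rec.getValue(), sink);
        }
    }

    /** Runs the two-stage chain over a single input line. */
    static List<Map.Entry<String, String>> demo() {
        List<Map.Entry<String, String>> sink = new ArrayList<>();
        push(Arrays.asList(TOKENIZE, UPPERCASE), 0, "0", "hello chain", sink);
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [HELLO=1, CHAIN=1]
    }
}
```

      In this model only the records that leave the last stage reach the sink, which mirrors the point above that only the last mapper's output is written as the map task's output.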

      There is no need to set the map output key and value classes with job.setMapOutputKeyClass() and job.setMapOutputValueClass(). The addMapper() method of the ChainMapper class calls these methods for you. You can look at the implementation of addMapper() on grepcode.com.

      The output key and value classes of each mapper are passed as arguments to the static addMapper() method of the ChainMapper class.

      MapReduce Driver Code:

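      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;

      Configuration conf = new Configuration();
      Job job = Job.getInstance(conf);

      // First mapper in the chain: (LongWritable, Text) -> (Text, Text)
      Configuration mapConf1 = new Configuration(false);
      ChainMapper.addMapper(job, MyMapper1.class,
          LongWritable.class, Text.class, Text.class, Text.class, mapConf1);

      // Second mapper consumes the first mapper's output: (Text, Text) -> (Text, Text)
      Configuration mapConf2 = new Configuration(false);
      ChainMapper.addMapper(job, MyMapper2.class,
          Text.class, Text.class, Text.class, Text.class, mapConf2);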
