What is identity mapper and reducer?

Viewing 2 reply threads
  • Author
    • #5207
      DataFlair Team

      What is identity mapper and reducer?
      In which cases can we use them, please explain with an example..

    • #5208
      DataFlair Team

      Identity Mapper and Identity Reducer have no processing logic of their own; they simply pass key-value pairs through unchanged. They live in the org.apache.hadoop.mapred.lib package.

      Identity Mapper and Reducer are just like the identity function in mathematics, i.e. they do not transform the input and return it as-is in the output.

      1) Identity Mapper takes the input key/value pair and writes it out without any processing.

      2) Identity Reducer is a bit different. It does not mean the reduce step will not take place; it will, and the related sorting/shuffling will also be performed, but there will be no aggregation. So you can use the Identity Reducer if you want to sort the data coming from the map phase but don't care about any grouping.
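      As a sketch, a driver for such a sort-only job could wire up the identity classes explicitly (old mapred API; the class name, key/value types, and paths here are illustrative assumptions, not from the thread):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Sort-only job: identity mapper and identity reducer, so the output is
// just the input records, sorted by key during the shuffle phase.
public class SortOnlyJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SortOnlyJob.class);
    conf.setJobName("sort-only");
    conf.setMapperClass(IdentityMapper.class);   // pass records through
    conf.setReducerClass(IdentityReducer.class); // no aggregation
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}
```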

      Hope u get it! 🙂

    • #5210
      DataFlair Team

      Hadoop provides a few predefined Mapper and Reducer classes. These classes are useful for default MapReduce jobs. Among them, the pass-through ones are known as the Identity Mapper and Identity Reducer classes. These classes are available in the org.apache.hadoop.mapred.lib package.

      Along with these classes, Hadoop provides some more predefined classes.

      List of predefined Mapper classes:
      1) Identity Mapper
      2) Inverse Mapper
      3) Token Counter
      4) Regex Mapper
      5) Chain Mapper

      List of predefined Reducer classes:
      1) Identity Reducer
      2) IntSum Reducer
      3) LongSum Reducer
      4) Chain Reducer

      Identity Mapper is the default mapper class provided by Hadoop. It is a generic class and can be used with any key-value pair data types. When you submit an MR job, this class is invoked automatically if no mapper class is specified in the MR driver class. Below is the code from the IdentityMapper class.

      public class IdentityMapper<K, V>
          extends MapReduceBase implements Mapper<K, V, K, V> {

        /** The identity function. Input key/value pair is written directly to
         *  output. */
        public void map(K key, V val,
                        OutputCollector<K, V> output, Reporter reporter)
            throws IOException {
          output.collect(key, val);
        }
      }

      Looking at the implementation of the map method, it simply writes the key/value pairs into the OutputCollector, i.e. the intermediate output, which each mapper buffers in a circular buffer. Here the key is the byte offset (cursor position) and the value is the complete line. The default InputFormat is TextInputFormat, and its default RecordReader implementation is the LineRecordReader class. LineRecordReader reads data from the logical input split; whenever it encounters \n (a new line), it stops reading, and whatever it has read before the \n is treated as the value. This key and value are given as input to the map method of the IdentityMapper class, which writes them unchanged to the output buffer. Once the map call completes, the same procedure is repeated, starting from the next byte offset.
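      As an illustration of those (byte offset, line) pairs, here is a small plain-Java simulation (no Hadoop dependency; the class and method names are made up for this sketch, and it assumes ASCII input so character count equals byte count):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simulates the (byte offset, line) records that TextInputFormat /
// LineRecordReader would hand to the identity mapper for a tiny "file".
public class OffsetDemo {
  static Map<Long, String> recordsOf(String file) {
    Map<Long, String> records = new LinkedHashMap<>();
    long offset = 0;
    for (String line : file.split("\n")) {
      records.put(offset, line);          // key = byte offset, value = line
      offset += line.length() + 1;        // +1 for the '\n' terminator
    }
    return records;
  }

  public static void main(String[] args) {
    String file = "alpha\nbeta\ngamma\n";
    recordsOf(file).forEach((k, v) -> System.out.println(k + "\t" + v));
    // prints: 0  alpha / 6  beta / 11  gamma
  }
}
```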

      The IdentityMapper class is defined in the old MR API. The map input and output key/value data types must be the same; looking at the map implementation above, you can see why, since the input pair is emitted unchanged.

      Identity Reducer is the default reducer class provided by Hadoop. When you submit an MR job, this class is invoked automatically if no reducer class is specified in the MR driver class. Below is the code from the IdentityReducer class.

      public class IdentityReducer<K, V>
          extends MapReduceBase implements Reducer<K, V, K, V> {

        /** Writes all keys and values directly to output. */
        public void reduce(K key, Iterator<V> values,
                           OutputCollector<K, V> output, Reporter reporter)
            throws IOException {
          while (values.hasNext()) {
            output.collect(key, values.next());
          }
        }
      }

      Looking at the implementation of the reduce method above, it simply writes all of its input key/value pairs to the OutputCollector (and hence, ultimately, to HDFS). The IdentityReducer class is also defined in the old MR API. It performs no processing on the data at all.
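      A minimal, Hadoop-free sketch of this pass-through reduce logic (the class and method names here are illustrative, not Hadoop's):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Mimics IdentityReducer.reduce: every value in the incoming group is
// emitted unchanged under its key, so no aggregation happens.
public class IdentityReduceDemo {
  static <K, V> List<Map.Entry<K, V>> reduce(K key, Iterator<V> values) {
    List<Map.Entry<K, V>> out = new ArrayList<>();
    while (values.hasNext()) {
      out.add(new SimpleEntry<>(key, values.next())); // pass through
    }
    return out;
  }

  public static void main(String[] args) {
    // One reduce group for key "emp1": all three values survive as-is.
    reduce("emp1", List.of("10", "20", "30").iterator())
        .forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));
  }
}
```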

      Here is the sample:

      package com.dataflair.hr.kpi1;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
      import org.apache.hadoop.util.GenericOptionsParser;

      public class IdentityMapReduce {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          Job job = new Job(conf, "Identity Map and Reduce Job-1");
          job.setJarByClass(IdentityMapReduce.class);
          String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
          if (otherArgs.length != 2) {
            System.err.println("Identity Job: Usage <in> <out>");
            System.exit(2);
          }
          // No mapper or reducer class is set, so the identity defaults apply.
          FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
          FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }

      To run MR Job:
      yarn jar IdentityJobJar.jar com.dataflair.hr.kpi1.IdentityMapReduce emp-ctc emp-ctc-out
