What is LazyOutputFormat in MapReduce?

- September 20, 2018 at 4:49 pm #5950
  
  DataFlair Team
  Spectator
  
  What is LazyOutputFormat in Hadoop?
  Why is LazyOutputFormat needed in Hadoop?
  How do you enable LazyOutputFormat in Hadoop MapReduce?
- September 20, 2018 at 4:49 pm #5952
  
  DataFlair Team
  Spectator
  
  In Hadoop, the OutputFormat determines how the reducer's output (key-value pairs) is written to HDFS. The number of output files (part-nnnnn) equals the number of partitions, that is, the number of reduce tasks. These files are created even when a partition produces no records, so some of them may be empty. In some applications it is preferable to create files only for partitions that actually emit key-value pairs.
  LazyOutputFormat is useful in that case. It is a wrapper output format that creates the output file only when the first record is emitted for a partition. To enable it, call LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class) in the driver class, passing the output format you want to wrap.
  
  Hadoop Streaming and Pipes support a -lazyOutput option to enable LazyOutputFormat.
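
  To make the "create the file only on the first record" idea concrete, here is a minimal plain-Java sketch. It is not Hadoop code: OutputDir, EagerWriter, LazyWriter and run are hypothetical names invented for this illustration, and the in-memory map only stands in for an output directory. It is meant to roughly mirror the behaviour described above, not Hadoop's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of lazy output creation; not Hadoop code. */
class LazyOutputSketch {

    /** Hypothetical stand-in for an output directory: file name -> lines written. */
    static final class OutputDir {
        final Map<String, List<String>> files = new TreeMap<>();

        List<String> create(String name) {
            return files.computeIfAbsent(name, k -> new ArrayList<>());
        }
    }

    /** Eager writer: creates its part file as soon as it is constructed. */
    static final class EagerWriter {
        private final List<String> file;

        EagerWriter(OutputDir dir, String name) {
            this.file = dir.create(name); // file exists even if write() is never called
        }

        void write(String key, String value) {
            file.add(key + "\t" + value);
        }
    }

    /** Lazy writer: defers file creation until the first record arrives. */
    static final class LazyWriter {
        private final OutputDir dir;
        private final String name;
        private List<String> file; // stays null until the first write()

        LazyWriter(OutputDir dir, String name) {
            this.dir = dir;
            this.name = name;
        }

        void write(String key, String value) {
            if (file == null) {
                file = dir.create(name); // first record: create the file now
            }
            file.add(key + "\t" + value);
        }
    }

    /** Simulates three partitions where only partition 1 emits a record. */
    static List<String> run(boolean lazy) {
        OutputDir dir = new OutputDir();
        for (int p = 0; p < 3; p++) {
            String name = String.format("part-r-%05d", p);
            if (lazy) {
                LazyWriter w = new LazyWriter(dir, name);
                if (p == 1) w.write("hadoop", "1");
            } else {
                EagerWriter w = new EagerWriter(dir, name);
                if (p == 1) w.write("hadoop", "1");
            }
        }
        return new ArrayList<>(dir.files.keySet());
    }

    public static void main(String[] args) {
        System.out.println("eager: " + run(false)); // three files, two of them empty
        System.out.println("lazy:  " + run(true));  // only part-r-00001
    }
}
```

  In this sketch the eager run leaves three part files, two of them empty, while the lazy run leaves only part-r-00001, which is the effect LazyOutputFormat is used for.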
  
  Follow the link for more detail: MapReduce OutputFormat
- September 20, 2018 at 4:49 pm #5954
  
  DataFlair Team
  Spectator
  
  LazyOutputFormat is a wrapper output format that ensures the output file is created only when data exists for a given partition.
  With a standard output format, the MapReduce framework creates a zero-byte file (e.g. part-00000) in the output directory even when no records are written to the context. To avoid this, replace job.setOutputFormatClass(TextOutputFormat.class); in the driver's job configuration with the following setting:
  
  LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
  
  With this setting, empty files are no longer created.
  
  Follow the link for more detail: MapReduce OutputFormat
- September 20, 2018 at 4:50 pm #5956
  DataFlair Team
  Spectator
  In Hadoop, FileOutputFormat subclasses create output files (part-r-nnnnn) even if they are empty. To avoid this,
  LazyOutputFormat ensures that output files are created only when there is data to write, so no empty files are initialized.
  
  To use LazyOutputFormat, import it in your driver class:
```
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
```
  Then, in the driver class (where the job is configured), add the following line:
```
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
```
  Follow the link for more detail: MapReduce OutputFormat