What is LazyOutputFormat in MapReduce?
September 20, 2018 at 4:49 pm #5950 DataFlair Team (Spectator)
What is LazyOutputFormat in Hadoop?
What is the need of LazyOutputFormat in Hadoop?
How to enable LazyOutputFormat in Hadoop MapReduce?
September 20, 2018 at 4:49 pm #5952 DataFlair Team (Spectator)
In Hadoop, the OutputFormat determines how the reducer's output (key-value pairs) is written to HDFS. The number of output files (part-nnnnn) equals the number of partitions. A file is generated for every partition, even if that partition produced no records and the file is empty. In some applications it is preferable to generate only the files that actually contain output key-value pairs.
In that case LazyOutputFormat is useful. It is a wrapper output format that makes sure an output file is created only when the first record is emitted for its partition. It is enabled in the driver class by calling LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class), passing the job and the underlying output format to wrap. Streaming and Pipes support a -lazyOutput option to enable LazyOutputFormat.
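The wrapper idea can be illustrated without Hadoop at all. The following is a minimal sketch in plain Java, assuming a hypothetical LazyWriter class (not part of Hadoop, and not how Hadoop's LazyOutputFormat is actually implemented) that postpones creating its file until the first record is written:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of the "create on first record" pattern.
// This is NOT Hadoop's LazyOutputFormat source code.
public class LazyWriter implements AutoCloseable {
    private final Path path;
    private BufferedWriter out; // stays null until the first record arrives

    public LazyWriter(Path path) {
        this.path = path;
    }

    // Creates the file on the first call, then writes one "key<TAB>value" line.
    public void write(String key, String value) throws IOException {
        if (out == null) {
            out = Files.newBufferedWriter(path);
        }
        out.write(key + "\t" + value);
        out.newLine();
    }

    // Closes the underlying file only if it was ever created.
    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-demo");
        Path empty = dir.resolve("part-r-00000");
        Path full = dir.resolve("part-r-00001");

        try (LazyWriter w = new LazyWriter(empty)) {
            // this "partition" emits no records
        }
        try (LazyWriter w = new LazyWriter(full)) {
            w.write("hadoop", "1");
        }

        System.out.println(Files.exists(empty)); // false: no file was created
        System.out.println(Files.exists(full));  // true: created on first write
    }
}
```

A partition that never writes a record never touches the file system, which is the behaviour LazyOutputFormat provides for part files.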
Follow the link for more detail: MapReduce OutputFormat
September 20, 2018 at 4:49 pm #5954 DataFlair Team (Spectator)
LazyOutputFormat is a wrapper output format that ensures the output file is created only when data exists for a given partition.
With the standard output formats, the MapReduce framework creates a zero-byte file (e.g. part-00000) in the output directory even if no records are written to the context. To avoid empty files, replace job.setOutputFormatClass(TextOutputFormat.class); in the driver's job configuration with LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
September 20, 2018 at 4:50 pm #5956 DataFlair Team (Spectator)
In Hadoop, FileOutputFormat subclasses create output files (part-r-nnnnn) even if they are empty. LazyOutputFormat overcomes this by ensuring that output files are created only when there is data to write, so no empty files are initialized.
To use it, import the class org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat and add the following line to your driver class (where the job is configured):
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
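For context, here is a sketch of where that line sits in a complete driver. It assumes the Hadoop MapReduce client libraries are on the classpath; the class name LazyOutputDriver, the job name, the reducer count of 4, and the use of args[0] and args[1] as input and output paths are illustrative choices, not something the posts above prescribe. No mapper or reducer is set, so Hadoop's default identity classes run.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver; class name, job name, and paths are illustrative assumptions.
public class LazyOutputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lazy output demo");
        job.setJarByClass(LazyOutputDriver.class);

        // Default (identity) Mapper and Reducer pass through the
        // LongWritable offsets and Text lines read by TextInputFormat.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Several reducers, so that some partitions may receive no records.
        job.setNumReduceTasks(4);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Instead of job.setOutputFormatClass(TextOutputFormat.class):
        // wrap TextOutputFormat so part files are created only on first write.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With this setting, a reducer that emits nothing should leave no part-r-nnnnn file behind.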