What is LazyOutputFormat in Hadoop?
- This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 4:49 pm #5947 DataFlair Team (Spectator)
What is LazyOutputFormat in Hadoop MapReduce?
What is the need of LazyOutputFormat in Hadoop?
How to enable LazyOutputFormat in Hadoop MapReduce?
September 20, 2018 at 4:49 pm #5948 DataFlair Team (Spectator)
A Reducer takes the mapper output as input and produces zero or more key-value pairs. A RecordWriter writes these key-value pairs from the Reducer phase to output files, and the OutputFormat determines how the RecordWriter writes them.
FileOutputFormat subclasses create output files (part-r-nnnnn) even when they are empty. Some applications prefer that empty files not be created, which is where LazyOutputFormat helps.
LazyOutputFormat is a wrapper OutputFormat. It ensures that the output file for a given partition is created only when the first record is emitted for that partition.
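The "create the file only on the first record" behaviour can be illustrated with a small plain-Java sketch. This is not Hadoop code and not how LazyOutputFormat is actually implemented; the names LazyWriterDemo and LazyFileWriter are hypothetical and exist only for this illustration. It assumes nothing beyond the Java standard library.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Conceptual sketch only: NOT Hadoop code. All class names are hypothetical.
class LazyWriterDemo {

    // Wraps a target path and defers creating the file until the first write.
    static class LazyFileWriter implements Closeable {
        private final Path target;
        private BufferedWriter out; // stays null until the first record arrives

        LazyFileWriter(Path target) {
            this.target = target;
        }

        void write(String key, String value) throws IOException {
            if (out == null) {
                // The file is created here, on the first record only.
                out = Files.newBufferedWriter(target);
            }
            out.write(key + "\t" + value);
            out.newLine();
        }

        @Override
        public void close() throws IOException {
            // Nothing was opened if no record was ever written.
            if (out != null) {
                out.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-output-demo");
        Path empty = dir.resolve("part-r-00000");
        Path nonEmpty = dir.resolve("part-r-00001");

        // A "partition" that emits no records: no file should appear.
        try (LazyFileWriter w = new LazyFileWriter(empty)) {
            // intentionally no writes
        }

        // A "partition" that emits one record: the file is created.
        try (LazyFileWriter w = new LazyFileWriter(nonEmpty)) {
            w.write("hadoop", "1");
        }

        System.out.println("part-r-00000 exists: " + Files.exists(empty));
        System.out.println("part-r-00001 exists: " + Files.exists(nonEmpty));
    }
}
```

Running the sketch reports that part-r-00000 does not exist while part-r-00001 does, which mirrors the difference between a partition with no records and one with at least one record.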
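To use LazyOutputFormat, call its static setOutputFormatClass() method with the JobConf (or the Job, in the newer org.apache.hadoop.mapreduce API) and the underlying output format, for example LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class).
Streaming and Pipes support a -lazyOutput option to enable LazyOutputFormat.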
September 20, 2018 at 4:49 pm #5949 DataFlair Team (Spectator)
The OutputFormat class defines how key-value pairs are written to HDFS. A task may emit zero or many key-value pairs, and with FileOutputFormat subclasses an output file is created in both cases, either empty or containing the output. Some applications prefer that empty files not be generated. This is where LazyOutputFormat helps: the class ensures that the output file is created only when the first record is emitted for a given partition. To use it, call its setOutputFormatClass() method with the JobConf and the underlying output format. Streaming supports a -lazyOutput option to enable LazyOutputFormat.