What is LazyOutputFormat in Hadoop?

Viewing 2 reply threads
  • Author
    Posts
    • #5947
      DataFlair Team
      Spectator

      What is LazyOutputFormat in Hadoop MapReduce?
      Why is LazyOutputFormat needed in Hadoop?
      How do you enable LazyOutputFormat in Hadoop MapReduce?

    • #5948
      DataFlair Team
      Spectator

      The Reducer takes the Mapper output as input and produces zero or more key-value pairs. A RecordWriter writes these key-value pairs from the Reducer phase to the output files, and the OutputFormat determines how the RecordWriter writes them.

      FileOutputFormat subclasses create output files (part-r-nnnnn) even if they are empty. Some applications prefer that empty files not be created, which is where LazyOutputFormat helps.

      LazyOutputFormat is a wrapper OutputFormat. It ensures that the output file for a given partition is created only when the first record is emitted for that partition.
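
      The "create the file on the first write" idea can be sketched in plain Java, without Hadoop. The names LazyWriterSketch and LazyRecordWriter are hypothetical and exist only for this illustration; this is not Hadoop's own code, and it writes to a local temporary directory rather than HDFS.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of lazy file creation; not Hadoop's implementation.
class LazyWriterSketch {

    /** Opens the partition file only when the first record is written. */
    static final class LazyRecordWriter implements AutoCloseable {
        private final Path file;
        private BufferedWriter out; // stays null until the first write

        LazyRecordWriter(Path file) {
            this.file = file;
        }

        void write(String key, String value) throws IOException {
            if (out == null) {
                // The file is created here, on the first record.
                out = Files.newBufferedWriter(file);
            }
            out.write(key + "\t" + value);
            out.newLine();
        }

        @Override
        public void close() throws IOException {
            if (out != null) {
                out.close(); // nothing to close if no record was written
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-output");
        Path empty = dir.resolve("part-r-00000");
        Path used = dir.resolve("part-r-00001");

        try (LazyRecordWriter w = new LazyRecordWriter(empty)) {
            // This partition receives no records, so no file appears.
        }
        try (LazyRecordWriter w = new LazyRecordWriter(used)) {
            w.write("hadoop", "1");
        }

        System.out.println(Files.exists(empty)); // prints false
        System.out.println(Files.exists(used));  // prints true
    }
}
```

      An eager writer would open the file in its constructor, which is why an empty partition would still leave an empty file behind.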

      To use LazyOutputFormat, call its setOutputFormatClass() method with the JobConf and the underlying output format.

      Streaming and Pipes support a -lazyOutput option to enable LazyOutputFormat.
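
      As a rough sketch of the Java route, a job driver might look like the following. It assumes the newer org.apache.hadoop.mapreduce API, where the call takes a Job rather than a JobConf, and it needs the Hadoop client libraries on the classpath. The class name LazyOutputDriver, the job name, and the use of args[0] and args[1] as input and output paths are illustrative choices, and TextOutputFormat is just one possible underlying format.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver; the class and job names are illustrative only.
public class LazyOutputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lazy-output-example");
        job.setJarByClass(LazyOutputDriver.class);

        // No Mapper or Reducer is set, so Hadoop's identity defaults are used.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Wrap the real output format so that part files are created
        // only when a partition emits its first record.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

      The only change from an ordinary driver is that LazyOutputFormat.setOutputFormatClass(job, ...) is called instead of job.setOutputFormatClass(...).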

    • #5949
      DataFlair Team
      Spectator

      The OutputFormat class defines how key-value pairs are written to HDFS. A reducer may emit zero or more (key, value) pairs, and in both cases an output file is created, either empty or containing the output. Some applications prefer that empty files not be generated. This is where LazyOutputFormat helps: it ensures that the output file is created only when the first record is emitted for a given partition. To use it, call its setOutputFormatClass() method with the JobConf and the underlying output format. Streaming supports a -lazyOutput option to enable LazyOutputFormat.
