What is LazyOutputFormat in MapReduce?

    • #5950
      DataFlair Team
      Spectator

      What is LazyOutputFormat in Hadoop?
      Why is LazyOutputFormat needed in Hadoop?
      How to enable LazyOutputFormat in Hadoop MapReduce?

    • #5952
      DataFlair Team
      Spectator

      In Hadoop, the OutputFormat determines how the reducer's output (key-value pairs) is written to HDFS. The number of output files (part-nnnnn) equals the number of partitions, and a file is created for every partition even if it receives no records. Some applications would rather produce files only for partitions that actually contain output key-value pairs.
      LazyOutputFormat covers that case. It is a wrapper output format that creates the output file only when the first record is emitted for a given partition. To enable it, call LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class) in the driver class, passing the job and the underlying output format to wrap. Setting LazyOutputFormat directly as the job's output format is not enough, because the wrapper needs to know which format it delegates to.

      Hadoop Streaming and Pipes support a -lazyOutput option to enable LazyOutputFormat.
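
      The lazy-creation idea described above can be modeled without Hadoop at all. The following is only a simplified sketch in plain Java, not Hadoop's actual implementation: the class name LazyRecordWriter and the tab-separated record layout are assumptions made for illustration. It opens the underlying file on the first write, so a writer that never receives a record leaves no file behind.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a lazily initialized record writer.
// It mimics the behavior of a lazy wrapper: the file is created
// only when the first record arrives.
public class LazyRecordWriter implements AutoCloseable {
    private final Path target;
    private BufferedWriter writer; // stays null until the first write

    public LazyRecordWriter(Path target) {
        this.target = target;
    }

    public void write(String key, String value) throws IOException {
        if (writer == null) {
            // First record for this "partition": create the file now.
            writer = Files.newBufferedWriter(target);
        }
        writer.write(key + "\t" + value);
        writer.newLine();
    }

    @Override
    public void close() throws IOException {
        // Nothing to close (and no file on disk) if no record was written.
        if (writer != null) {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-demo");
        Path emptyPartition = dir.resolve("part-r-00000");
        Path usedPartition = dir.resolve("part-r-00001");

        // Partition with no records: opened and closed without writing.
        try (LazyRecordWriter w = new LazyRecordWriter(emptyPartition)) {
        }
        // Partition with one record.
        try (LazyRecordWriter w = new LazyRecordWriter(usedPartition)) {
            w.write("key", "value");
        }

        System.out.println(Files.exists(emptyPartition)); // false
        System.out.println(Files.exists(usedPartition));  // true
    }
}
```

      An eager writer would open the file in its constructor instead, which is what produces zero-byte part files for empty partitions.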

      Follow the link for more detail: MapReduce OutputFormat

    • #5954
      DataFlair Team
      Spectator

      LazyOutputFormat is a wrapper output format that ensures an output file is created only when data exists for a given partition.
      With a standard output format, the MapReduce framework creates a zero-byte file (e.g. part-00000) in the output directory even if no records are written to the context. To avoid these empty files, replace job.setOutputFormatClass(TextOutputFormat.class); in the driver's job configuration with:

      LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

    • #5956
      DataFlair Team
      Spectator

      In Hadoop, FileOutputFormat subclasses create output files (part-r-nnnnn) even when they are empty. LazyOutputFormat avoids this by creating an output file only when there is data to write to it.

      To use it, import the class from the org.apache.hadoop.mapreduce.lib.output package:

      org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat

      and add the following line to your driver class (where the job is configured):

      LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
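
      To show where that line sits in context, here is a minimal driver sketch. The class name LazyOutputDriver, the job name, the choice of four reducers, and the use of the default identity mapper and reducer are all assumptions for illustration; it needs the Hadoop client libraries on the classpath and takes the input and output paths as arguments.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver: only the LazyOutputFormat call comes from the
// discussion above; the rest is ordinary job setup.
public class LazyOutputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lazy-output-example");
        job.setJarByClass(LazyOutputDriver.class);

        // No mapper or reducer is set, so the identity defaults are used.
        // With the default TextInputFormat the records are
        // (LongWritable offset, Text line) pairs.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Several reducers, so some partitions may receive no records.
        job.setNumReduceTasks(4);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Wrap TextOutputFormat instead of calling
        // job.setOutputFormatClass(TextOutputFormat.class).
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

      With this setup, reducers that emit nothing should leave no part-r-nnnnn file in the output directory, while reducers that emit at least one record write their file as usual.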
