This topic contains 2 replies, has 1 voice, and was last updated by  dfbdteam3 1 year, 6 months ago.

Viewing 3 posts - 1 through 3 (of 3 total)
  • Author
    Posts
  • #5947

    dfbdteam3
    Moderator

    What is LazyOutputFormat in Hadoop MapReduce?
    What is the need of LazyOutputFormat in Hadoop?
    How to enable LazyOutputFormat in Hadoop MapReduce?

    #5948

    dfbdteam3
    Moderator

    The Reducer takes the mapper output as input and produces zero or more output key-value pairs. The RecordWriter writes these key-value pairs from the Reducer phase to the output files, and the OutputFormat determines how the RecordWriter writes them.

    FileOutputFormat subclasses create output files (part-r-nnnnn) even if they are empty. Some applications prefer not to create empty files, which is where LazyOutputFormat helps.

    LazyOutputFormat is a wrapper OutputFormat. It ensures that the output file for a given partition is created only when the first record is emitted for that partition.
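
    The "create the file only on the first record" idea can be sketched in plain Java without Hadoop. This is only an illustration of the concept, not Hadoop's actual implementation: the class names LazyFileWriter and LazyFileWriterDemo and the tab-separated record layout are assumptions made for this example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration only; not Hadoop's LazyOutputFormat code. */
final class LazyFileWriter implements AutoCloseable {
    private final Path target;      // file that would hold this partition's output
    private BufferedWriter writer;  // stays null until the first record arrives

    LazyFileWriter(Path target) {
        this.target = target;
    }

    /** Writes one key-value record, creating the file on the first call. */
    void write(String key, String value) throws IOException {
        if (writer == null) {
            // The file is created here, not in the constructor.
            writer = Files.newBufferedWriter(target);
        }
        writer.write(key + "\t" + value);
        writer.newLine();
    }

    /** Closes the underlying file only if it was ever opened. */
    @Override
    public void close() throws IOException {
        if (writer != null) {
            writer.close();
        }
    }
}

class LazyFileWriterDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-output-demo");
        Path emptyPartition = dir.resolve("part-r-00000");
        Path nonEmptyPartition = dir.resolve("part-r-00001");

        // Partition with no records: the writer is opened and closed, nothing is written.
        try (LazyFileWriter w = new LazyFileWriter(emptyPartition)) {
        }

        // Partition with one record: the file is created on the first write.
        try (LazyFileWriter w = new LazyFileWriter(nonEmptyPartition)) {
            w.write("hadoop", "1");
        }

        System.out.println(Files.exists(emptyPartition));    // prints false
        System.out.println(Files.exists(nonEmptyPartition)); // prints true
    }
}
```

    An eager writer would open the file in its constructor, so the empty partition would leave a zero-length file behind. Deferring the open to the first write is what avoids that.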

    To use LazyOutputFormat, call its setOutputFormatClass() method with the JobConf and the underlying output format.

    Streaming and Pipes support a -lazyOutput option to enable LazyOutputFormat.
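
    As a rough sketch, a driver that enables LazyOutputFormat might look like the following. It assumes the newer org.apache.hadoop.mapreduce API, where the method takes a Job rather than a JobConf, and needs the Hadoop client libraries on the classpath. The class name LazyOutputDriver, the job name, and the use of args[0] and args[1] as input and output paths are assumptions made for this example. The default identity Mapper and Reducer are left in place to keep it short.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver class; the name and argument layout are illustrative.
public class LazyOutputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lazy-output-example");
        job.setJarByClass(LazyOutputDriver.class);

        // With the default TextInputFormat and identity Mapper/Reducer,
        // keys are byte offsets and values are lines of text.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Wrap the real output format instead of calling
        // job.setOutputFormatClass(TextOutputFormat.class) directly, so a
        // part file is created only for partitions that emit a record.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

    With the older org.apache.hadoop.mapred API, the equivalent call is made on org.apache.hadoop.mapred.lib.LazyOutputFormat and takes the JobConf as its first argument.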

    #5949

    dfbdteam3
    Moderator

    The OutputFormat class defines how key-value pairs are written to HDFS. A reducer may emit zero or more (key, value) pairs, and in both cases an output file is created, either empty or containing the output. Some applications prefer that empty files not be generated. This is where LazyOutputFormat helps: the class ensures that the output file is created only when the first record is emitted for a given partition. To use it, call its setOutputFormatClass() method with the JobConf and the underlying output format. Streaming supports a -lazyOutput option to enable LazyOutputFormat.
