What are the most common OutputFormat in Hadoop?

This topic has 2 replies, 1 voice, and was last updated 5 years, 6 months ago by DataFlair Team.

- September 20, 2018 at 5:23 pm #6216
  
  DataFlair Team
  Spectator
  
  What is OutputFormat in Hadoop MapReduce?
  How many types of OutputFormat are there in Hadoop?
  What are the different types of OutputFormat in MapReduce?
- September 20, 2018 at 5:23 pm #6217
  
  DataFlair Team
  Spectator
  
  The default OutputFormat in Hadoop is TextOutputFormat. If no output format is specified explicitly, the job writes its output as text files.
  TextOutputFormat: It writes records one per line, converting each key and value to a string and separating them with a tab character by default.
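
  To make the line layout concrete, here is a minimal plain-Java sketch of how one record ends up as one tab-separated line. It is only an illustration of the format described above, not Hadoop's own code; the class name TabSeparatedLineSketch and the sample word counts are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only; this is not Hadoop's TextOutputFormat implementation.
final class TabSeparatedLineSketch {

    // Renders one record as: key.toString(), a tab, value.toString().
    static String formatRecord(Object key, Object value) {
        return key.toString() + "\t" + value.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for reducer output.
        Map<String, Integer> wordCounts = new LinkedHashMap<>();
        wordCounts.put("hadoop", 3);
        wordCounts.put("mapreduce", 5);

        // One record per line, key and value separated by a tab.
        for (Map.Entry<String, Integer> entry : wordCounts.entrySet()) {
            System.out.println(formatRecord(entry.getKey(), entry.getValue()));
        }
    }
}
```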
  
  Follow the link to learn more about OutputFormat in Hadoop
- September 20, 2018 at 5:23 pm #6219
  DataFlair Team
  Spectator
  Hadoop provides output formats corresponding to its input formats. In the new API, every Hadoop output format must extend the abstract class org.apache.hadoop.mapreduce.OutputFormat (the older org.apache.hadoop.mapred API defines OutputFormat as an interface).
  
  OutputFormat describes the output specification for a MapReduce job. It has two main responsibilities:
  
  It validates the output specification of the job, for example by checking that the output directory does not already exist.
  
  It provides the RecordWriter implementation used to write the job's output files.
  
  These two responsibilities are fulfilled by the two OutputFormat methods shown below. (A third abstract method, getOutputCommitter, supplies the OutputCommitter that commits task output.)
```
public abstract void checkOutputSpecs(JobContext context)
    throws IOException, InterruptedException;
```
  This method validates the output specification when the job is submitted. In FileOutputFormat, it checks that the output directory does not already exist and throws an exception if it does, so that existing output is not overwritten.
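  The idea behind this check can be sketched in plain Java against the local filesystem. This is only an analogy under stated assumptions: Hadoop's real check runs against its own FileSystem API, and the class OutputDirCheckSketch and its method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Local-filesystem analogy only; not Hadoop code.
final class OutputDirCheckSketch {

    // Fails fast if the output location is unset or already exists,
    // so that a job never overwrites earlier results.
    static void checkOutputDir(Path outputDir) throws IOException {
        if (outputDir == null) {
            throw new IOException("Output directory not set.");
        }
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(outputDir.toString());
        }
    }

    // Convenience wrapper: true when the check passes.
    static boolean isAcceptable(Path outputDir) {
        try {
            checkOutputDir(outputDir);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Path existing = Paths.get(System.getProperty("java.io.tmpdir"));
        Path fresh = existing.resolve("job-output-" + System.nanoTime());
        System.out.println("existing dir accepted: " + isAcceptable(existing));
        System.out.println("fresh dir accepted: " + isAcceptable(fresh));
    }
}
```

  In the sketch, an existing directory is rejected before any work starts, which mirrors why a MapReduce job fails at submission time when its output directory is already present.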
```
public abstract RecordWriter<K,V> getRecordWriter(TaskAttemptContext context)
    throws IOException, InterruptedException;
```
  This method returns the RecordWriter for the given task attempt.
  
  Implementations of the abstract class org.apache.hadoop.mapreduce.RecordWriter<K,V> write the output <key, value> pairs to an output file.
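
  As a rough illustration of how these pieces fit together, the sketch below extends FileOutputFormat and supplies its own RecordWriter. The class names KeyEqualsValueOutputFormat and KeyEqualsValueWriter and the "key=value" line layout are invented for this example; checkOutputSpecs and getOutputCommitter are inherited from FileOutputFormat. It assumes the Hadoop MapReduce client libraries are on the classpath.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical example: writes each record as "key=value" on its own line.
public class KeyEqualsValueOutputFormat extends FileOutputFormat<Text, IntWritable> {

    @Override
    public RecordWriter<Text, IntWritable> getRecordWriter(TaskAttemptContext context)
            throws IOException, InterruptedException {
        // FileOutputFormat picks a per-task file inside the task's work directory.
        Path file = getDefaultWorkFile(context, ".txt");
        FSDataOutputStream out =
                file.getFileSystem(context.getConfiguration()).create(file, false);
        return new KeyEqualsValueWriter(out);
    }

    // The RecordWriter decides how a single <key, value> pair is serialized.
    private static class KeyEqualsValueWriter extends RecordWriter<Text, IntWritable> {
        private final DataOutputStream out;

        KeyEqualsValueWriter(DataOutputStream out) {
            this.out = out;
        }

        @Override
        public void write(Text key, IntWritable value) throws IOException {
            String line = key.toString() + "=" + value.get() + "\n";
            out.write(line.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void close(TaskAttemptContext context) throws IOException {
            out.close();
        }
    }
}
```

  A job would select it with job.setOutputFormatClass(KeyEqualsValueOutputFormat.class).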
  
  Built-In Hadoop Output Formats
  
  Hadoop provides several built-in OutputFormat implementations in the org.apache.hadoop.mapreduce.lib.output package:
  
  FileOutputFormat
  
  Base class for all file-based OutputFormat implementations.
  
  Some important subclasses of FileOutputFormat are:
  
  TextOutputFormat
  
  The default output format provided by Hadoop is TextOutputFormat, which writes records as lines of text. If no output format is specified explicitly, text files are created as output files.
  
  Output keys and values can be of any type, because TextOutputFormat converts them to strings with the toString() method. Keys and values are separated by a tab by default.
  KeyValueTextInputFormat is the best fit for reading these text files back as input, since it splits each input line into a key and a value at a separator character.
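
  A minimal driver-side sketch of this configuration follows. The class name TextOutputDriverSketch, the job name, and the comma separator are assumptions for illustration; the property mapreduce.output.textoutputformat.separator is the setting TextOutputFormat reads for its key/value separator. It assumes the Hadoop client libraries are on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver helper; mapper and reducer setup is omitted.
public class TextOutputDriverSketch {

    public static Job configure(Configuration conf, Path outputDir) throws IOException {
        // Optional: replace the default tab between key and value.
        // Set it before creating the Job, because the Job copies the configuration.
        conf.set("mapreduce.output.textoutputformat.separator", ",");

        Job job = Job.getInstance(conf, "text-output-sketch");
        // Explicit here for clarity; TextOutputFormat is already the default.
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, outputDir);
        return job;
    }
}
```

  If a later job reads this output with KeyValueTextInputFormat, its separator (mapreduce.input.keyvaluelinerecordreader.key.value.separator, tab by default) would need to match the one chosen here.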
  
  SequenceFileOutputFormat
  
  SequenceFileOutputFormat writes sequence files. It is a good choice when the output will be fed into another MapReduce job as input, since sequence files are a compact binary format and can optionally be compressed.
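
  The following sketch shows one way a driver might select SequenceFileOutputFormat and turn on block compression. The class name SequenceFileOutputDriverSketch and the choice of DefaultCodec with BLOCK compression are assumptions made for illustration, and the Hadoop client libraries are assumed to be on the classpath.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical driver helper; mapper and reducer setup is omitted.
public class SequenceFileOutputDriverSketch {

    public static void configure(Job job, Path outputDir) {
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, outputDir);

        // Compression is opt-in: enable it, pick a codec, and compress whole blocks of records.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, DefaultCodec.class);
        SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
    }
}
```

  A follow-up job can then read the files with SequenceFileInputFormat and receive the same key and value types without any text parsing.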
  
  SequenceFileAsBinaryOutputFormat
  
  SequenceFileAsBinaryOutputFormat is a direct subclass of SequenceFileOutputFormat and the counterpart of SequenceFileAsBinaryInputFormat. It writes keys and values to sequence files in raw binary format.
  
  MapFileOutputFormat
  
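  MapFileOutputFormat is also a direct subclass of FileOutputFormat and writes output as MapFiles (sorted, indexed sequence files). Because MapFile keys must be appended in sorted order, the reducer has to emit its keys in order.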
  
  MultipleOutputs
  
  The MultipleOutputs class writes job output to more than one destination. It has two main use cases:
  
  Writing to additional named outputs besides the job's default output. Each named output may be configured with its own OutputFormat and its own key and value classes.
  Writing data to different files whose base paths are chosen by the user at run time.
  MultipleOutputs can also maintain counters for the number of records written to each named output, but these are disabled by default.
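
  Both use cases can be sketched in a single reducer. Everything specific here is an assumption for illustration: the class names, the named output "rare", the threshold of 5, and the base path "frequent/part". The Hadoop client libraries are assumed to be on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical example: routes rarely seen keys to a named output
// and all other keys to files under a chosen base path.
public class MultipleOutputsSketch {

    // Driver side: register the named output and enable the optional counters.
    public static void configure(Job job) {
        MultipleOutputs.addNamedOutput(job, "rare",
                TextOutputFormat.class, Text.class, IntWritable.class);
        MultipleOutputs.setCountersEnabled(job, true);
    }

    public static class SplitReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private MultipleOutputs<Text, IntWritable> outputs;

        @Override
        protected void setup(Context context) {
            outputs = new MultipleOutputs<>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            IntWritable total = new IntWritable(sum);

            if (sum < 5) {
                // Use case 1: write to the named output registered in configure().
                outputs.write("rare", key, total);
            } else {
                // Use case 2: write to files under a base path chosen at run time.
                outputs.write(key, total, "frequent/part");
            }
            // context.write(key, total) would still go to the job's default output.
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Closing flushes and closes every underlying RecordWriter.
            outputs.close();
        }
    }
}
```

  Forgetting to call close() in cleanup() is a common source of empty or truncated side files, which is why the sketch does it explicitly.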