What is OutputFormat in Hadoop?

    • #5179
      DataFlair Team
      Spectator

      Briefly explain OutputFormat and how it works.

    • #5180
      DataFlair Team
      Spectator

      In Hadoop, the Reducer takes the set of intermediate key-value pairs produced by the Mapper as input and runs a reduce function on them to generate output (zero or more key-value pairs). A RecordWriter writes these output key-value pairs from the Reducer phase to output files. How the RecordWriter writes the pairs to the output files is determined by the OutputFormat. The OutputFormat implementations provided by Hadoop write to files on the local disk or in HDFS.
      The FileOutputFormat.setOutputPath() method sets the output directory. Every Reducer writes a separate file in this common output directory.
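To make the relationship between OutputFormat, RecordWriter, and the per-Reducer output files concrete, here is a minimal plain-Java sketch. It does not use Hadoop: `SimpleOutputFormat`, `SimpleRecordWriter`, `LineOutputFormat`, and `partFileName` are hypothetical stand-ins invented for illustration, and the `part-r-00000` style of file name is assumed to follow the convention Hadoop's newer MapReduce API uses for reduce output.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.stream.Stream;

// Simplified stand-ins for Hadoop's OutputFormat and RecordWriter.
// These are NOT the real Hadoop classes; names and signatures are illustrative.
public class OutputFormatSketch {

    // Writes one key-value pair at a time to some destination.
    interface SimpleRecordWriter<K, V> extends AutoCloseable {
        void write(K key, V value) throws IOException;
        @Override
        void close() throws IOException;
    }

    // Decides how pairs are written by handing out a RecordWriter.
    interface SimpleOutputFormat<K, V> {
        SimpleRecordWriter<K, V> getRecordWriter(Path outputDir, int reducerId) throws IOException;
    }

    // A text-style format: one "key<TAB>value" line per pair.
    static class LineOutputFormat<K, V> implements SimpleOutputFormat<K, V> {
        @Override
        public SimpleRecordWriter<K, V> getRecordWriter(Path outputDir, int reducerId) throws IOException {
            Files.createDirectories(outputDir);
            // Each reducer gets its own file inside the shared output directory.
            Path partFile = outputDir.resolve(partFileName(reducerId));
            BufferedWriter out = Files.newBufferedWriter(partFile);
            return new SimpleRecordWriter<K, V>() {
                @Override
                public void write(K key, V value) throws IOException {
                    out.write(key + "\t" + value);
                    out.newLine();
                }
                @Override
                public void close() throws IOException {
                    out.close();
                }
            };
        }
    }

    // Assumed naming convention: part-r-00000, part-r-00001, ...
    static String partFileName(int reducerId) {
        return String.format(Locale.ROOT, "part-r-%05d", reducerId);
    }

    public static void main(String[] args) throws IOException {
        Path outputDir = Files.createTempDirectory("job-output");
        SimpleOutputFormat<String, Integer> format = new LineOutputFormat<>();
        // Simulate two reducers, each writing its own part file.
        for (int reducerId = 0; reducerId < 2; reducerId++) {
            try (SimpleRecordWriter<String, Integer> writer = format.getRecordWriter(outputDir, reducerId)) {
                writer.write("word" + reducerId, reducerId + 1);
            }
        }
        // List the part files that ended up in the common output directory.
        try (Stream<Path> files = Files.list(outputDir)) {
            files.sorted().forEach(p -> System.out.println(p.getFileName()));
        }
    }
}
```

In a real job the framework creates the RecordWriter for each reduce task; the loop in `main` only imitates that so the one-file-per-Reducer layout is visible.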
      The most common OutputFormat implementations are:
      1) TextOutputFormat - It is the default OutputFormat in MapReduce. It writes each key-value pair on its own line of a text file. Keys and values can be of any type, since TextOutputFormat converts them to strings by calling toString() on them.
      2) SequenceFileOutputFormat - This OutputFormat writes sequence files as its output. It is commonly used as an intermediate format between MapReduce jobs.
      3) SequenceFileAsBinaryOutputFormat - It is a variant of SequenceFileOutputFormat that writes keys and values to a sequence file in raw binary format.
      4) DBOutputFormat - It is an OutputFormat for writing to relational databases. It sends the reduce output to a SQL table. It accepts key-value pairs in which the key has a type extending DBWritable.
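As an illustration of the TextOutputFormat behaviour described in the list above, the following plain-Java sketch shows how a key and a value of arbitrary types can become one line of text through toString(). It is not Hadoop code: `TextLineSketch`, `formatLine`, and `PageVisit` are hypothetical names, the tab separator is assumed to match TextOutputFormat's default, and null keys or values are deliberately not handled.

```java
public class TextLineSketch {

    // Assumption: a tab separates key and value, as in TextOutputFormat's default.
    static final String SEPARATOR = "\t";

    // Any key or value type works because only toString() is called on it.
    static String formatLine(Object key, Object value) {
        return key.toString() + SEPARATOR + value.toString();
    }

    // A hypothetical custom value type with its own text representation.
    static class PageVisit {
        final String url;
        final int hits;

        PageVisit(String url, int hits) {
            this.url = url;
            this.hits = hits;
        }

        @Override
        public String toString() {
            return url + ":" + hits;
        }
    }

    public static void main(String[] args) {
        // String key with an Integer value.
        System.out.println(formatLine("hadoop", 3));
        // Integer key with a custom value type.
        System.out.println(formatLine(42, new PageVisit("/index.html", 7)));
    }
}
```

Because the line format depends only on toString(), a custom key or value class controls its own textual output by overriding that method.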

      Follow the link to learn more about OutputFormat in Hadoop
