What are the different types of OutputFormat in MapReduce?


      DataFlair Team

      What are the most common OutputFormats in Hadoop?
      How many types of OutputFormat are there in Hadoop?

      DataFlair Team

      The Hadoop RecordWriter takes the output data from the Reducer and writes it to output files. The way the RecordWriter writes these output key-value pairs to the output files is determined by the OutputFormat.
      OutputFormat is the output-side counterpart of InputFormat. The OutputFormat implementations provided by Hadoop write files either to the local disk or to HDFS. The OutputFormat defines the output specification of a MapReduce job. Its main responsibilities are:
      1. Validating the output specification, e.g. checking that the output directory does not already exist.
      2. Providing the RecordWriter implementation used to write out the job's output files.
      3. Storing the output files in a Hadoop file system, typically HDFS.
      • FileOutputFormat.setOutputPath() is used to set the output directory.
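      In Hadoop, the "output directory must not exist" rule is enforced by the OutputFormat's checkOutputSpecs() method before any task runs. The sketch below only illustrates that idea with plain java.nio; the class OutputSpecCheck and its method checkOutputDir() are hypothetical names, not Hadoop APIs, and Hadoop itself throws its own exception types rather than java.nio.file.FileAlreadyExistsException.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper mimicking the idea behind an OutputFormat's
// checkOutputSpecs(): refuse to start if the output directory already exists.
class OutputSpecCheck {
    static void checkOutputDir(Path outputDir) throws IOException {
        if (outputDir == null) {
            throw new IllegalArgumentException("Output directory not set");
        }
        if (Files.exists(outputDir)) {
            // Failing early protects the results of a previous run from being overwritten.
            throw new FileAlreadyExistsException("Output directory " + outputDir + " already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        Path existing = Files.createTempDirectory("job-output");
        try {
            checkOutputDir(existing);
        } catch (FileAlreadyExistsException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        Path fresh = existing.resolve("not-yet-created");
        checkOutputDir(fresh);  // passes: this directory does not exist yet
        System.out.println("Accepted: " + fresh);
    }
}
```

      This is why re-running a job with the same output path fails until the old directory is deleted or a new path is chosen.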
      Types:
      • TextOutputFormat
      • SequenceFileOutputFormat
      • MapFileOutputFormat
      • MultipleOutputs
      • LazyOutputFormat
      • DBOutputFormat
      • SequenceFileAsBinaryOutputFormat

      TextOutputFormat: the default output format. It writes each key-value pair on its own line of a text file.
      By default, the key and value are separated by a tab character.
      • Property: mapreduce.output.textoutputformat.separator changes the separator.
      • KeyValueTextInputFormat is used to read these output text files back; it breaks each line into a key-value pair based on a configurable separator.
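      To make the line layout concrete, here is a small sketch of how a TextOutputFormat record looks on disk and how a KeyValueTextInputFormat-style reader splits it again. TextLineFormat, formatLine() and splitLine() are hypothetical names written in plain Java, not Hadoop classes; the null handling mirrors what Hadoop's text record writer does, and the single-character separator on the read side is an assumption for simplicity.

```java
// Hypothetical sketch (not Hadoop code) of TextOutputFormat's line layout and
// the split performed by a KeyValueTextInputFormat-style reader.
class TextLineFormat {
    // key + separator + value + newline. A null key or value is dropped together
    // with the separator (Hadoop treats NullWritable the same way).
    static String formatLine(Object key, Object value, String separator) {
        boolean hasKey = key != null;
        boolean hasValue = value != null;
        if (!hasKey && !hasValue) {
            return "";  // nothing at all is written for such a record
        }
        StringBuilder line = new StringBuilder();
        if (hasKey) line.append(key);
        if (hasKey && hasValue) line.append(separator);
        if (hasValue) line.append(value);
        return line.append('\n').toString();
    }

    // Split at the FIRST separator; with no separator the whole line becomes
    // the key and the value is empty.
    static String[] splitLine(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        String line = formatLine("hadoop", 3, "\t");
        String[] kv = splitLine(line.trim(), '\t');
        System.out.println(kv[0] + " -> " + kv[1]);  // prints "hadoop -> 3"
    }
}
```

      Because only the first separator is used, values that themselves contain tabs survive the round trip intact.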
      SequenceFileOutputFormat is an OutputFormat that writes sequence files for its output. It is a good intermediate format between chained MapReduce jobs.
      • It rapidly serializes arbitrary data types to the file, and the corresponding SequenceFileInputFormat deserializes the file back into the same data types and presents the data to the next mapper.
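      The benefit described above comes from binary, type-preserving serialization: each record is written with write(DataOutput) and restored with readFields(DataInput), the same pattern Hadoop's Writable interface uses. The sketch below shows only that round trip with java.io; PageView and BinaryRoundTrip are hypothetical classes, and the byte layout (a count header followed by records) is an assumption, not the real SequenceFile format, which also has a header, sync markers and key/value pairs.

```java
import java.io.*;

// Hypothetical record written in the Writable style: write(DataOutput) /
// readFields(DataInput). This is not the Hadoop Writable interface itself.
class PageView {
    String url;
    long count;

    void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeLong(count);
    }

    void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        count = in.readLong();
    }
}

class BinaryRoundTrip {
    // Assumed layout: record count, then each record's fields in order.
    static byte[] serialize(PageView[] records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(records.length);
        for (PageView r : records) {
            r.write(out);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reads the bytes back into the same types, as the next job's mapper would.
    static PageView[] deserialize(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        PageView[] records = new PageView[in.readInt()];
        for (int i = 0; i < records.length; i++) {
            records[i] = new PageView();
            records[i].readFields(in);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        PageView v = new PageView();
        v.url = "/index.html";
        v.count = 7;
        PageView[] back = deserialize(serialize(new PageView[] {v}));
        System.out.println(back[0].url + " " + back[0].count);  // prints "/index.html 7"
    }
}
```

      No text parsing is needed on the read side, which is what makes this kind of format fast as an intermediate step.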

      MapFileOutputFormat
      • MapFileOutputFormat is another form of FileOutputFormat, used to write output as map files. Keys must be added to a MapFile in order, so we need to ensure that the reducer emits its keys in sorted order.
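      The ordering rule can be pictured as a writer that remembers the last key and rejects any smaller one. Hadoop's MapFile writer performs a similar check and fails with an IOException on out-of-order keys; the generic SortedKeyWriter below is a hypothetical stand-in written in plain Java, and tolerating equal keys here is a simplifying assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical writer enforcing a MapFile-style rule: every appended key must
// be >= the previous key, otherwise the append is rejected.
class SortedKeyWriter<K extends Comparable<K>, V> {
    private final List<K> keys = new ArrayList<>();
    private final List<V> values = new ArrayList<>();
    private K lastKey;

    void append(K key, V value) {
        if (lastKey != null && lastKey.compareTo(key) > 0) {
            throw new IllegalStateException("key out of order: " + key + " after " + lastKey);
        }
        keys.add(key);
        values.add(value);
        lastKey = key;
    }

    int size() {
        return keys.size();
    }

    public static void main(String[] args) {
        SortedKeyWriter<String, Integer> writer = new SortedKeyWriter<>();
        writer.append("apple", 1);
        writer.append("banana", 2);
        try {
            writer.append("aardvark", 3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());  // prints "key out of order: aardvark after banana"
        }
    }
}
```

      Reducers normally satisfy this rule automatically, because the framework sorts keys before they reach reduce().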

      MultipleOutputs allows writing data to files whose names are derived from the output keys and values, or in fact from an arbitrary string.
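      As an illustration of key-derived file names, the sketch below routes records into per-key "files" held in memory. The lower-cased-key naming rule is purely hypothetical, and the "<base>-r-<5-digit partition>" suffix is an assumption modelled on how Hadoop names reducer output files; MultiOutputRouter, fileName() and route() are not Hadoop APIs. In a real job, the reducer would call MultipleOutputs.write() with a base output path instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory router: each record goes to a file whose base name
// is derived from its key.
class MultiOutputRouter {
    // Assumed naming convention for reducer output: <base>-r-<5-digit partition>.
    static String fileName(String baseOutputPath, int partition) {
        return String.format("%s-r-%05d", baseOutputPath, partition);
    }

    // records[i] = {key, value}; the result maps file name -> tab-separated lines.
    static Map<String, List<String>> route(String[][] records, int partition) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String[] kv : records) {
            String base = kv[0].toLowerCase(Locale.ROOT);  // hypothetical rule: base = lower-cased key
            files.computeIfAbsent(fileName(base, partition), f -> new ArrayList<>())
                 .add(kv[0] + "\t" + kv[1]);
        }
        return files;
    }

    public static void main(String[] args) {
        String[][] records = {{"US", "10"}, {"UK", "3"}, {"US", "7"}};
        route(records, 0).forEach((file, lines) -> System.out.println(file + " " + lines));
    }
}
```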

      LazyOutputFormat: sometimes FileOutputFormat creates output files even if they are empty.

      • LazyOutputFormat is a wrapper OutputFormat which ensures that the output file is created only when the first record is emitted for a given partition.
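      In a Hadoop driver, the wrapping is typically set up with LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class). The underlying idea is simply to defer opening the file until the first write, which the plain-Java sketch below shows; LazyFileWriter is a hypothetical class, not Hadoop's implementation.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical lazy writer: the real file is opened only on the first write,
// so a task that emits no records leaves no empty part file behind.
class LazyFileWriter implements AutoCloseable {
    private final Path path;
    private Writer delegate;  // created on demand

    LazyFileWriter(Path path) {
        this.path = path;
    }

    void write(String key, String value) throws IOException {
        if (delegate == null) {
            delegate = Files.newBufferedWriter(path);  // the file appears only now
        }
        delegate.write(key + "\t" + value + "\n");
    }

    @Override
    public void close() throws IOException {
        if (delegate != null) {
            delegate.close();  // nothing to close if nothing was written
        }
    }
}
```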

      DBOutputFormat is an OutputFormat for writing to relational databases. It sends the reduce output to a SQL table.
      • It accepts key-value pairs, where the key has a type implementing DBWritable.
      • The returned RecordWriter writes only the key to the database, using a batch SQL query.
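      The "batch SQL query" is a parameterized INSERT with one placeholder per column. The table and columns are normally configured with DBOutputFormat.setOutput(job, tableName, fieldNames...), and each key's DBWritable.write(PreparedStatement) fills in the placeholders before the statement is added to the batch. The sketch below only builds a statement of that shape; InsertQueryBuilder and constructInsert() are hypothetical names, and the exact SQL text Hadoop generates may differ slightly.

```java
// Hypothetical sketch: build a parameterized INSERT with one '?' per column,
// the kind of statement a DBOutputFormat record writer executes in batches.
class InsertQueryBuilder {
    static String constructInsert(String table, String[] fieldNames) {
        if (fieldNames == null || fieldNames.length == 0) {
            throw new IllegalArgumentException("Field names may not be empty");
        }
        StringBuilder q = new StringBuilder("INSERT INTO ").append(table).append(" (");
        q.append(String.join(",", fieldNames));
        q.append(") VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            q.append(i == 0 ? "?" : ",?");  // one placeholder per column
        }
        return q.append(");").toString();
    }

    public static void main(String[] args) {
        // prints "INSERT INTO employees (id,name) VALUES (?,?);"
        System.out.println(constructInsert("employees", new String[] {"id", "name"}));
    }
}
```

      Only the key feeds these placeholders, which is why the value emitted alongside it is ignored.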

