What is RecordWriter in Hadoop MapReduce?

    • #6246
      DataFlair Team
      Spectator

      What is the purpose of RecordWriter in Hadoop?
      What is the need of RecordWriter in MapReduce?

    • #6249
      DataFlair Team
      Spectator

      In Hadoop, the OutputFormat determines where and how the results of a MapReduce job are written.
      An OutputFormat specifies how to serialize data by providing an implementation of RecordWriter. RecordWriter classes handle the job of taking an individual key-value pair and writing it to the location prepared by the OutputFormat. A RecordWriter implements two main functions: ‘write’ and ‘close’. The ‘write’ function takes key-value pairs from the MapReduce job and writes their bytes to disk. The default RecordWriter is LineRecordWriter. It writes:

      • the key’s bytes (returned by the getBytes() function)
      • a tab character as delimiter
      • the value’s bytes (again produced by getBytes())
      • a newline character
      The ‘close’ function closes the Hadoop data stream to the output file.
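
      As a minimal illustration, here is a plain-Java sketch of that byte layout (a simplified stand-in, not the actual LineRecordWriter source, which lives inside org.apache.hadoop.mapreduce.lib.output.TextOutputFormat):

      import java.io.DataOutputStream;
      import java.io.FileOutputStream;
      import java.io.IOException;

      // Sketch of the record layout LineRecordWriter produces:
      // key bytes, a tab, value bytes, a newline.
      public class LineFormatSketch {
          public static void main(String[] args) throws IOException {
              try (DataOutputStream out =
                       new DataOutputStream(new FileOutputStream("part-r-00000"))) {
                  byte[] key = "hadoop".getBytes("UTF-8");
                  byte[] value = "1".getBytes("UTF-8");
                  out.write(key, 0, key.length);     // the key's bytes
                  out.writeByte('\t');               // tab delimiter
                  out.write(value, 0, value.length); // the value's bytes
                  out.writeByte('\n');               // newline ends the record
              } // try-with-resources closes the stream, like RecordWriter's close
          }
      }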

      If you do not want the output written in this default way (for example, you need it written in a comma-separated manner), you have to write your own RecordWriter that extends the abstract RecordWriter class. You override the write and close methods in your custom RecordWriter to get the required output.

      This RecordWriter can then be returned by a custom output format that overrides the getRecordWriter() method of the default output format, as in the sketch below.
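
      For example, a custom output format along these lines writes comma-separated records (a sketch only; the names CsvOutputFormat and CsvRecordWriter are made up for illustration):

      import java.io.DataOutputStream;
      import java.io.IOException;

      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.RecordWriter;
      import org.apache.hadoop.mapreduce.TaskAttemptContext;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class CsvOutputFormat extends FileOutputFormat<Text, Text> {

          public static class CsvRecordWriter extends RecordWriter<Text, Text> {
              private final DataOutputStream out;

              public CsvRecordWriter(DataOutputStream out) {
                  this.out = out;
              }

              @Override
              public void write(Text key, Text value) throws IOException {
                  out.write(key.getBytes(), 0, key.getLength());
                  out.writeByte(',');  // comma instead of the default tab
                  out.write(value.getBytes(), 0, value.getLength());
                  out.writeByte('\n');
              }

              @Override
              public void close(TaskAttemptContext context) throws IOException {
                  out.close(); // close the Hadoop data stream to the output file
              }
          }

          @Override
          public RecordWriter<Text, Text> getRecordWriter(TaskAttemptContext job)
                  throws IOException, InterruptedException {
              Path file = getDefaultWorkFile(job, ".csv");
              FileSystem fs = file.getFileSystem(job.getConfiguration());
              return new CsvRecordWriter(fs.create(file, false));
          }
      }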

      Follow the link for more detail: RecordWriter in MapReduce

    • #6250
      DataFlair Team
      Spectator

      RecordWriter is a class, whose implementation is provided by the OutputFormat, that collects the output key-value pairs from the Reducer and writes them into the output file.
      The way these output key-value pairs are written to the output files by the RecordWriter is determined by the OutputFormat.

      Hadoop provides an output format corresponding to each input format. All Hadoop output formats must extend the abstract class org.apache.hadoop.mapreduce.OutputFormat.

      OutputFormat describes the output specification for a MapReduce job (see the sketch after this list). Based on that output specification,

      • the MapReduce job checks that the output directory doesn’t already exist;
      • the OutputFormat provides the RecordWriter implementation to be used to write out the output files of the job.
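
      For reference, the new-API contract declares three abstract methods (signatures abbreviated from the Hadoop source; Javadoc omitted):

      package org.apache.hadoop.mapreduce;

      import java.io.IOException;

      public abstract class OutputFormat<K, V> {

          // Returns the RecordWriter that writes out the job’s key-value pairs.
          public abstract RecordWriter<K, V> getRecordWriter(TaskAttemptContext context)
                  throws IOException, InterruptedException;

          // Validates the output specification, e.g. fails if the
          // output directory already exists.
          public abstract void checkOutputSpecs(JobContext context)
                  throws IOException, InterruptedException;

          // Returns the committer that finalizes the job’s output.
          public abstract OutputCommitter getOutputCommitter(TaskAttemptContext context)
                  throws IOException, InterruptedException;
      }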

      The two main functions of the RecordWriter class are write and close.

      The write function takes the key-value pairs produced by the MapReduce job and writes their bytes to disk.

      The close function closes Hadoop’s data stream to the output file, ending the write operation.

      We can write our own custom RecordWriter class by overriding the write and close methods, and then plug it into the job driver as shown below.
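
      A minimal driver sketch (CsvOutputFormat refers to the hypothetical class sketched in the previous reply):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class Driver {
          public static void main(String[] args) throws Exception {
              Job job = Job.getInstance(new Configuration(), "custom output example");
              job.setJarByClass(Driver.class);
              // Use the custom output format instead of the default TextOutputFormat.
              job.setOutputFormatClass(CsvOutputFormat.class);
              // The job fails fast if this directory already exists.
              FileOutputFormat.setOutputPath(job, new Path(args[0]));
              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }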

      Follow the link for more detail: RecordWriter in MapReduce
