What is RecordWriter in Hadoop MapReduce?

    • #6246
      DataFlair Team
      Spectator

      What is the purpose of RecordWriter in Hadoop?
      What is the need of RecordWriter in MapReduce?

    • #6249
      DataFlair Team
      Spectator

      In Hadoop, the OutputFormat determines where and how the results of a MapReduce job are written.
      An OutputFormat specifies how to serialize data by providing an implementation of RecordWriter. RecordWriter classes handle the job of taking an individual key-value pair and writing it to the location prepared by the OutputFormat. A RecordWriter implements two main functions: ‘write’ and ‘close’. The ‘write’ function takes key-value pairs from the MapReduce job and writes their bytes to disk. The default RecordWriter is LineRecordWriter. It writes:

      • the key’s bytes (returned by the getBytes() function)
      • a tab character as delimiter
      • the value’s bytes (again produced by getBytes())
      • a newline character
      The ‘close’ function closes the Hadoop data stream to the output file.
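
      As a minimal illustration, here is a plain-Java sketch of that byte layout (a simplified stand-in, not the actual LineRecordWriter source, which lives inside org.apache.hadoop.mapreduce.lib.output.TextOutputFormat):

      import java.io.DataOutputStream;
      import java.io.FileOutputStream;
      import java.io.IOException;

      // Sketch of the record layout LineRecordWriter produces:
      // key bytes, a tab, value bytes, a newline.
      public class LineFormatSketch {
          public static void main(String[] args) throws IOException {
              try (DataOutputStream out =
                       new DataOutputStream(new FileOutputStream("part-r-00000"))) {
                  byte[] key = "hadoop".getBytes("UTF-8");
                  byte[] value = "1".getBytes("UTF-8");
                  out.write(key, 0, key.length);     // the key's bytes
                  out.writeByte('\t');               // tab delimiter
                  out.write(value, 0, value.length); // the value's bytes
                  out.writeByte('\n');               // newline ends the record
              } // try-with-resources closes the stream, like RecordWriter's close
          }
      }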

      If you do not want the output written in this default way (for example, you need it written in a comma-separated manner), you have to write your own RecordWriter that extends the abstract RecordWriter class. You override the write and close methods in your custom RecordWriter to get the required output.

      This RecordWriter can then be returned by a custom output format that overrides the getRecordWriter() method of the default output format, as in the sketch below.
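
      For example, a custom output format along these lines writes comma-separated records (a sketch only; the names CsvOutputFormat and CsvRecordWriter are made up for illustration):

      import java.io.DataOutputStream;
      import java.io.IOException;

      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.RecordWriter;
      import org.apache.hadoop.mapreduce.TaskAttemptContext;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class CsvOutputFormat extends FileOutputFormat<Text, Text> {

          public static class CsvRecordWriter extends RecordWriter<Text, Text> {
              private final DataOutputStream out;

              public CsvRecordWriter(DataOutputStream out) {
                  this.out = out;
              }

              @Override
              public void write(Text key, Text value) throws IOException {
                  out.write(key.getBytes(), 0, key.getLength());
                  out.writeByte(',');  // comma instead of the default tab
                  out.write(value.getBytes(), 0, value.getLength());
                  out.writeByte('\n');
              }

              @Override
              public void close(TaskAttemptContext context) throws IOException {
                  out.close(); // close the Hadoop data stream to the output file
              }
          }

          @Override
          public RecordWriter<Text, Text> getRecordWriter(TaskAttemptContext job)
                  throws IOException, InterruptedException {
              Path file = getDefaultWorkFile(job, ".csv");
              FileSystem fs = file.getFileSystem(job.getConfiguration());
              return new CsvRecordWriter(fs.create(file, false));
          }
      }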

      Follow the link for more detail: RecordWriter in MapReduce

    • #6250
      DataFlair Team
      Spectator

      RecordWriter is a class, whose implementation is provided by the OutputFormat, that collects the output key-value pairs from the Reducer and writes them into the output file.
      The way these output key-value pairs are written to the output files by the RecordWriter is determined by the OutputFormat.

      Hadoop provides an output format corresponding to each input format. All Hadoop output formats must extend the abstract class org.apache.hadoop.mapreduce.OutputFormat.

      OutputFormat describes the output specification for a MapReduce job (see the sketch after this list). Based on that output specification,

      • the MapReduce job checks that the output directory doesn’t already exist;
      • the OutputFormat provides the RecordWriter implementation to be used to write out the output files of the job.
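
      For reference, the new-API contract declares three abstract methods (signatures abbreviated from the Hadoop source; Javadoc omitted):

      package org.apache.hadoop.mapreduce;

      import java.io.IOException;

      public abstract class OutputFormat<K, V> {

          // Returns the RecordWriter that writes out the job’s key-value pairs.
          public abstract RecordWriter<K, V> getRecordWriter(TaskAttemptContext context)
                  throws IOException, InterruptedException;

          // Validates the output specification, e.g. fails if the
          // output directory already exists.
          public abstract void checkOutputSpecs(JobContext context)
                  throws IOException, InterruptedException;

          // Returns the committer that finalizes the job’s output.
          public abstract OutputCommitter getOutputCommitter(TaskAttemptContext context)
                  throws IOException, InterruptedException;
      }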

      The two main functions of the RecordWriter class are write and close.

      The write function takes the key-value pairs produced by the MapReduce job and writes their bytes to disk.

      The close function closes Hadoop’s data stream to the output file, ending the write operation.

      We can write our own custom RecordWriter class by overriding the write and close methods, and then plug it into the job driver as shown below.
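
      A minimal driver sketch (CsvOutputFormat refers to the hypothetical class sketched in the previous reply):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class Driver {
          public static void main(String[] args) throws Exception {
              Job job = Job.getInstance(new Configuration(), "custom output example");
              job.setJarByClass(Driver.class);
              // Use the custom output format instead of the default TextOutputFormat.
              job.setOutputFormatClass(CsvOutputFormat.class);
              // The job fails fast if this directory already exists.
              FileOutputFormat.setOutputPath(job, new Path(args[0]));
              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }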

      Follow the link for more detail: RecordWriter in MapReduce
