Forums › Apache Hadoop › What is RecordWriter in Hadoop MapReduce?
September 20, 2018 at 5:29 pm · #6246 · DataFlair Team (Spectator)
What is the purpose of RecordWriter in Hadoop?
What is the need of RecordWriter in MapReduce?
September 20, 2018 at 5:29 pm · #6249 · DataFlair Team (Spectator)
In Hadoop, the OutputFormat determines where and how the results of a MapReduce job are written.
OutputFormats specify how to serialize data by providing an implementation of RecordWriter. RecordWriter classes handle the job of taking an individual key-value pair and writing it to the location prepared by the OutputFormat. A RecordWriter implements two main functions: write and close. The write function takes key-value pairs from the MapReduce job and writes their bytes to disk. The default RecordWriter is LineRecordWriter. For each record it writes:
- the key's bytes (returned by the getBytes() function)
- a tab character as delimiter
- the value's bytes (again, produced by getBytes())
- a newline character.
The close function closes the Hadoop data stream to the output file. If you do not want the output written in this default way, for example if it must be comma-separated instead of tab-separated, then you have to write your own RecordWriter. You override the write and close methods in your custom record writer to produce the required output.
This record writer can then be used by a custom output format that overrides the getRecordWriter() method of the default output format.
Follow the link for more detail: RecordWriter in MapReduce
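To illustrate the byte layout described above, here is a minimal, Hadoop-free sketch of what a LineRecordWriter-style write does. The class name SimpleLineWriter and the plain OutputStream are stand-ins for illustration; Hadoop's real LineRecordWriter extends org.apache.hadoop.mapreduce.RecordWriter and wraps a DataOutputStream:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Minimal stand-in for Hadoop's LineRecordWriter: per record it writes
// the key's bytes, a tab delimiter, the value's bytes, and a newline.
class SimpleLineWriter {
    private final OutputStream out;

    SimpleLineWriter(OutputStream out) {
        this.out = out;
    }

    // Analogous to RecordWriter.write(key, value)
    void write(String key, String value) throws IOException {
        out.write(key.getBytes(StandardCharsets.UTF_8));   // key's bytes
        out.write('\t');                                   // tab delimiter
        out.write(value.getBytes(StandardCharsets.UTF_8)); // value's bytes
        out.write('\n');                                   // record separator
    }

    // Analogous to RecordWriter.close(context): releases the stream
    void close() throws IOException {
        out.close();
    }
}

public class Demo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        SimpleLineWriter writer = new SimpleLineWriter(buf);
        writer.write("word", "42");
        writer.close();
        // Buffer now holds: "word", tab, "42", newline
        System.out.print(buf.toString("UTF-8"));
    }
}
```

Because each record is just a delimited byte sequence, switching the output format (for example to comma-separated) only requires changing what write emits.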
September 20, 2018 at 5:29 pm · #6250 · DataFlair Team (Spectator)
RecordWriter is a class, whose implementation is provided by the OutputFormat, that collects the output key-value pairs from the Reducer and writes them to the output file.
The way these output key-value pairs are written to output files by the RecordWriter is determined by the OutputFormat. Hadoop provides an output format corresponding to each input format. All Hadoop output formats must implement the interface org.apache.hadoop.mapreduce.OutputFormat.
OutputFormat describes the output specification for a MapReduce job. Based on this output specification:
- The MapReduce job checks that the output directory does not already exist.
- OutputFormat provides the RecordWriter implementation to be used to write out the output files of the job.
The two main functions of the RecordWriter class are write and close.
The write function takes each key-value pair output by the MapReduce job and writes its bytes to disk.
The close function stops the write operation and closes Hadoop's stream to the output file.
We can write our own custom RecordWriter class by overriding the write and close methods.
Follow the link for more detail: RecordWriter in MapReduce
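To make the custom RecordWriter idea above concrete, here is a hedged, Hadoop-free sketch of a comma-separated variant. The class name CsvRecordWriter is an illustrative stand-in; in a real job this logic would live in a class extending org.apache.hadoop.mapreduce.RecordWriter and be returned from your OutputFormat's getRecordWriter() method:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Sketch of a custom writer that emits "key,value" lines instead of the
// default tab-separated layout produced by LineRecordWriter.
class CsvRecordWriter {
    private final OutputStream out;

    CsvRecordWriter(OutputStream out) {
        this.out = out;
    }

    // Override point 1: how each key-value pair is serialized.
    void write(String key, String value) throws IOException {
        out.write(key.getBytes(StandardCharsets.UTF_8));
        out.write(',');                                    // comma instead of tab
        out.write(value.getBytes(StandardCharsets.UTF_8));
        out.write('\n');
    }

    // Override point 2: how the underlying stream is released.
    void close() throws IOException {
        out.close();
    }
}

public class CsvDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        CsvRecordWriter writer = new CsvRecordWriter(buf);
        writer.write("word", "42");
        writer.close();
        System.out.print(buf.toString("UTF-8"));
    }
}
```

Only the serialization logic changes; the rest of the job (input format, mapper, reducer, output committer) stays exactly the same, which is why Hadoop isolates this concern in the RecordWriter.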