What is TextInputFormat in Hadoop?
- This topic has 4 replies, 1 voice, and was last updated 5 years, 6 months ago by DataFlair Team.
September 20, 2018 at 3:41 pm #5577DataFlair TeamSpectator
What is TextInputFormat? What is it used for in Hadoop MapReduce?
September 20, 2018 at 3:41 pm #5578DataFlair TeamSpectator
TextInputFormat is one of the input formats of Hadoop, used for plain text files.
It is the default input format of Hadoop MapReduce: if we do not specify any input format for a job, the framework uses TextInputFormat, and its RecordReader reads the input line by line.
For TextInputFormat, the key of each record is the byte offset at which the line starts in the file, and the value is the entire line (without the line terminator). For example:
Suppose the input file consists of these lines:
Hi I am student of dataflair
I learn Hadoop at dataflair
Then the records produced by the RecordReader would be (assuming each line ends with a single newline byte):
key (byte offset)    value
0    Hi I am student of dataflair
29    I learn Hadoop at dataflair
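To check the byte offsets by hand, here is a minimal sketch in plain Java. It does not use any Hadoop classes; it only mimics how a line-based reader could turn text into (byte offset, line) records. The class name `LineOffsetSketch` and the method `records` are made up for illustration. The first line is 28 bytes long, so with a single `\n` terminator the second line starts at byte 29; with Windows-style `\r\n` terminators it would start at byte 30.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: mimics (byte offset, line) records.
class LineOffsetSketch {

    /** Returns one "offset<TAB>line" string per line, splitting on LF, CR, or CRLF. */
    static List<String> records(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        int start = 0; // byte offset where the current line begins
        int i = 0;
        while (i < data.length) {
            if (data[i] == '\n' || data[i] == '\r') {
                out.add(start + "\t" + new String(data, start, i - start, StandardCharsets.UTF_8));
                // Treat CR immediately followed by LF as one two-byte terminator.
                if (data[i] == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                i++;
                start = i;
            } else {
                i++;
            }
        }
        if (start < data.length) { // last line without a trailing terminator
            out.add(start + "\t" + new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hi I am student of dataflair\nI learn Hadoop at dataflair\n";
        for (String record : records(text)) {
            System.out.println(record);
        }
    }
}
```

The value never contains the line terminator, but the terminator bytes still count towards the offset of the next line.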
September 20, 2018 at 3:42 pm #5579DataFlair TeamSpectator
TextInputFormat is one of the input formats of Hadoop. As the name suggests, it is used to read text files line by line.
Basically, it generates key-value pairs from the text. First, the input file is divided into input splits, and a RecordReader reads each split. Within a split, the text is broken into lines (records), where the end of a line is detected by a line feed (moving one line forward) or a carriage return (moving the cursor to the beginning of the line).
For each line, TextInputFormat generates one key-value pair. In MapReduce, data elements are always structured as key-value pairs.
So TextInputFormat generates the following key and value:
Key: the position (byte offset) of the line in the file
Value: the complete line of text
For example, take this text file:
humpty dumpty set on wall
humpty dumpty had a great fall
The key-value pairs will be:
[KEY=0].[VALUE=humpty dumpty set on wall]
[KEY=26].[VALUE=humpty dumpty had a great fall]
Each key-value pair is then passed to the Map function to produce intermediate output. This intermediate output is then passed to the Reduce function to produce the final output.
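To make the Map and Reduce flow concrete, here is a rough sketch in plain Java that pushes the two humpty dumpty records through a map step and a reduce step. It is only a simulation of the data flow and uses no Hadoop classes. The word-count task, the class name `MapReduceFlowSketch`, and the methods `map`, `reduce`, `run`, and `demo` are all assumptions chosen for illustration; the original answer does not say what the Map and Reduce functions compute.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of the data flow, not real Hadoop code.
class MapReduceFlowSketch {

    // "Map" step: one (byte offset, line) record in, a list of (word, 1) pairs out.
    // The offset key is ignored here, which is common for line-based input.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // "Reduce" step: all intermediate values for one word in, one total out.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs map on every record, groups the intermediate pairs by key, then reduces.
    static Map<String, Integer> run(Map<Long, String> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<Long, String> record : records.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(record.getKey(), record.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    // The two records a line-based reader would produce for the example file.
    static Map<String, Integer> demo() {
        Map<Long, String> records = new TreeMap<>();
        records.put(0L, "humpty dumpty set on wall");
        records.put(26L, "humpty dumpty had a great fall");
        return run(records);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a real job the framework does the grouping between the two steps; the `run` method here only stands in for that.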
-
September 20, 2018 at 3:42 pm #5581DataFlair TeamSpectator
TextInputFormat is one of the input formats of Hadoop.
If we do not define any input format, the job uses TextInputFormat as the default.
The <k,v> pair of TextInputFormat is:
k: the byte offset of the line (LongWritable type)
v: the entire line (Text type)
We can set the input format of a job with conf.setInputFormat() in the old API (JobConf), or with job.setInputFormatClass() in the new API (Job).
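As a rough sketch of how the input format is set on a job, here is a small map-only driver using the newer `org.apache.hadoop.mapreduce` API, where the call is `Job.setInputFormatClass(...)` (the older `org.apache.hadoop.mapred` API uses `JobConf.setInputFormat(...)` instead). It assumes the Hadoop MapReduce client libraries are on the classpath. The class names `OffsetEchoDriver` and `OffsetEchoMapper`, the job name, and the use of `args[0]` and `args[1]` as input and output paths are made up for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: writes each (byte offset, line) record straight to the output.
public class OffsetEchoDriver {

    // With TextInputFormat the mapper receives LongWritable keys and Text values.
    public static class OffsetEchoMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(offset, line);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "offset echo");
        job.setJarByClass(OffsetEchoDriver.class);

        // Explicit here for clarity; leaving this line out gives the same
        // behaviour, because TextInputFormat is the default.
        job.setInputFormatClass(TextInputFormat.class);

        job.setMapperClass(OffsetEchoMapper.class);
        job.setNumReduceTasks(0); // map-only job, so the records are easy to inspect
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // assumed input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // assumed output path

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Running it on a small text file should give an output where each line is prefixed by its byte offset, which is a quick way to see the keys TextInputFormat produces.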
September 20, 2018 at 3:42 pm #5582DataFlair TeamSpectator
The answers above are correct; I would just like to add a few points.
TextInputFormat is one type of InputFormat in MapReduce, and it is the default one as well. If we don't configure any InputFormat, TextInputFormat is used by default. We should use this InputFormat when we have unformatted, line-based data.