What is TextInputFormat in Hadoop?

This topic has 4 replies, 1 voice, and was last updated 5 years, 6 months ago by DataFlair Team.

- September 20, 2018 at 3:41 pm #5577
  
  DataFlair Team
  Spectator
  
  What is TextInputFormat in Hadoop MapReduce, and what is it used for?
- September 20, 2018 at 3:41 pm #5578
  
  DataFlair Team
  Spectator
  
  TextInputFormat is one of the input formats that Hadoop provides.
  It is the default InputFormat of Hadoop MapReduce: if we do not specify an input format for a job, the framework uses TextInputFormat, and its RecordReader (LineRecordReader) treats the input as plain text with one record per line.
  Each record is a key-value pair: the key is the byte offset of the line within the file, and the value is the entire line, without the line terminator.
  
  For example, suppose the input file consists of these lines:
  
  Hi I am student of dataflair
  I learn Hadoop at dataflair
  
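  The RecordReader then produces the following records:

  key (byte offset)   value
  0                   Hi I am student of dataflair
  29                  I learn Hadoop at dataflair

  The first line is 28 bytes long plus a 1-byte newline (\n), so the second line starts at offset 29. With Windows-style \r\n line endings it would start at 30.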
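
  The offset arithmetic above can be checked with a small plain-Java sketch. It does not use Hadoop at all; it only mimics how the keys come out, assuming UTF-8 text and \n line endings. The class and method names (ByteOffsetSketch, toRecords) are made up for this illustration.

  ```java
  import java.nio.charset.StandardCharsets;
  import java.util.LinkedHashMap;
  import java.util.Map;

  // Hypothetical helper, not part of Hadoop: mimics the (byte offset, line)
  // records that TextInputFormat hands to the mapper.
  class ByteOffsetSketch {

      // Returns records in file order: key = byte offset of the line start,
      // value = the line without its trailing '\n'.
      static Map<Long, String> toRecords(String content) {
          Map<Long, String> records = new LinkedHashMap<>();
          long offset = 0;   // byte position of the current line start
          int lineStart = 0; // char position of the current line start
          while (lineStart < content.length()) {
              int nl = content.indexOf('\n', lineStart);
              int lineEnd = (nl == -1) ? content.length() : nl;
              String line = content.substring(lineStart, lineEnd);
              records.put(offset, line);
              // Advance by the line's byte length, plus one byte for '\n' if present.
              offset += line.getBytes(StandardCharsets.UTF_8).length + (nl == -1 ? 0 : 1);
              lineStart = (nl == -1) ? content.length() : nl + 1;
          }
          return records;
      }

      public static void main(String[] args) {
          String content = "Hi I am student of dataflair\nI learn Hadoop at dataflair\n";
          toRecords(content).forEach((k, v) -> System.out.println(k + "\t" + v));
      }
  }
  ```

  Running main prints the two lines with keys 0 and 29. The offsets are counted in bytes, not characters, so multi-byte UTF-8 characters in a line push the following keys further out.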
- September 20, 2018 at 3:42 pm #5579
  
  DataFlair Team
  Spectator
  
  TextInputFormat is one of the input formats of Hadoop. As the name suggests, it is used to read text files line by line.
  Basically, it generates key-value pairs from the text. First the input file is divided into input splits, which are byte ranges of the file (usually one per HDFS block), not individual lines. Within each split, the RecordReader breaks the text into lines, treating a line feed (\n), a carriage return (\r), or a carriage return followed by a line feed as the end of a line.
  For each line, the RecordReader of TextInputFormat generates one key-value pair. In MapReduce, data elements are always structured as key-value pairs.
  So TextInputFormat generates the following key and value:

  Key: the byte offset of the line in the file
  Value: the complete line of text

  For example, given this text file:

  humpty dumpty set on wall
  humpty dumpty had a great fall

  the key-value pairs will be:
  
  [KEY=0].[VALUE=humpty dumpty set on wall]
  
  [KEY=26].[VALUE=humpty dumpty had a great fall]
  
  Each key-value pair is then passed to the map function, which produces intermediate output. This intermediate output is in turn passed to the reduce function, which produces the final output.
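
  One detail worth a sketch: in Hadoop an input split is a byte range of the file, so a split boundary can fall in the middle of a line. The line-based record reader handles this by skipping the first (possibly partial) line of every split except the first, and by reading past the end of its own split to finish the last line it started. The plain-Java code below is a simplified imitation of that rule, not Hadoop's actual LineRecordReader; the names (SplitReaderSketch, readSplit) are made up, and it assumes \n line endings only.

  ```java
  import java.nio.charset.StandardCharsets;
  import java.util.LinkedHashMap;
  import java.util.Map;

  // Hypothetical, simplified imitation of per-split line reading.
  class SplitReaderSketch {

      // Index just past the next '\n' at or after pos, or data.length if none.
      private static int endOfLine(byte[] data, int pos) {
          while (pos < data.length && data[pos] != '\n') {
              pos++;
          }
          return Math.min(pos + 1, data.length);
      }

      // Reads the records belonging to the byte range [start, end).
      static Map<Long, String> readSplit(byte[] data, int start, int end) {
          int pos = start;
          if (start != 0) {
              // Not the first split: the previous split's reader owns this line.
              pos = endOfLine(data, pos);
          }
          Map<Long, String> records = new LinkedHashMap<>();
          // A line is read if it starts at or before 'end', even if it finishes later.
          while (pos <= end && pos < data.length) {
              int next = endOfLine(data, pos);
              int textEnd = (data[next - 1] == '\n') ? next - 1 : next;
              records.put((long) pos,
                      new String(data, pos, textEnd - pos, StandardCharsets.UTF_8));
              pos = next;
          }
          return records;
      }

      public static void main(String[] args) {
          byte[] data = "humpty dumpty set on wall\nhumpty dumpty had a great fall\n"
                  .getBytes(StandardCharsets.UTF_8);
          // Two splits with a boundary at byte 10, in the middle of the first line.
          System.out.println(readSplit(data, 0, 10));
          System.out.println(readSplit(data, 10, data.length));
      }
  }
  ```

  With the boundary at byte 10, the first split still yields the whole first line (key 0) and the second split yields only the second line (key 26), so every line is processed exactly once.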
- September 20, 2018 at 3:42 pm #5581
  
  DataFlair Team
  Spectator
  
  TextInputFormat is one of the input formats of Hadoop.
  If we don't define any input format, the job uses TextInputFormat by default, and its RecordReader produces the records.
  The <k,v> pair of TextInputFormat is:
  k: the byte offset of the line (LongWritable type)
  v: the entire line (Text type)
  We can set the input format explicitly with conf.setInputFormat(TextInputFormat.class) on a JobConf (older mapred API) or job.setInputFormatClass(TextInputFormat.class) on a Job (newer mapreduce API).
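
  As a rough sketch of how this looks in a driver written against the newer org.apache.hadoop.mapreduce API: the job below sets TextInputFormat explicitly (which is redundant, since it is the default) and uses a map-only job whose mapper receives the LongWritable offset and Text line. The class names LineLengthJob and LineLengthMapper, the job name, and the use of args[0] and args[1] as input and output paths are assumptions for illustration. It needs the Hadoop client libraries on the classpath.

  ```java
  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  // Hypothetical example job: emits (byte offset, line length in bytes) per line.
  public class LineLengthJob {

      // Input key/value types match what TextInputFormat produces.
      public static class LineLengthMapper
              extends Mapper<LongWritable, Text, LongWritable, IntWritable> {
          @Override
          protected void map(LongWritable offset, Text line, Context context)
                  throws IOException, InterruptedException {
              // Text.getLength() is the number of bytes in the line.
              context.write(offset, new IntWritable(line.getLength()));
          }
      }

      public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "line-length");
          job.setJarByClass(LineLengthJob.class);

          // Explicit here for clarity; omitting this line gives the same behaviour.
          job.setInputFormatClass(TextInputFormat.class);

          job.setMapperClass(LineLengthMapper.class);
          job.setNumReduceTasks(0); // map-only job
          job.setOutputKeyClass(LongWritable.class);
          job.setOutputValueClass(IntWritable.class);

          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));

          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }
  ```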
- September 20, 2018 at 3:42 pm #5582
  
  DataFlair Team
  Spectator
  
  The answers above are correct; I would just like to add a few points.

  TextInputFormat is one type of InputFormat in MapReduce, and it is the default one as well. If we don't configure any InputFormat, TextInputFormat is used by default. We should choose this InputFormat when the data is unformatted and line-based.