What is TextInputFormat in Hadoop?

    • #5577
      DataFlair Team
      Spectator

      What is TextInputFormat in Hadoop MapReduce, and what is it used for?

    • #5578
      DataFlair Team
      Spectator

      TextInputFormat is one of the input formats in Hadoop.
      It is the default input format of Hadoop MapReduce: if a job does not specify an input format, TextInputFormat is used, and its RecordReader reads the input as lines of text.
      For each record, the key is the byte offset of the line within the file and the value is the entire line.

      For example, suppose the input file consists of these two lines:

      Hi I am student of dataflair
      I learn Hadoop at dataflair

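      The RecordReader then produces these records (assuming each line ends with a single newline byte, so the first line occupies 28 characters plus 1 newline = 29 bytes):

      key (byte offset)   value
      0                   Hi I am student of dataflair
      29                  I learn Hadoop at dataflair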
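
      To make the offsets concrete, here is a minimal Java sketch that computes (byte offset, line) pairs for the sample input. It is plain Java, not Hadoop's record reader; the class name ByteOffsetDemo and the method toRecords are made up for illustration, and it assumes UTF-8 text in which every line ends with a single \n byte. Under that assumption the second line starts at byte 29 (28 characters plus one newline); with \r\n line endings it would start at byte 30.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in for the records a line-based reader produces.
// This is NOT Hadoop code; names are hypothetical.
class ByteOffsetDemo {

    // Returns a map of (byte offset of line start) -> (line text without '\n').
    static Map<Long, String> toRecords(String fileContents) {
        byte[] bytes = fileContents.getBytes(StandardCharsets.UTF_8);
        Map<Long, String> records = new LinkedHashMap<>();
        int lineStart = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                // The value excludes the newline; the key is where the line began.
                records.put((long) lineStart,
                        new String(bytes, lineStart, i - lineStart, StandardCharsets.UTF_8));
                lineStart = i + 1;
            }
        }
        // Handle a final line that has no trailing newline.
        if (lineStart < bytes.length) {
            records.put((long) lineStart,
                    new String(bytes, lineStart, bytes.length - lineStart, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "Hi I am student of dataflair\nI learn Hadoop at dataflair\n";
        for (Map.Entry<Long, String> record : toRecords(input).entrySet()) {
            System.out.println(record.getKey() + "\t" + record.getValue());
        }
        // Prints:
        // 0	Hi I am student of dataflair
        // 29	I learn Hadoop at dataflair
    }
}
```

      The keys are byte offsets, not line numbers, so they depend on the encoding and on the line-ending convention of the file.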

    • #5579
      DataFlair Team
      Spectator

      TextInputFormat is one of the input formats in Hadoop. As the name suggests, it is used to read text files line by line.
      It generates key-value pairs from the text. The input file is first divided into input splits, which are byte ranges of the file (by default typically one per HDFS block), not individual lines. Within each split, the RecordReader breaks the text into lines, treating a line feed or a carriage return as the end of a line.
      Each line then becomes one key-value pair. In MapReduce, data elements are always structured as key-value pairs.
      TextInputFormat generates each pair as follows:

      Key: the byte offset at which the line starts in the file
      Value: the complete line of text, without the line terminator

      For example, suppose the text file contains:

      humpty dumpty set on wall
      humpty dumpty had a great fall

      The key-value pairs will be:

      [KEY=0].[VALUE=humpty dumpty set on wall]

      [KEY=26].[VALUE=humpty dumpty had a great fall]

      After the key-value pairs are generated, they are passed to the Map function to produce intermediate output. This intermediate output is then passed to the Reduce function to produce the final output.
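
      One detail worth spelling out: an input split is a byte range of the file rather than a line, so a line can straddle a split boundary. The sketch below shows one way a line-oriented reader can still hand every line to exactly one split: a reader whose split does not begin at byte 0 skips the first (possibly partial) line, and every reader keeps reading as long as the next line starts at or before the end of its split. This is a simplified illustration in plain Java, not Hadoop's actual record reader; the names SplitLinesDemo and readSplit are hypothetical, only \n is treated as a line terminator, and the split boundary at byte 20 is chosen arbitrarily.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified illustration of reading lines from one byte-range split.
// NOT Hadoop code; names and the boundary rule are a sketch.
class SplitLinesDemo {

    // Reads the lines owned by the split [start, end) of the given file bytes.
    static Map<Long, String> readSplit(byte[] file, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Skip the first (possibly partial) line: the reader of the
            // previous split is responsible for it.
            while (pos < file.length && file[pos] != '\n') {
                pos++;
            }
            pos++; // step past the '\n'
        }
        Map<Long, String> records = new LinkedHashMap<>();
        // Keep reading while the line STARTS at or before the split end,
        // even if the line itself runs past the end of the split.
        while (pos <= end && pos < file.length) {
            int lineEnd = pos;
            while (lineEnd < file.length && file[lineEnd] != '\n') {
                lineEnd++;
            }
            records.put((long) pos,
                    new String(file, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] file = "humpty dumpty set on wall\nhumpty dumpty had a great fall\n"
                .getBytes(StandardCharsets.UTF_8);
        // Two splits with a boundary in the middle of the first line.
        System.out.println(readSplit(file, 0, 20));
        // {0=humpty dumpty set on wall}
        System.out.println(readSplit(file, 20, file.length));
        // {26=humpty dumpty had a great fall}
    }
}
```

      Note that the keys stay offsets into the whole file (0 and 26 here), regardless of which split a line was read from.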

    • #5581
      DataFlair Team
      Spectator

      TextInputFormat is one of the input formats in Hadoop.
      If we do not define an input format, the job uses TextInputFormat by default.
      The <k,v> pair of TextInputFormat is:
      k: the byte offset (LongWritable type), and
      v: the entire line (Text type).
      We can set the input format with conf.setInputFormat() in the older JobConf API, or with job.setInputFormatClass() in the newer Job API.
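
      For reference, here is a sketch of a driver that sets the input format explicitly and a mapper that receives the LongWritable/Text pairs. It uses the newer org.apache.hadoop.mapreduce API, where the call is job.setInputFormatClass(...); setInputFormat(...) is the counterpart on the older org.apache.hadoop.mapred.JobConf. The class names OffsetEchoDriver and EchoMapper, the job name, and the use of args[0] and args[1] as input and output paths are assumptions for illustration, and the code needs the Hadoop client libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical map-only job that writes each (offset, line) record back out.
public class OffsetEchoDriver {

    // Input key = byte offset (LongWritable), input value = line (Text).
    public static class EchoMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(offset, line);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "offset echo");
        job.setJarByClass(OffsetEchoDriver.class);

        // Explicit here for clarity; TextInputFormat is also the default.
        job.setInputFormatClass(TextInputFormat.class);

        job.setMapperClass(EchoMapper.class);
        job.setNumReduceTasks(0); // map-only: mapper output is the job output
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // assumed input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // assumed output path

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

      Removing the setInputFormatClass line should not change the behaviour of this job, since TextInputFormat is what the job falls back to anyway.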

    • #5582
      DataFlair Team
      Spectator

      The answers above are correct; I would just like to add a few points.

      TextInputFormat is one type of InputFormat in MapReduce, and it is also the default. If we do not configure an InputFormat, TextInputFormat is used. We should choose this InputFormat when the input is unformatted, line-based text.
