What are the most common InputFormats in Hadoop?

This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 2:58 pm #5368
  
  DataFlair Team
  Spectator
  
  What is InputFormat in Hadoop MapReduce?
  How many types of InputFormat are there in Hadoop?
  What are the different types of InputFormat in MapReduce?
- September 20, 2018 at 2:59 pm #5370
  DataFlair Team
  Spectator
  In Hadoop, input files store the data for a MapReduce job, and they typically reside in HDFS. The InputFormat defines how these input files are split and read, and it creates the InputSplits.
  
  The most common InputFormats are:
  
  FileInputFormat- It is the base class for all file-based InputFormats. It specifies the input directory where the data files are located. FileInputFormat reads all the files in that directory and then divides them into one or more InputSplits.
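  
  To make the idea of dividing a file into InputSplits concrete, here is a minimal plain-Java sketch. It does not use the Hadoop API: the class `SplitSketch`, the method `splits`, and the fixed `splitSize` argument are hypothetical names chosen for illustration. The real FileInputFormat derives the split size from the HDFS block size and the configured minimum and maximum sizes, and it treats a small tail differently, so read this only as a simplified model.
  
  ```java
  import java.util.ArrayList;
  import java.util.List;
  
  // Illustrative only: a simplified model of cutting one file into splits.
  // This is NOT Hadoop's implementation; names and sizes are assumptions.
  final class SplitSketch {
  
      // Returns {offset, length} pairs that together cover fileLength bytes.
      static List<long[]> splits(long fileLength, long splitSize) {
          List<long[]> result = new ArrayList<>();
          long offset = 0;
          while (offset < fileLength) {
              // The last split may be shorter than splitSize.
              long length = Math.min(splitSize, fileLength - offset);
              result.add(new long[] {offset, length});
              offset += length;
          }
          return result;
      }
  
      public static void main(String[] args) {
          for (long[] s : splits(300, 128)) {
              System.out.println("offset=" + s[0] + " length=" + s[1]);
          }
      }
  }
  ```
  
  With a 300-byte file and a 128-byte split size, the sketch yields three splits of 128, 128, and 44 bytes.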
  
  TextInputFormat- It is the default InputFormat of MapReduce. It treats each line of each input file as a separate record and performs no parsing.
  - Key- The byte offset of the start of the line within the file.
  - Value- The contents of the line, excluding line terminators.
  Example content of file- is john may which katty
  - Key- 0
  - Value- is john may which katty
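  
  The key and value rule above can be imitated in a few lines of plain Java. This is only a sketch of the record layout, not Hadoop's record reader: `TextRecordSketch` and `records` are hypothetical names, and the code assumes UTF-8 text with `\n` as the only line terminator.
  
  ```java
  import java.nio.charset.StandardCharsets;
  import java.util.LinkedHashMap;
  import java.util.Map;
  
  // Illustrative only: mimics the (byte offset, line) records described above.
  // Assumes UTF-8 content and '\n' line terminators; not Hadoop code.
  final class TextRecordSketch {
  
      // Maps each line's starting byte offset to the line text without its terminator.
      static Map<Long, String> records(String content) {
          Map<Long, String> out = new LinkedHashMap<>();
          long offset = 0; // byte offset of the current line
          int start = 0;   // character index of the current line
          while (start < content.length()) {
              int end = content.indexOf('\n', start);
              boolean hasNewline = end >= 0;
              if (!hasNewline) {
                  end = content.length();
              }
              String line = content.substring(start, end);
              out.put(offset, line);
              // Advance by the line's byte length plus one byte for the terminator.
              offset += line.getBytes(StandardCharsets.UTF_8).length + (hasNewline ? 1 : 0);
              start = end + (hasNewline ? 1 : 0);
          }
          return out;
      }
  
      public static void main(String[] args) {
          records("is john may which katty\nsecond line\n")
              .forEach((key, value) -> System.out.println(key + " -> " + value));
      }
  }
  ```
  
  For the sample input, the first record has key 0 and the second has key 24, because the first line occupies 23 bytes plus one byte for its terminator.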
  KeyValueTextInputFormat- It is similar to TextInputFormat in that it treats each line of input as a separate record. The main difference is that TextInputFormat treats the entire line as the value, while KeyValueTextInputFormat splits each line into a key and a value at the first tab character ('\t').
  - Key- Everything up to the tab character.
  - Value- The remaining part of the line after the tab character.
  Example content of file (where "->" stands for the tab character)- is -> john may which katty
  - Key- is
  - Value- john may which katty
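  
  The tab-splitting rule can be sketched in plain Java as follows. `KeyValueSketch` and `split` are hypothetical names used only for illustration and are not part of Hadoop. The sketch splits at the first tab and, when a line has no tab, uses the whole line as the key with an empty value.
  
  ```java
  // Illustrative only: splits one line into key and value at the first tab.
  // Not Hadoop code; class and method names are assumptions.
  final class KeyValueSketch {
  
      // Returns a two-element array: {key, value}.
      static String[] split(String line) {
          int tab = line.indexOf('\t');
          if (tab < 0) {
              // No separator: the whole line is the key and the value is empty.
              return new String[] {line, ""};
          }
          return new String[] {line.substring(0, tab), line.substring(tab + 1)};
      }
  
      public static void main(String[] args) {
          String[] kv = split("is\tjohn may which katty");
          System.out.println("key=" + kv[0] + ", value=" + kv[1]);
      }
  }
  ```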
  
  SequenceFileInputFormat- It is the InputFormat that reads sequence files, which store binary key-value pairs. Key and Value- Both are user-defined.
  
  Follow the link to learn more about InputFormat in Hadoop
- September 20, 2018 at 2:59 pm #5371
  
  DataFlair Team
  Spectator
  
  The most common InputFormats in Hadoop are:
  
  1. TextInputFormat (default)
  
  2. KeyValueTextInputFormat
  
  3. SequenceFileInputFormat
  a. SequenceFileAsBinaryInputFormat
  b. SequenceFileAsTextInputFormat
  
  Of these, TextInputFormat is the Hadoop default.
  
  Beyond these, several other InputFormats are available for specific requirements:
  1. CombineFileInputFormat
  2. CombineSequenceFileInputFormat
  3. CombineTextInputFormat
  4. CompositeInputFormat
  5. DBInputFormat
  6. FixedLengthInputFormat
  7. MultiFileInputFormat
  8. NLineInputFormat
  9. Parser.Node
  10. FileInputFormat, etc.
  
  Follow the link to learn more about InputFormat in Hadoop