Is it possible to provide multiple inputs to Hadoop? If yes then how?

    • #6173
      DataFlair Team
      Spectator

      Is it possible to provide multiple inputs to Apache Hadoop? If yes, how?

    • #6176
      DataFlair Team
      Spectator

      Yes, it is possible to use multiple inputs in Hadoop. There are several ways in which this can be done (a combined driver sketch follows the list):

      1. If multiple input files are present in the same directory – by default Hadoop does not read a directory recursively. If input files such as data1, data2, etc. are present under /folder1, set mapreduce.input.fileinputformat.input.dir.recursive to true and then use FileInputFormat.addInputPath to specify the input directory. This can also be done in the driver class by adding FileInputFormat.setInputDirRecursive(job, true); before FileInputFormat.addInputPath(job, new Path(args[0])); in your MapReduce code.

      2. Use the FileInputFormat.addInputPaths() method, which takes a comma-separated list of multiple inputs, e.g.
      FileInputFormat.addInputPaths(job, "user1/file0.gz,user2.file.gz,...");

      3. Use multiple mappers, each processing a different input dataset (see point 5).

      4. Ship an input file through the Distributed Cache, so that every mapper can read it locally (useful for small lookup or side data, e.g. in a map-side join).

      5. Use the MultipleInputs.addInputPath() method to specify a different InputFormat and Mapper for each input, e.g. –
      MultipleInputs.addInputPath(job, ClouderaPath, TextInputFormat.class, JoinclouderaMapper.class);
      MultipleInputs.addInputPath(job, HdpPath, TextInputFormat.class, HdpMapper.class);
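
      Putting ways 1, 2 and 5 together, a minimal driver sketch is shown below. This is only an illustration: the class names MultiInputDriver, JoinClouderaMapper and HdpMapper, and the argument layout (two input paths plus an output path) are assumptions, not standard Hadoop classes. Ways 1 and 2 are shown commented out, because once MultipleInputs manages the inputs it supplies its own input format, so the plain FileInputFormat calls are alternatives rather than something to combine in the same job.

      import java.io.IOException;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
      import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class MultiInputDriver {

        // Placeholder mapper for the first dataset: tags every record with its source.
        public static class JoinClouderaMapper extends Mapper<LongWritable, Text, Text, Text> {
          @Override
          protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
            context.write(new Text("cloudera"), value);
          }
        }

        // Placeholder mapper for the second dataset.
        public static class HdpMapper extends Mapper<LongWritable, Text, Text, Text> {
          @Override
          protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
            context.write(new Text("hdp"), value);
          }
        }

        public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "multiple inputs example");
          job.setJarByClass(MultiInputDriver.class);

          // Way 1: a single directory read recursively (equivalent to setting
          // mapreduce.input.fileinputformat.input.dir.recursive to true).
          // FileInputFormat.setInputDirRecursive(job, true);
          // FileInputFormat.addInputPath(job, new Path("/folder1"));

          // Way 2: several paths given in one comma-separated string.
          // FileInputFormat.addInputPaths(job, "user1/file0.gz,user2.file.gz");

          // Way 5: a separate InputFormat and Mapper per input path.
          MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, JoinClouderaMapper.class);
          MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, HdpMapper.class);

          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(Text.class);
          FileOutputFormat.setOutputPath(job, new Path(args[2]));

          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }

      For way 4, the corresponding call in the same driver would be along the lines of job.addCacheFile(new java.net.URI("/path/to/small/file")), with the cached file then opened in each mapper's setup() method.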
