Is it possible to provide multiple inputs to Hadoop? If yes then how?

    • #6173
      DataFlair Team
      Spectator

      Is it possible to provide multiple inputs to Apache Hadoop? If yes, how?

    • #6176
      DataFlair Team
      Spectator

      Yes, it is possible to use multiple inputs in Hadoop. There are several ways in which this can be done (a combined driver sketch follows the list):

      1. If multiple input files are present in the same directory – by default Hadoop does not read a directory recursively. If input files such as data1, data2, etc. are present under /folder1, set mapreduce.input.fileinputformat.input.dir.recursive to true and then use FileInputFormat.addInputPath to specify the input directory. This can also be done in the driver class by adding FileInputFormat.setInputDirRecursive(job, true); before FileInputFormat.addInputPath(job, new Path(args[0])); in your MapReduce code.

      2. Use the FileInputFormat.addInputPaths() method, which takes a comma-separated list of multiple inputs, e.g.
      FileInputFormat.addInputPaths(job, "user1/file0.gz,user2.file.gz,...");

      3. Use multiple mappers, each processing a different input dataset (see point 5).

      4. Ship an input file through the Distributed Cache, so that every mapper can read it locally (useful for small lookup or side data, e.g. in a map-side join).

      5. Use the MultipleInputs.addInputPath() method to specify a different InputFormat and Mapper for each input, e.g. –
      MultipleInputs.addInputPath(job, ClouderaPath, TextInputFormat.class, JoinclouderaMapper.class);
      MultipleInputs.addInputPath(job, HdpPath, TextInputFormat.class, HdpMapper.class);
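
      Putting ways 1, 2 and 5 together, a minimal driver sketch is shown below. This is only an illustration: the class names MultiInputDriver, JoinClouderaMapper and HdpMapper, and the argument layout (two input paths plus an output path) are assumptions, not standard Hadoop classes. Ways 1 and 2 are shown commented out, because once MultipleInputs manages the inputs it supplies its own input format, so the plain FileInputFormat calls are alternatives rather than something to combine in the same job.

      import java.io.IOException;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
      import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class MultiInputDriver {

        // Placeholder mapper for the first dataset: tags every record with its source.
        public static class JoinClouderaMapper extends Mapper<LongWritable, Text, Text, Text> {
          @Override
          protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
            context.write(new Text("cloudera"), value);
          }
        }

        // Placeholder mapper for the second dataset.
        public static class HdpMapper extends Mapper<LongWritable, Text, Text, Text> {
          @Override
          protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
            context.write(new Text("hdp"), value);
          }
        }

        public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "multiple inputs example");
          job.setJarByClass(MultiInputDriver.class);

          // Way 1: a single directory read recursively (equivalent to setting
          // mapreduce.input.fileinputformat.input.dir.recursive to true).
          // FileInputFormat.setInputDirRecursive(job, true);
          // FileInputFormat.addInputPath(job, new Path("/folder1"));

          // Way 2: several paths given in one comma-separated string.
          // FileInputFormat.addInputPaths(job, "user1/file0.gz,user2.file.gz");

          // Way 5: a separate InputFormat and Mapper per input path.
          MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, JoinClouderaMapper.class);
          MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, HdpMapper.class);

          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(Text.class);
          FileOutputFormat.setOutputPath(job, new Path(args[2]));

          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }

      For way 4, the corresponding call in the same driver would be along the lines of job.addCacheFile(new java.net.URI("/path/to/small/file")), with the cached file then opened in each mapper's setup() method.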
