Forums › Apache Hadoop › Is it possible to provide multiple inputs to Hadoop? If yes, then how?
September 20, 2018 at 5:16 pm · #6173 · DataFlair Team (Spectator)
Is it possible to provide multiple inputs to Apache Hadoop? If so, how?
September 20, 2018 at 5:16 pm · #6176 · DataFlair Team (Spectator)
Yes, it is possible to use multiple inputs in Hadoop. There are several ways to do this:
1. If multiple input files are present in the same directory: by default, Hadoop does not read a directory recursively. If input files such as data1, data2, etc. are present in /folder1, set mapreduce.input.fileinputformat.input.dir.recursive to true and then use FileInputFormat.addInputPath() to specify the input directory. In the driver class, this is done by calling FileInputFormat.setInputDirRecursive(job, true); before FileInputFormat.addInputPath(job, new Path(args[0])); in your MapReduce code.
2. Use the FileInputFormat.addInputPaths() method, which takes a comma-separated list of multiple inputs, e.g.:
FileInputFormat.addInputPaths(job, "user1/file0.gz,user2.file.gz,…");
3. Use multiple mappers.
4. Put the input file in the Distributed Cache.
5. Use the MultipleInputs.addInputPath() method to specify a different mapper (and input format) for each input file, e.g.:
MultipleInputs.addInputPath(job, ClouderaPath, TextInputFormat.class, JoinclouderaMapper.class);
MultipleInputs.addInputPath(job, HdpPath, TextInputFormat.class, HdpMapper.class);
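The recursive-directory and comma-separated-paths approaches (methods 1 and 2 above) can be combined in a driver class along these lines. This is a minimal sketch, not a complete job: the RecursiveInputDriver class name, the WordMapper/WordReducer placeholders, and the example paths are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecursiveInputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "recursive-input job");
        job.setJarByClass(RecursiveInputDriver.class);

        // Method 1: read the input directory and all of its subdirectories.
        // Must be set before addInputPath() takes effect for this job.
        FileInputFormat.setInputDirRecursive(job, true);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Method 2 (alternative): several inputs as one comma-separated string.
        // FileInputFormat.addInputPaths(job, "user1/file0.gz,user2/file1.gz");

        // Hypothetical mapper/reducer classes; plug in your own here.
        // job.setMapperClass(WordMapper.class);
        // job.setReducerClass(WordReducer.class);

        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```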
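A driver sketch for method 5, using the JoinclouderaMapper and HdpMapper names from the example above. The stub mapper bodies, the JoinReducer, and the path arguments are assumptions added to make the sketch self-contained; each input path gets its own mapper, and both feed the same reducer.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiSourceDriver {

    // Stub mapper for the first source; tags each record with its origin.
    public static class JoinclouderaMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(new Text(value.toString().split(",")[0]),
                      new Text("A\t" + value));
        }
    }

    // Stub mapper for the second source.
    public static class HdpMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(new Text(value.toString().split(",")[0]),
                      new Text("B\t" + value));
        }
    }

    // Stub reducer: records from both sources arrive grouped by key.
    public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            for (Text v : values) {
                ctx.write(key, v);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multi-source join");
        job.setJarByClass(MultiSourceDriver.class);

        // One mapper class per input path.
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, JoinclouderaMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, HdpMapper.class);

        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```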