How to specify more than one directory as input in a MapReduce program?

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

Author

Posts
- September 20, 2018 at 11:47 am #4694
  
  DataFlair Team
  Spectator
  
  The data files are spread across different folders. How can I specify more than one directory as input to a MapReduce job?
- September 20, 2018 at 11:47 am #4695
  DataFlair Team
  Spectator
  To take more than one folder as input, pass each path as a separate argument when running the job. For example, suppose you have two files:
  - /user/hduser/input1/a.txt
  - /user/hduser/input2/b.txt
  1) Then, when running the job, you can write:
  $ bin/hadoop jar /home/hadoop/t.jar WordCountDriver /user/hduser/input1/a.txt /user/hduser/input2/b.txt /user/hduser/output
  
  2) For the program to read input from several paths, the driver has to handle the command-line arguments accordingly. Add the following lines to the driver:
```
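import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
Configuration conf = new Configuration();
String[] oth = new GenericOptionsParser(conf, args).getRemainingArgs(); // strip generic Hadoop options
Job job = Job.getInstance(conf, "Example"); // new Job(conf, name) is deprecated
FileInputFormat.addInputPath(job, new Path(oth[0]));   // first input path
FileInputFormat.addInputPath(job, new Path(oth[1]));   // second input path
FileOutputFormat.setOutputPath(job, new Path(oth[2])); // output directory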
```
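  In this way, you can configure the job for as many separate input paths as you want by adding one FileInputFormat.addInputPath call per path. A path can also point to a directory, in which case the files directly inside it are read.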
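  If the number of input folders is not fixed, the driver does not need one hard-coded line per path. The following is only a sketch in plain Java, under the assumption that every argument except the last is an input path and the last one is the output directory. InputArgs and its methods are hypothetical names made up for this illustration; they are not part of Hadoop.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop): splits the driver arguments into
// input paths and one output path, assuming the output path comes last.
// Possible use in the driver, with oth taken from GenericOptionsParser:
//   for (String p : InputArgs.inputPaths(oth)) {
//       FileInputFormat.addInputPath(job, new Path(p));
//   }
//   FileOutputFormat.setOutputPath(job, new Path(InputArgs.outputPath(oth)));
final class InputArgs {
    private InputArgs() {}

    // All arguments except the last are treated as input paths.
    static List<String> inputPaths(String[] args) {
        requireAtLeastTwo(args);
        return Arrays.asList(Arrays.copyOfRange(args, 0, args.length - 1));
    }

    // The last argument is treated as the output directory.
    static String outputPath(String[] args) {
        requireAtLeastTwo(args);
        return args[args.length - 1];
    }

    // Input paths joined with commas, the form accepted by
    // FileInputFormat.addInputPaths(job, String).
    static String commaSeparatedInputs(String[] args) {
        return String.join(",", inputPaths(args));
    }

    // Rejects argument lists that cannot hold one input and one output.
    private static void requireAtLeastTwo(String[] args) {
        if (args == null || args.length < 2) {
            throw new IllegalArgumentException("expected: <input>... <output>");
        }
    }
}
```

  With this layout the same jar can be run with two, three, or more input folders without changing the driver code. Putting the output path last is only a convention chosen for this sketch.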
  
  For more details, please follow: MapReduce Tutorial