How to specify more than one directory as input in the MapReduce Program?

    • #4694
      DataFlair Team
      Spectator

      The data files are spread across different folders. How can more than one directory be specified as input to a MapReduce job?

    • #4695
      DataFlair Team
      Spectator

      To take input from more than one folder, pass each path as a separate argument when running the job. For example, suppose you have two files:

      • /user/hduser/input1/a.txt
      • /user/hduser/input2/b.txt

      1) When running the job, list both input paths followed by the output path:
      $ bin/hadoop jar /home/hadoop/t.jar WordCountDriver /user/hduser/input1/a.txt /user/hduser/input2/b.txt /user/hduser/output

      2) For the program to accept input from several paths, the driver must read each command-line argument and register it as an input path. Add the following lines to the driver:

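      // Strip generic Hadoop options (-D, -files, ...) and keep the job's own arguments.
      Configuration conf = new Configuration();
      String[] oth = new GenericOptionsParser(conf, args).getRemainingArgs();
      Job job = Job.getInstance(conf, "Example");
      // Each call adds one more input path; files and directories are both accepted.
      FileInputFormat.addInputPath(job, new Path(oth[0]));
      FileInputFormat.addInputPath(job, new Path(oth[1]));
      // The last argument is the output directory, which must not already exist.
      FileOutputFormat.setOutputPath(job, new Path(oth[2]));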

      In this way, the job can be configured with as many separate input paths as needed: call FileInputFormat.addInputPath once per path. Each path may be a single file or a whole directory.
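
      Hard-coding oth[0] and oth[1] fixes the number of inputs at two. A minimal sketch of one way to generalize, assuming the convention that every argument except the last is an input path and the last is the output directory. The class InputPathArgs and its methods are hypothetical helpers written for illustration, not part of Hadoop; the Hadoop calls appear only in comments so that the sketch runs without a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop): splits driver arguments into
// N input paths followed by exactly one output path.
class InputPathArgs {

    // Every argument except the last is treated as an input path.
    static List<String> inputPaths(String[] args) {
        requireAtLeastTwo(args);
        return new ArrayList<>(Arrays.asList(args).subList(0, args.length - 1));
    }

    // The last argument is treated as the output directory.
    static String outputPath(String[] args) {
        requireAtLeastTwo(args);
        return args[args.length - 1];
    }

    // Comma-separated form, which is the format that
    // FileInputFormat.addInputPaths(job, String) accepts.
    static String commaSeparatedInputs(String[] args) {
        return String.join(",", inputPaths(args));
    }

    private static void requireAtLeastTwo(String[] args) {
        if (args == null || args.length < 2) {
            throw new IllegalArgumentException(
                "usage: <input path> [<input path> ...] <output path>");
        }
    }

    public static void main(String[] args) {
        // Fall back to the paths from this thread when no arguments are given.
        String[] paths = (args.length >= 2) ? args : new String[] {
            "/user/hduser/input1", "/user/hduser/input2", "/user/hduser/output"
        };
        for (String in : inputPaths(paths)) {
            // In a real driver: FileInputFormat.addInputPath(job, new Path(in));
            System.out.println("input:  " + in);
        }
        // In a real driver: FileOutputFormat.setOutputPath(job, new Path(outputPath(paths)));
        System.out.println("output: " + outputPath(paths));
    }
}
```

      In the driver, the array returned by getRemainingArgs() would be passed to these helpers in place of args. Alternatively, the comma-separated string can be handed to FileInputFormat.addInputPaths(job, ...) in a single call instead of looping.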

      For more details, see the MapReduce Tutorial.
