How to specify more than one directory as input in the MapReduce Program?

This topic contains 1 reply, has 1 voice, and was last updated by  dfbdteam3 1 year, 6 months ago.



    There are data files spread across different folders. How can we specify more than one directory as input in a MapReduce job?



    To take more than one folder as input, you can simply specify separate paths when running the job. For example, suppose you have two files:

    • /user/hduser/input1/a.txt
    • /user/hduser/input2/b.txt

    1) Then while running the job you can write:
    $bin/hadoop jar /home/hadoop/t.jar WordCountDriver /user/hduser/input1/a.txt /user/hduser/input2/b.txt /user/hduser/output

    2) To make your program take input from different folders, you have to parse the command-line arguments accordingly. In your driver you can write the following lines:

    Configuration conf = new Configuration();
    String[] oth = new GenericOptionsParser(conf, args).getRemainingArgs();
    Job job = new Job(conf, "Example");
    FileInputFormat.addInputPath(job, new Path(oth[0]));
    FileInputFormat.addInputPath(job, new Path(oth[1]));
    FileOutputFormat.setOutputPath(job, new Path(oth[2]));

    In this way, you can configure the job with as many separate input folders as you want.
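    Rather than hard-coding one addInputPath call per folder, the pattern above can be generalized: treat every argument except the last as an input directory and loop over them. The sketch below is illustrative (the class name and example paths are assumptions, not from the original post); the Hadoop-specific calls are shown as comments so the argument handling stands on its own:

    ```java
    // Hedged sketch of a generic driver argument layout: all arguments
    // except the last are input directories, the last is the output directory.
    import java.util.Arrays;

    public class MultiInputArgs {
        // Everything except the last argument is an input directory.
        static String[] inputDirs(String[] args) {
            return Arrays.copyOfRange(args, 0, args.length - 1);
        }

        // The last argument is the output directory.
        static String outputDir(String[] args) {
            return args[args.length - 1];
        }

        public static void main(String[] args) {
            // Example arguments, matching the paths used earlier in the thread:
            String[] demo = {"/user/hduser/input1", "/user/hduser/input2", "/user/hduser/output"};
            for (String in : inputDirs(demo)) {
                // In the real driver: FileInputFormat.addInputPath(job, new Path(in));
                System.out.println("input:  " + in);
            }
            // In the real driver: FileOutputFormat.setOutputPath(job, new Path(outputDir(demo)));
            System.out.println("output: " + outputDir(demo));
        }
    }
    ```

    With this layout the same driver works unchanged whether you pass two input folders or ten.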

    For more details, please follow: MapReduce Tutorial

