How to specify more than one directory as input in the MapReduce Program?


  • #4694

    dfbdteam3
    Moderator

    There are data files spread across different folders. How do I specify more than one directory as input in a MapReduce job?

    #4695

    dfbdteam3
    Moderator

    To take more than one folder as input, simply pass the separate paths when running the job. Say, for example, you have two files:

    • /user/hduser/input1/a.txt
    • /user/hduser/input2/b.txt

    1) Then, while running the job, you can write:
    $ bin/hadoop jar /home/hadoop/t.jar WordCountDriver /user/hduser/input1/a.txt /user/hduser/input2/b.txt /user/hduser/output

    2) To make your program take input from multiple folders, configure the command-line arguments accordingly and write the following lines in your driver:

    Configuration conf = new Configuration();
    String[] oth = new GenericOptionsParser(conf, args).getRemainingArgs();
    Job job = Job.getInstance(conf, "Example");
    FileInputFormat.addInputPath(job, new Path(oth[0]));   // first input folder
    FileInputFormat.addInputPath(job, new Path(oth[1]));   // second input folder
    FileOutputFormat.setOutputPath(job, new Path(oth[2])); // output folder
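
    These lines go inside the driver's main() method. For context, below is a minimal sketch of a complete WordCountDriver built around them; WordCountMapper and WordCountReducer are placeholder names for your own mapper and reducer classes (assumed here to emit Text keys and IntWritable values), not classes shipped with Hadoop:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] oth = new GenericOptionsParser(conf, args).getRemainingArgs();

            Job job = Job.getInstance(conf, "Example");
            job.setJarByClass(WordCountDriver.class);

            // Placeholder classes - replace with your own mapper and reducer
            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Two input folders and one output folder, read from the command-line arguments
            FileInputFormat.addInputPath(job, new Path(oth[0]));
            FileInputFormat.addInputPath(job, new Path(oth[1]));
            FileOutputFormat.setOutputPath(job, new Path(oth[2]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }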

    In this way, you can configure the job for as many separate input folders as you want.
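
    If the number of input folders is not fixed in advance, one simple approach (a sketch that keeps the same argument convention as above: every argument except the last is an input folder, the last is the output folder) is to loop over the remaining arguments:

    for (int i = 0; i < oth.length - 1; i++) {
        FileInputFormat.addInputPath(job, new Path(oth[i]));             // each input folder
    }
    FileOutputFormat.setOutputPath(job, new Path(oth[oth.length - 1])); // output folder

    FileInputFormat also provides addInputPaths(job, "dir1,dir2,dir3"), which accepts a comma-separated list of paths, if you prefer to pass all the inputs as a single string.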

    For more details, please follow: MapReduce Tutorial

