This topic contains 2 replies, has 1 voice, and was last updated by  dfbdteam3 1 year, 8 months ago.

Viewing 3 posts - 1 through 3 (of 3 total)
  • Author
    Posts
  • #5540

    dfbdteam3
    Moderator

    Is there a reason behind the output file names? I always see names like part-r-00000 for a MapReduce job and part-m-00000 for a map-only job.

    #5542

    dfbdteam3
    Moderator

    In MapReduce, the output files are named part-x-yyyyy by default, where:

    x is 'm' or 'r', depending on whether the job was a map-only job or a job with a reduce phase
    yyyyy is the mapper or reducer task number (zero-based)
    So a job with 32 reducers will produce files named part-r-00000 to part-r-00031, one for each reducer task.
    You can also replace the default name with one of your own. The part* prefix is simply Hadoop's default convention for output file names.

    #5544

    dfbdteam3
    Moderator

    By default, the output files are named part-x-yyyyy
    where:
    1) x is either 'm' or 'r', depending on whether the job was a map-only job or a job with a reduce phase
    2) yyyyy is the mapper or reducer task number (zero-based)

    So if a job has 10 reducers, the generated files will be named part-r-00000 to part-r-00009, one for each reducer task.

    It is possible to change the default name.
    All you need to do is set the base name in the Driver class before submitting the job:
    job.getConfiguration().set("mapreduce.output.basename", "flair");

    With this setting, the output files of a job with reducers will be named flair-r-00000, flair-r-00001, and so on.
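
    To make the naming pattern concrete, here is a small standalone sketch that builds names the same way as described above. It does not use Hadoop at all: the class PartFileNames and the method partFileName are made-up names for illustration, and the code only mimics the scheme (base name, task type, five-digit zero-padded task number) rather than reproducing Hadoop's own implementation.

```java
import java.util.Locale;

// Illustrative only: mimics the naming scheme described in this thread.
// PartFileNames and partFileName are hypothetical names, not Hadoop classes.
final class PartFileNames {

    // basename: "part" by default, or whatever mapreduce.output.basename is set to
    // mapOnly:  true for a map-only job ("m"), false when reducers write the output ("r")
    // taskId:   zero-based mapper or reducer task number
    static String partFileName(String basename, boolean mapOnly, int taskId) {
        String type = mapOnly ? "m" : "r";
        // %05d pads the task number with leading zeros to five digits
        return String.format(Locale.ROOT, "%s-%s-%05d", basename, type, taskId);
    }

    public static void main(String[] args) {
        System.out.println(partFileName("part", false, 0));   // part-r-00000
        System.out.println(partFileName("part", false, 9));   // part-r-00009
        System.out.println(partFileName("part", true, 0));    // part-m-00000
        System.out.println(partFileName("flair", false, 0));  // flair-r-00000
    }
}
```

    The last call corresponds to a job whose base name was changed to "flair": only the prefix changes, while the task type and task number parts stay the same.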
