Why is the output file name in MapReduce part-r-00000?


    • #5540
      DataFlair Team
      Spectator

      Is there any reason behind the output file names? I always see names like part-r-00000 for a MapReduce job and part-m-00000 for a map-only job.

    • #5542
      DataFlair Team
      Spectator

      By default, MapReduce names its output files part-x-yyyyy, where:

      x is 'm' or 'r', depending on whether the file was written by a map task (in a map-only job) or by a reduce task
      yyyyy is the zero-based mapper or reducer task number, zero-padded to five digits
      So a job with 32 reducers produces files named part-r-00000 to part-r-00031, one per reducer task.
      You can also change the output file name by supplying your own base name; the "part" prefix is just Hadoop's default naming convention.
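
To make the pattern concrete, here is a minimal sketch in plain Java that reproduces the default naming described above. It is not Hadoop's own code: the class `PartFileNames` and its methods are hypothetical names chosen for illustration, and the only assumption is the part-x-yyyyy layout stated in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative stand-in, not Hadoop code: mimics the default naming
// pattern part-<m|r>-<five-digit zero-based task number>.
final class PartFileNames {

    // taskType is 'm' for a map task or 'r' for a reduce task;
    // partition is the zero-based task number.
    static String outputName(char taskType, int partition) {
        if (taskType != 'm' && taskType != 'r') {
            throw new IllegalArgumentException("taskType must be 'm' or 'r'");
        }
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be >= 0");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "part-%c-%05d", taskType, partition);
    }

    // One file name per reducer task, numbered from 0 to numReducers - 1.
    static List<String> namesForReducers(int numReducers) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            names.add(outputName('r', i));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = namesForReducers(32);
        System.out.println(names.get(0));   // part-r-00000
        System.out.println(names.get(31));  // part-r-00031
    }
}
```

With 32 reducers the list runs from part-r-00000 to part-r-00031, matching the example above; a map-only job would use 'm' in place of 'r'.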

    • #5544
      DataFlair Team
      Spectator

      By default, the output files are named part-x-yyyyy,
      where:
      1) x is either 'm' or 'r', depending on whether the job was a map-only job or had a reduce phase
      2) yyyyy is the zero-based mapper or reducer task number

      So if a job has 10 reducers, the generated files are named part-r-00000 to part-r-00009, one for each reducer task.

      It is possible to change the default name.
      To change the default base name of the output files, set this property in the Driver class:
      job.getConfiguration().set("mapreduce.output.basename", "flair");

      The output files will then be named "flair-r-00000", "flair-r-00001", and so on.
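
The effect of that setting can be sketched without a cluster. The snippet below is a simplified model, not Hadoop's implementation: it uses `java.util.Properties` as a stand-in for the job configuration, and the class `BaseNameDemo` and its method are hypothetical. It assumes only what this post states: the base name is read from `mapreduce.output.basename` and falls back to "part" when the property is unset.

```java
import java.util.Locale;
import java.util.Properties;

// Simplified model of the base-name lookup; Properties stands in for
// the Hadoop job configuration (an assumption made for illustration).
final class BaseNameDemo {

    static final String BASE_NAME_KEY = "mapreduce.output.basename";
    static final String DEFAULT_BASE_NAME = "part";

    // Builds <basename>-<m|r>-<five-digit task number>, using "part"
    // when no base name has been configured.
    static String outputName(Properties conf, char taskType, int partition) {
        String base = conf.getProperty(BASE_NAME_KEY, DEFAULT_BASE_NAME);
        return String.format(Locale.ROOT, "%s-%c-%05d", base, taskType, partition);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(outputName(conf, 'r', 0));  // part-r-00000

        conf.setProperty(BASE_NAME_KEY, "flair");
        System.out.println(outputName(conf, 'r', 0));  // flair-r-00000
        System.out.println(outputName(conf, 'r', 9));  // flair-r-00009
    }
}
```

Only the prefix changes; the task-type letter and the five-digit task number are still appended, so a job with 10 reducers would produce flair-r-00000 to flair-r-00009.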
