Why is the output file name in MapReduce part-r-00000?

This topic has 2 replies, 1 voice, and was last updated 7 years, 10 months ago by DataFlair Team.


Author

Posts
- September 20, 2018 at 3:34 pm #5540
  
  DataFlair Team
  Spectator
  
  Is there any reason behind the output file name? I always see names like part-r-00000 for a MapReduce job and part-m-00000 for a map-only job.
- September 20, 2018 at 3:35 pm #5542
  
  DataFlair Team
  Spectator
  
  In MapReduce, by default, the output files are named part-x-yyyyy, where:
  
  x is 'm' or 'r', depending on whether the job was a map-only job or a job with a reduce phase
  yyyyy is the zero-based mapper or reducer task number
  So a job that has 32 reducers will produce files named part-r-00000 to part-r-00031, one for each reducer task.
  You can also change the output file name to one of your choosing. The part prefix is simply Hadoop's default naming convention.
- September 20, 2018 at 3:35 pm #5544
  
  DataFlair Team
  Spectator
  
  By default, the output files are named part-x-yyyyy,
  where:
  1) x is either 'm' or 'r', depending on whether the job was a map-only job or had a reduce phase
  2) yyyyy is the zero-based mapper or reducer task number
  
  So if a job has 10 reducers, the generated files will be named part-r-00000 to part-r-00009, one for each reducer task.
  
  It is possible to change the default name.
  This is all you need to do in the Driver class to change the default base name of the output files:
  job.getConfiguration().set("mapreduce.output.basename", "flair");
  
  So this will result in your files being called "flair-r-00000", and so on for each further reducer.
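  
  To make the naming pattern concrete, here is a minimal plain-Java sketch of how such a name is put together. It does not call Hadoop at all: the class OutputNameSketch and the helper outputName are hypothetical names made up for illustration, and the sketch only assumes the base-type-number layout described in this thread (base name, then 'm' or 'r', then a five-digit zero-padded task number).
  
  ```java
  import java.util.Locale;
  
  // Hypothetical illustration class, not part of Hadoop.
  class OutputNameSketch {
  
      // Builds a name of the form <base>-<m|r>-<5-digit task number>.
      // baseName:   "part" by default, or whatever was set as the base name
      // mapOnly:    true for a map-only job ('m'), false for a job with reducers ('r')
      // taskNumber: zero-based mapper or reducer task number
      static String outputName(String baseName, boolean mapOnly, int taskNumber) {
          String type = mapOnly ? "m" : "r";
          // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
          return String.format(Locale.ROOT, "%s-%s-%05d", baseName, type, taskNumber);
      }
  
      public static void main(String[] args) {
          // Default base name, job with 32 reducers: first and last file.
          System.out.println(outputName("part", false, 0));
          System.out.println(outputName("part", false, 31));
          // Map-only job.
          System.out.println(outputName("part", true, 0));
          // Custom base name, as in the "flair" example above.
          System.out.println(outputName("flair", false, 0));
      }
  }
  ```
  
  Only the base name changes when you set a custom value. The 'm' or 'r' marker and the task number are still appended, so each task keeps writing to its own file.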