Why is the output file name in MapReduce part-r-00000?
- This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 3:34 pm #5540 DataFlair Team (Spectator)
Is there a reason behind the output file names? I always see names like part-r-00000 for a MapReduce job and part-m-00000 for a map-only job.
-
September 20, 2018 at 3:35 pm #5542 DataFlair Team (Spectator)
In MapReduce, by default, the output files are named part-x-yyyyy where:
x is 'm' or 'r', depending on whether the job was a map-only job or a job with a reduce phase
yyyyy is the Mapper or Reducer task number (zero-based)
So a job that has 32 reducers will produce files named part-r-00000 to part-r-00031, one for each reducer task.
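The naming scheme described above can be sketched in plain Java. This is only an illustration of the pattern, not Hadoop's own code: the class PartFileNames and the helpers partFileName and reducerOutputNames are hypothetical names made up for this example, and the five-digit zero padding is taken from the file names shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PartFileNames {

    // Hypothetical helper (not a Hadoop API): builds "<base>-<m|r>-<task number>",
    // with the zero-based task number padded to five digits.
    static String partFileName(String baseName, char taskType, int taskNumber) {
        return String.format(Locale.ROOT, "%s-%c-%05d", baseName, taskType, taskNumber);
    }

    // Lists the file names a job with numReducers reducers would produce
    // under the default "part" base name, one per reducer task.
    static List<String> reducerOutputNames(int numReducers) {
        List<String> names = new ArrayList<>();
        for (int task = 0; task < numReducers; task++) {
            names.add(partFileName("part", 'r', task));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = reducerOutputNames(32);
        System.out.println(names.get(0));                       // first reducer's file
        System.out.println(names.get(names.size() - 1));        // last reducer's file
        System.out.println(partFileName("part", 'm', 0));       // map-only job, first mapper
    }
}
```

For 32 reducers the list runs from part-r-00000 to part-r-00031, matching the range given above.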
You can also change the output file name by supplying your own base name. The part prefix is simply Hadoop's default convention for naming output files.
-
September 20, 2018 at 3:35 pm #5544 DataFlair Team (Spectator)
The output files are named part-x-yyyyy by default, where:
1) x is either 'm' or 'r', depending on whether the job was a map-only job or a job with a reduce phase
2) yyyyy is the Mapper or Reducer task number (zero-based)
So for a job that has 10 reducers, the generated files will be named part-r-00000 to part-r-00009, one for each reducer task.
It is possible to change the default name.
To change the default base name of the output files, set this property in the Driver class:
job.getConfiguration().set("mapreduce.output.basename", "flair");
With this setting, the reducer output files will be named flair-r-00000, flair-r-00001, and so on.