Performance of MapReduce vs Map only job in Hadoop
- This topic has 3 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 5:05 pm #6081 DataFlair Team (Spectator)
MapReduce job vs Map-only job performance: which is better, and why?
-
September 20, 2018 at 5:05 pm #6083 DataFlair Team (Spectator)
A Map-only job performs better because it has no reduce phase, which is a very time-consuming step.
The Reducer carries the computational tasks, which make the job more complex and slower.
However, in many cases a Map-only job is not sufficient to meet the client's requirements; in those cases we should use a full MapReduce job.
Follow the link to learn more about Map-only job in Hadoop
-
September 20, 2018 at 5:05 pm #6085 DataFlair Team (Spectator)
Of the two, the Map-only job performs better: there is no Reducer, so no sorting and shuffling has to be performed on the Mappers' output, which saves time and disk storage.
Also, since there are no reducers, the mappers' output is written directly to HDFS, unlike a MapReduce job, where the intermediate mapper output is stored on the local disk.
Please note that if aggregation or grouping is needed, a reducer will be required (depending on the requirements).
For more detail follow: Map-only job in Hadoop
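To make the difference in data flow concrete, here is a minimal sketch in plain Python (not the Hadoop API). It only simulates the phases in memory; the function names `run_map_only` and `run_map_reduce`, the word-count mapper, and the sample lines are assumptions made up for illustration.

```python
from collections import defaultdict

def word_mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def sum_reducer(key, values):
    """Combine all values that share a key into a single total."""
    return (key, sum(values))

def run_map_only(lines, mapper):
    """Map-only flow: mapper output is the final output, no shuffle or sort."""
    output = []
    for line in lines:
        output.extend(mapper(line))
    return output

def run_map_reduce(lines, mapper, reducer):
    """Full flow: map, then shuffle (group by key), then sort and reduce."""
    intermediate = run_map_only(lines, mapper)   # map phase
    grouped = defaultdict(list)                  # shuffle: group values by key
    for key, value in intermediate:
        grouped[key].append(value)
    # sort the keys, then call the reducer once per key
    return [reducer(key, grouped[key]) for key in sorted(grouped)]

# Hypothetical sample input
lines = ["Hadoop map reduce", "map only job", "hadoop job"]
map_only_output = run_map_only(lines, word_mapper)
map_reduce_output = run_map_reduce(lines, word_mapper, sum_reducer)

print(map_only_output)    # one pair per word occurrence, in input order
print(map_reduce_output)  # one total per distinct word, in key order
```

The grouping and sorting inside `run_map_reduce` stand in for the shuffle and sort that a Map-only job skips. In a real Hadoop job, setting the number of reduce tasks to zero (for example with `job.setNumReduceTasks(0)`) is what turns a job into a Map-only job.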
-
September 20, 2018 at 5:05 pm #6087 DataFlair Team (Spectator)
In a MapReduce job, both the map and reduce phases must run to produce the final output, but in a Map-only job only the mapper runs and no reducer is needed, so we save time, which improves performance. This is why a Map-only job is faster than a MapReduce job. Again, which one to use (MapReduce or Map-only) depends on the problem statement.
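As a rough illustration of how the problem statement decides the choice, the sketch below (plain Python, not Hadoop code) processes two hypothetical input splits independently, the way separate mapper tasks would. The log lines and the helper names `filter_errors` and `count_levels` are assumptions for the example.

```python
from collections import Counter

# Two hypothetical input splits, each handled by its own mapper task.
splits = [
    ["INFO start", "ERROR disk full", "INFO ok"],
    ["ERROR timeout", "INFO done"],
]

def filter_errors(split):
    """Filtering: each line is judged on its own, so a mapper is enough."""
    return [line for line in split if line.startswith("ERROR")]

def count_levels(split):
    """Counting log levels within one split gives only a partial answer."""
    return Counter(line.split()[0] for line in split)

# Filtering: concatenating the per-split outputs is already the final result.
filtered = [line for split in splits for line in filter_errors(split)]

# Counting: per-split results must be merged by key, which is the reducer's role.
partial_counts = [count_levels(split) for split in splits]
total_counts = sum(partial_counts, Counter())

print(filtered)
print(partial_counts)
print(total_counts)
```

Filtering needs nothing beyond the mappers, so a reducer would only add overhead. The per-level totals are correct only after the partial counts from all splits are combined, which is the kind of aggregation that requires a reduce phase.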