How to optimize the MapReduce job?

    • #6170
DataFlair Team
      Spectator

      How to optimize the MapReduce job?

    • #6172
DataFlair Team
      Spectator

There are many ways to optimize a MapReduce job. Below are a few:
1. Compress the intermediate output of the mapper. This reduces the amount of data written to disk and transferred over the network during the shuffle to the reducers. Enable it by setting mapreduce.map.output.compress to true (see the driver sketch after this list).
2. Tune the number of map/reduce tasks. If a job launches 10-20 mappers that each last only a few seconds, it is better to combine their input into fewer mappers (see the second sketch after this list), because every map task pays the cost of JVM initialization, start-up, and shutdown, which is expensive in terms of memory and CPU.
3. If the data is huge (in TBs), increase the HDFS block size to 256 MB or even 512 MB, thereby reducing the number of map tasks.
4. Use a combiner between the mapper and reducer so that the amount of data that has to be shuffled from mappers to reducers decreases (also shown in the driver sketch below).
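
Below is a minimal driver sketch for points 1 and 4, modeled on the standard Apache Hadoop word-count example. The property names mapreduce.map.output.compress and mapreduce.map.output.compress.codec are the standard Hadoop 2.x keys; SnappyCodec assumes the native Snappy library is available on the cluster, otherwise swap in another codec such as org.apache.hadoop.io.compress.DefaultCodec.

      import java.io.IOException;
      import java.util.StringTokenizer;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.io.compress.CompressionCodec;
      import org.apache.hadoop.io.compress.SnappyCodec;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class OptimizedWordCount {

        public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          public void map(Object key, Text value, Context context)
              throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
              word.set(itr.nextToken());
              context.write(word, ONE);
            }
          }
        }

        public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
          private final IntWritable result = new IntWritable();

          @Override
          public void reduce(Text key, Iterable<IntWritable> values,
              Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
              sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
          }
        }

        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();

          // Point 1: compress intermediate map output so less data is
          // written to disk and sent over the network during the shuffle.
          conf.setBoolean("mapreduce.map.output.compress", true);
          conf.setClass("mapreduce.map.output.compress.codec",
                        SnappyCodec.class, CompressionCodec.class);

          Job job = Job.getInstance(conf, "optimized word count");
          job.setJarByClass(OptimizedWordCount.class);
          job.setMapperClass(TokenizerMapper.class);

          // Point 4: run the reduce logic as a combiner on the map side,
          // shrinking the data shuffled from mappers to reducers.
          job.setCombinerClass(IntSumReducer.class);

          job.setReducerClass(IntSumReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }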
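
For points 2 and 3, here is a sketch of two ways to cut down the number of map tasks. CombineTextInputFormat and the mapreduce.input.fileinputformat.split.maxsize property are part of the standard Hadoop 2.x API; the class name FewerMapTasks is illustrative only.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

      public class FewerMapTasks {
        public static Job configure() throws Exception {
          Configuration conf = new Configuration();

          // Point 2: cap the combined split size at 256 MB and pack many
          // small files into a single split, so the job does not launch
          // swarms of mappers that each live only a few seconds.
          conf.setLong("mapreduce.input.fileinputformat.split.maxsize",
                       256L * 1024 * 1024);

          Job job = Job.getInstance(conf, "fewer map tasks");
          job.setInputFormatClass(CombineTextInputFormat.class);

          // Point 3: HDFS block size is fixed when a file is written, so
          // for very large datasets write the input with bigger blocks, e.g.:
          //   hdfs dfs -D dfs.blocksize=268435456 -put input.log /data/
          return job;
        }
      }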
