How to optimize the MapReduce job?

    • #6170
DataFlair Team
      Spectator

      How to optimize the MapReduce job?

    • #6172
DataFlair Team
      Spectator

There are many ways to optimize a MapReduce job. Below are a few:
1. Compress the intermediate output of the mapper. This reduces the amount of data written to disk and transferred over the network during the shuffle to the reducers. Enable it by setting mapreduce.map.output.compress to true (see the driver sketch after this list).
2. Tune the number of map/reduce tasks. If a job launches 10-20 mappers that each last only a few seconds, it is better to combine their input into fewer mappers (see the second sketch after this list), because every map task pays the cost of JVM initialization, start-up, and shutdown, which is expensive in terms of memory and CPU.
3. If the data is huge (in TBs), increase the HDFS block size to 256 MB or even 512 MB, thereby reducing the number of map tasks.
4. Use a combiner between the mapper and reducer so that the amount of data that has to be shuffled from mappers to reducers decreases (also shown in the driver sketch below).
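
Below is a minimal driver sketch for points 1 and 4, modeled on the standard Apache Hadoop word-count example. The property names mapreduce.map.output.compress and mapreduce.map.output.compress.codec are the standard Hadoop 2.x keys; SnappyCodec assumes the native Snappy library is available on the cluster, otherwise swap in another codec such as org.apache.hadoop.io.compress.DefaultCodec.

      import java.io.IOException;
      import java.util.StringTokenizer;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.io.compress.CompressionCodec;
      import org.apache.hadoop.io.compress.SnappyCodec;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class OptimizedWordCount {

        public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          public void map(Object key, Text value, Context context)
              throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
              word.set(itr.nextToken());
              context.write(word, ONE);
            }
          }
        }

        public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
          private final IntWritable result = new IntWritable();

          @Override
          public void reduce(Text key, Iterable<IntWritable> values,
              Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
              sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
          }
        }

        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();

          // Point 1: compress intermediate map output so less data is
          // written to disk and sent over the network during the shuffle.
          conf.setBoolean("mapreduce.map.output.compress", true);
          conf.setClass("mapreduce.map.output.compress.codec",
                        SnappyCodec.class, CompressionCodec.class);

          Job job = Job.getInstance(conf, "optimized word count");
          job.setJarByClass(OptimizedWordCount.class);
          job.setMapperClass(TokenizerMapper.class);

          // Point 4: run the reduce logic as a combiner on the map side,
          // shrinking the data shuffled from mappers to reducers.
          job.setCombinerClass(IntSumReducer.class);

          job.setReducerClass(IntSumReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }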
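
For points 2 and 3, here is a sketch of two ways to cut down the number of map tasks. CombineTextInputFormat and the mapreduce.input.fileinputformat.split.maxsize property are part of the standard Hadoop 2.x API; the class name FewerMapTasks is illustrative only.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

      public class FewerMapTasks {
        public static Job configure() throws Exception {
          Configuration conf = new Configuration();

          // Point 2: cap the combined split size at 256 MB and pack many
          // small files into a single split, so the job does not launch
          // swarms of mappers that each live only a few seconds.
          conf.setLong("mapreduce.input.fileinputformat.split.maxsize",
                       256L * 1024 * 1024);

          Job job = Job.getInstance(conf, "fewer map tasks");
          job.setInputFormatClass(CombineTextInputFormat.class);

          // Point 3: HDFS block size is fixed when a file is written, so
          // for very large datasets write the input with bigger blocks, e.g.:
          //   hdfs dfs -D dfs.blocksize=268435456 -put input.log /data/
          return job;
        }
      }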
