How to compress mapper output?

    • #4686
      DataFlair Team

      As we know, MapReduce jobs are limited by the bandwidth available on the cluster.
      One way to optimize a MapReduce job is to compress the mapper output: since less data is shuffled between the mapper nodes and the reducer nodes, MapReduce performance improves significantly.
      So how can we compress the mapper output?

    • #4687
      DataFlair Team

      The mapper task processes each input record (delivered by the RecordReader) and generates key-value pairs; the pairs the mapper emits are generally different from its input pairs. The output of the mapper, also known as the intermediate output, is written to the local disk of the mapper node.
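
      As an illustration (this example is not from the original post; the class and variable names are hypothetical), here is a minimal word-count style mapper that turns each input line into (word, 1) pairs:

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public class WordCountMapper
              extends Mapper<LongWritable, Text, Text, IntWritable> {

          private static final IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          protected void map(LongWritable offset, Text line, Context context)
                  throws IOException, InterruptedException {
              // The input pair is (byte offset, line); the emitted (word, 1)
              // pairs are the intermediate output that is written to local
              // disk and later shuffled to the reducers.
              for (String token : line.toString().split("\\s+")) {
                  if (!token.isEmpty()) {
                      word.set(token);
                      context.write(word, ONE);
                  }
              }
          }
      }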
      To compress the mapper output, call conf.setBoolean("mapreduce.map.output.compress", true) on the job configuration. (Configuration.set takes two Strings, so setBoolean is the right method for a boolean value.)
      Apart from setting this property to enable compression of the mapper output, we also need to decide which codec to use and what the compression type should be. The relevant properties are listed below (a configuration sketch follows the list):

      • mapreduce.map.output.compress.codec (formerly mapred.map.output.compression.codec)
      • mapreduce.output.fileoutputformat.compress.type (formerly mapred.output.compression.type)
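
      Here is a minimal driver sketch, assuming Hadoop 2.x property names and that the Snappy native libraries are installed on the cluster; the class name and job name are illustrative:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.io.SequenceFile;
      import org.apache.hadoop.io.compress.CompressionCodec;
      import org.apache.hadoop.io.compress.SnappyCodec;
      import org.apache.hadoop.mapreduce.Job;

      public class CompressedMapOutputDriver {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();

              // Enable compression of the intermediate (map) output.
              conf.setBoolean("mapreduce.map.output.compress", true);

              // Choose the codec for the map output; Snappy is used here.
              conf.setClass("mapreduce.map.output.compress.codec",
                            SnappyCodec.class, CompressionCodec.class);

              // For SequenceFile job output, compress whole blocks rather
              // than individual records (this applies to the final output,
              // not the shuffle).
              conf.set("mapreduce.output.fileoutputformat.compress.type",
                       SequenceFile.CompressionType.BLOCK.toString());

              Job job = Job.getInstance(conf, "compressed-map-output-example");
              // ... set mapper/reducer classes and input/output paths here ...
          }
      }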

      Of these two settings, the choice of codec matters more. Each codec has its pros and cons, so you need to figure out which one suits your requirements. Generally, you want fast reads/writes, a good compression factor, and CPU-friendly decompression (since there are usually fewer reducers than mappers). Considering these factors, the Snappy codec is usually the best fit, as it offers fast reads/writes and a compression factor of roughly 3.
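
      A practical caveat not in the original answer: SnappyCodec relies on Hadoop's native libraries, so it is worth confirming they load on your nodes (for example with hadoop checknative -a) before standardizing on Snappy; if they are missing, Lz4Codec is a comparable speed-oriented alternative.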

      For more details, please follow: Hadoop Mapper
