How to enable/configure the compression of map output data in Hadoop?

This topic has 1 reply, 1 voice, and was last updated 7 years, 8 months ago by DataFlair Team.

- September 20, 2018 at 2:17 pm #5110
  
  DataFlair Team
  Spectator
  
  In MapReduce, the output of the mappers is shuffled to the reducer nodes. Shuffling is the physical movement of data over the network, which is very costly, so the speed of a MapReduce job depends on network bandwidth. To optimize a MapReduce job, we can compress the intermediate (map) output so that less data has to be shuffled. How do we configure this compression?
- September 20, 2018 at 2:17 pm #5113
  DataFlair Team
  Spectator
  We can use four codecs (compression/decompression algorithms) in Hadoop.
  
  1) LZO– Very fast decompression and reasonable compression
  
  2) GZIP– Reasonable decompression and reasonable compression
  
  3) Snappy– Fast compression and fast decompression, but less efficient in terms of compression ratio.
  
  4) bzip2
  
  GZIP and Snappy produce ordinary, non-splittable compressed files, and LZO files are splittable only once they have been indexed, whereas bzip2 produces a splittable format, i.e. a compressed file can be divided into splits that are processed by separate map tasks.
  
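  To enable one of these codecs for map output:
  1) Set the “mapreduce.map.output.compress” property to true (its deprecated name is “mapred.compress.map.output”). Note that “mapred.output.compress” compresses the final job output, not the map output.
```
<!-- hadoop-2.5.0-cdh5.3.2/etc/hadoop/mapred-site.xml -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
```
  By default, the “mapreduce.map.output.compress.codec” property is set to “org.apache.hadoop.io.compress.DefaultCodec”, so that codec is used as soon as you complete step 1. You can replace “DefaultCodec” with the codec of your choice, e.g. SnappyCodec or LzoCodec.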
  
  2) We can even write our own codec by implementing Hadoop's CompressionCodec interface.
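  
  As a rough sketch of step 1 with a non-default codec, the fragment below enables map output compression and selects Snappy. It assumes the Hadoop 2 property names “mapreduce.map.output.compress” and “mapreduce.map.output.compress.codec” (the older “mapred.compress.map.output” and “mapred.map.output.compression.codec” are their deprecated equivalents), and it assumes the native Snappy library is available on the cluster nodes. Both properties go inside the <configuration> element of mapred-site.xml.

```xml
<!-- mapred-site.xml: compress intermediate map output -->
<!-- Assumption: native Snappy is installed on every node -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <!-- Codec used for map output; DefaultCodec is used if this is left unset -->
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```

  If the setting should apply to a single job only, the same two properties can usually be passed with -D on the command line, provided the job's driver parses generic options (for example through ToolRunner).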