We can use four compression codecs in Hadoop (a codec handles both compression and decompression):
1) LZO: Very fast decompression and a reasonable compression ratio.
2) GZIP: Reasonable decompression speed and a reasonable compression ratio.
3) Snappy: Very fast compression and decompression, at the cost of a lower compression ratio.
4) BZIP2: High compression ratio but slower than the other three; splittable.
Files compressed with LZO, GZIP, or Snappy are not splittable by themselves, whereas BZIP2 produces a splittable format, i.e. a single compressed file can be divided into chunks that are processed in parallel by a number of map tasks.
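The speed-versus-ratio trade-off in the list above can be observed even within a single algorithm. The following is a minimal sketch that uses only the JDK's `java.util.zip.Deflater` (the DEFLATE algorithm that GZIP is built on), not Hadoop's codec classes. The class name `DeflateLevels` and the sample data are made up for illustration, and the timings it prints will vary from machine to machine.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class: compares DEFLATE at its fastest and strongest levels.
final class DeflateLevels {

    // Compress the whole input in memory at the given level (1 = fastest, 9 = smallest).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverse of compress(); the level does not need to be known to decompress.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input; stop instead of spinning
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Repetitive, log-like sample text (made up for this sketch).
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("user").append(i % 97).append(",click,page").append(i % 13).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleData();
        for (int level : new int[] {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION}) {
            long start = System.nanoTime();
            byte[] packed = compress(data, level);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.printf("level %d: %d -> %d bytes in %d us%n",
                    level, data.length, packed.length, micros);
        }
    }
}
```

Choosing between LZO, GZIP, Snappy, and BZIP2 is the same kind of decision on a larger scale: how much CPU time to spend for how many bytes saved.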
To implement these codecs:
1) Set the “mapred.output.compress” property to true in the following file:
hadoop-2.5.0-cdh5.3.2/etc/hadoop/mapred-site.xml
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
By default, the “mapred.output.compression.codec” property is set to “org.apache.hadoop.io.compress.DefaultCodec”, so this codec is used as soon as you complete the 1st step. You can replace the “DefaultCodec” value with the fully qualified class name of the codec of your choice, e.g. an LZO codec.
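As an illustration of changing the codec, the fragment below sets the property to Hadoop's bundled GZIP codec class. This is a sketch of one possible setting rather than a recommended configuration; it assumes the same mapred-site.xml file as in step 1, and any other codec class available on the cluster could be substituted.

```xml
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```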
2) We can even write our own codec by implementing the org.apache.hadoop.io.compress.CompressionCodec interface.
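To give a feel for what writing your own algorithm involves, here is a minimal sketch of the core of a codec: a pair of streams, one that compresses on write and one that decompresses on read. Hadoop's CompressionCodec interface asks for such a pair through its createOutputStream and createInputStream methods, but the sketch below deliberately uses only the JDK and does not implement the Hadoop interface. The class names (`RleCodecSketch`, `RleOutputStream`, `RleInputStream`) are hypothetical, and run-length encoding is a toy algorithm chosen only because it is short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical toy codec: run-length encoding as a compressing/decompressing stream pair.
final class RleCodecSketch {

    // Writes each run of identical bytes as a (count, value) pair, with count in 1..255.
    static final class RleOutputStream extends FilterOutputStream {
        private int current = -1; // byte value of the open run, -1 if no run is open
        private int count = 0;    // length of the open run

        RleOutputStream(OutputStream out) { super(out); }

        @Override
        public void write(int b) throws IOException {
            int value = b & 0xFF;
            if (value == current && count < 255) {
                count++;
            } else {
                emitRun();
                current = value;
                count = 1;
            }
        }

        private void emitRun() throws IOException {
            if (count > 0) {
                out.write(count);
                out.write(current);
                count = 0;
            }
        }

        @Override
        public void close() throws IOException {
            emitRun(); // the last run is only written when the stream is closed
            super.close();
        }
    }

    // Reads (count, value) pairs and replays each value count times.
    static final class RleInputStream extends InputStream {
        private final InputStream in;
        private int value;
        private int remaining = 0;

        RleInputStream(InputStream in) { this.in = in; }

        @Override
        public int read() throws IOException {
            if (remaining == 0) {
                int count = in.read();
                if (count < 0) return -1; // clean end of stream
                value = in.read();
                if (count == 0 || value < 0) throw new IOException("corrupt RLE stream");
                remaining = count;
            }
            remaining--;
            return value;
        }

        @Override
        public void close() throws IOException { in.close(); }
    }

    static byte[] encode(byte[] raw) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream rle = new RleOutputStream(sink)) {
            rle.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    static byte[] decode(byte[] packed) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream rle = new RleInputStream(new ByteArrayInputStream(packed))) {
            int b;
            while ((b = rle.read()) != -1) {
                result.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "aaaabbbcc".getBytes(StandardCharsets.US_ASCII);
        byte[] packed = encode(raw);
        String restored = new String(decode(packed), StandardCharsets.US_ASCII);
        System.out.println(raw.length + " -> " + packed.length + " bytes, restored: " + restored);
    }
}
```

A real Hadoop codec would wrap streams like these in a class implementing CompressionCodec and register that class name in the codec property described in step 1.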