How to compress the output of Map?
September 20, 2018 at 5:01 pm #6046 DataFlair Team (Spectator)
In a MapReduce job, how do we compress the intermediate output, i.e. the output of the mapper? If we compress the intermediate output, the volume of data that has to travel from the mapper nodes to the reducer nodes will decrease.
-
September 20, 2018 at 5:01 pm #6047 DataFlair Team (Spectator)
In Hadoop, the Mapper takes the input records generated by the RecordReader, processes them, and generates key-value pairs. These key-value pairs can be completely different from the input pairs. The Mapper output is known as intermediate output and is written to the local disk.
To compress the output of Map, set:
conf.setBoolean("mapreduce.map.output.compress", true)
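We should also consider other factors when compressing mapper output, such as which codec to use and what the compression type should be.
Configure the following properties (the mapred.* names are deprecated aliases in Hadoop 2 and later):
mapreduce.map.output.compress.codec (formerly mapred.map.output.compression.codec)
mapreduce.output.fileoutputformat.compress.type (formerly mapred.output.compression.type; it applies to SequenceFile job output rather than to the intermediate map output)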
Of these two factors, the choice of codec matters more, because each codec has its own pros and cons and we need to work out which one meets our requirements. Normally we want fast reads and writes, a good compression factor, and CPU-friendly decompression. On that basis the Snappy codec is usually a good fit, since it offers fast reads and writes with a compression factor of about 3.
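As a rough illustration of the settings discussed above, here is a minimal driver sketch. It assumes Hadoop 2 or later on the classpath and uses the current property names mapreduce.map.output.compress and mapreduce.map.output.compress.codec, which replace the mapred.* names. The class name and job name are hypothetical, and the mapper, reducer, and paths are deliberately left out.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver class, for illustration only.
public class CompressedMapOutputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Turn on compression of the intermediate (map) output.
        // setBoolean is used because Configuration.set expects a String value.
        conf.setBoolean("mapreduce.map.output.compress", true);

        // Choose the codec for the map output; Snappy is used here as in the post.
        conf.setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);

        // "compressed-map-output-example" is a made-up job name.
        Job job = Job.getInstance(conf, "compressed-map-output-example");

        // Mapper, reducer, input and output paths would be set here as in
        // any other job; they are omitted from this sketch.

        // Read the settings back from the job's own configuration.
        System.out.println(job.getConfiguration()
                .getBoolean("mapreduce.map.output.compress", false));
        System.out.println(job.getConfiguration()
                .get("mapreduce.map.output.compress.codec"));
    }
}
```

Depending on the Hadoop version, Snappy may need native library support on the cluster nodes, so it is worth checking that before relying on it.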
-
September 20, 2018 at 5:01 pm #6049 DataFlair Team (Spectator)
In MapReduce, the Mapper benefits from data locality but the Reducer does not. So we can decrease the volume of data that has to travel from the mapper nodes to the reducer nodes by compressing the intermediate data.
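To make the size reduction concrete, here is a small self-contained sketch that uses only the JDK. It is not Hadoop code: gzip from java.util.zip stands in for a Hadoop codec, and the class name, method names, and sample records are made up for illustration. Real map output and real codecs such as Snappy will give different ratios.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; gzip is only a stand-in for a Hadoop codec.
public class MapOutputSizeDemo {

    // Builds made-up tab-separated key-value records, similar in shape to
    // what a word-count mapper might emit ("word<TAB>1" per line).
    public static byte[] sampleMapOutput(int records) {
        String[] words = {"hadoop", "mapper", "reducer", "shuffle"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append(words[i % words.length]).append('\t').append(1).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compresses a byte array with gzip and returns the compressed bytes.
    public static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = sampleMapOutput(10_000);
        byte[] compressed = gzip(raw);
        // The compressed size is what would have to cross the network.
        System.out.println("raw bytes:        " + raw.length);
        System.out.println("compressed bytes: " + compressed.length);
    }
}
```

The sample data is highly repetitive, so it compresses far better than typical map output would; the point is only that fewer bytes need to be moved during the shuffle.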
To enable compression of the intermediate output, we need to set:
conf.setBoolean("mapreduce.map.output.compress", true)
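After setting this property, we can also specify which codec to use and what the compression type should be, according to our requirements. The properties to set are listed below (the mapred.* names are deprecated aliases in Hadoop 2 and later):
mapreduce.map.output.compress.codec (formerly mapred.map.output.compression.codec)
mapreduce.output.fileoutputformat.compress.type (formerly mapred.output.compression.type; it applies to SequenceFile job output rather than to the intermediate map output)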