How to compress the output of Map?

    • #6046
      DataFlair Team

      In a MapReduce job, how do I compress the intermediate output, i.e. the output of the mapper? If we compress the intermediate output, the volume of data that needs to travel from the mapper nodes to the reducer nodes will decrease.

    • #6047
      DataFlair Team

      In Hadoop, the Mapper takes the input records generated by the RecordReader, processes them and generates key-value pairs. These output pairs can be of completely different types from the input pair. The Mapper output is known as intermediate output, and it is written to the mapper node's local disk rather than to HDFS.
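      A minimal sketch of what such a map step produces, written in plain Java without the Hadoop API: the class MapperSketch and its map method are hypothetical stand-ins for a word-count Mapper. The input pair is (byte offset, line of text) and each output pair is (word, 1), which shows how the output types can differ from the input types.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapperSketch {
    // Hypothetical stand-in for a word-count map() call.
    // Input pair: (byte offset, line); the offset is ignored, as in a typical word count.
    // Output pairs: (word, 1) for every whitespace-separated word in the line.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // In a real job these pairs would be buffered, sorted and spilled to the mapper's local disk.
        for (Map.Entry<String, Integer> kv : map(0L, "to be or not to be")) {
            System.out.println(kv.getKey() + "\t" + kv.getValue());
        }
    }
}
```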

      To compress the Map output, set:
      conf.setBoolean("mapreduce.map.output.compress", true);

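      Two further settings affect how the mapper output is compressed: which codec to use and which compression type to apply.
      Configure the following properties:
      mapred.map.output.compression.codec (named mapreduce.map.output.compress.codec in Hadoop 2 and later)

      mapred.output.compression.type (named mapreduce.output.fileoutputformat.compress.type in Hadoop 2 and later; it takes NONE, RECORD or BLOCK and applies to SequenceFile output)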

      Of these two factors, the choice of codec matters more, because each codec has its own trade-offs and we need to pick the one that fits our requirements. For intermediate data we normally want fast compression and decompression, a reasonable compression ratio, and low CPU cost. On that basis, the Snappy codec (org.apache.hadoop.io.compress.SnappyCodec) is usually the best fit: it compresses and decompresses very quickly, at the cost of a lower compression ratio than codecs such as gzip.
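      Snappy is not part of the JDK, so as a rough, hypothetical illustration of the speed-versus-ratio trade-off, this sketch uses java.util.zip.Deflater at its fastest and its strongest levels on synthetic, map-output-like records. The class name CodecTradeoff and the sample data are assumptions; the sizes and timings it prints depend on the data and the machine.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CodecTradeoff {
    // Compresses the data with DEFLATE at the given level and returns the compressed size in bytes.
    static int compressedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf); // only the byte count matters here
        }
        deflater.end(); // release native resources
        return total;
    }

    public static void main(String[] args) {
        // Synthetic intermediate data: tab-separated (word, 1) records with repeated keys.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append("word").append(i % 100).append("\t1\n");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);

        // Compare the fastest level with the strongest one: time spent vs. bytes saved.
        for (int level : new int[] {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION}) {
            long start = System.nanoTime();
            int size = compressedSize(raw, level);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.printf("level %d: %d -> %d bytes in %d us%n", level, raw.length, size, micros);
        }
    }
}
```

A fast codec such as Snappy sits at the BEST_SPEED end of this spectrum, which is why it suits intermediate data that is written once and read back almost immediately.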

    • #6049
      DataFlair Team

      In MapReduce, the Mapper benefits from data locality but the Reducer does not: each reducer has to fetch its input from many mapper nodes over the network. We can therefore reduce the volume of data that travels from mapper nodes to reducer nodes by compressing the intermediate data.
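      To see why compression cuts the amount of data shuffled, here is a small sketch that gzips some synthetic, highly repetitive map output (tab-separated (key, 1) records) in memory and compares the sizes. It uses java.util.zip.GZIPOutputStream rather than Hadoop's codec classes, and the class ShuffleVolume and its data are hypothetical; real savings depend on how repetitive the actual mapper output is.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class ShuffleVolume {
    // Returns how many bytes the serialized map output occupies after gzip compression.
    static int gzippedSize(byte[] mapOutput) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            gz.write(mapOutput);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the gzip stream flushed the remaining compressed bytes and the trailer.
        return sink.size();
    }

    public static void main(String[] args) {
        // Hypothetical map output with many repeated keys, as in a word count.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50000; i++) {
            sb.append("key").append(i % 500).append("\t1\n");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        int compressed = gzippedSize(raw);
        System.out.printf("raw: %d bytes, gzip: %d bytes (%.1fx smaller)%n",
                raw.length, compressed, (double) raw.length / compressed);
    }
}
```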

      To enable compression of the intermediate output, set:
      conf.setBoolean("mapreduce.map.output.compress", true);
      After setting this property, we can also specify which codec to use and which compression type to apply, according to our requirements.

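      The codec and the compression type are set with the following properties:

      mapred.map.output.compression.codec (mapreduce.map.output.compress.codec in newer releases)
      mapred.output.compression.type (mapreduce.output.fileoutputformat.compress.type in newer releases; NONE, RECORD or BLOCK)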
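      For intuition about the compression types: in a Hadoop SequenceFile, RECORD compresses each value on its own, while BLOCK compresses batches of keys and values together. The sketch below approximates that difference with java.util.zip.Deflater on short synthetic records; it is not Hadoop's implementation, and RecordVsBlock and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

public class RecordVsBlock {
    // Deflates the input and returns the compressed length in bytes.
    static int deflatedSize(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    // RECORD-style: every record is compressed separately, paying header/trailer overhead each time.
    static int recordStyle(List<byte[]> records) {
        int total = 0;
        for (byte[] r : records) {
            total += deflatedSize(r);
        }
        return total;
    }

    // BLOCK-style: the records are concatenated and compressed as one unit,
    // so repeated content across records can be exploited.
    static int blockStyle(List<byte[]> records) {
        int len = 0;
        for (byte[] r : records) {
            len += r.length;
        }
        byte[] block = new byte[len];
        int pos = 0;
        for (byte[] r : records) {
            System.arraycopy(r, 0, block, pos, r.length);
            pos += r.length;
        }
        return deflatedSize(block);
    }

    public static void main(String[] args) {
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add(("key" + (i % 50) + "\tvalue" + (i % 10)).getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("RECORD-style total: " + recordStyle(records) + " bytes");
        System.out.println("BLOCK-style total:  " + blockStyle(records) + " bytes");
    }
}
```

Because many short records share a lot of content, compressing them together usually yields far fewer bytes than compressing each one separately, which is why BLOCK is the common choice when a compression type applies.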
