Output of Mapper or Partitioner written on local disk?

    • #5772
      DataFlair Team
      Spectator

      As we know, intermediate output is written to the local disk (the local file system). Is it the output of the mapper or the output of the partitioner that is written to the local disk?

    • #5773
      DataFlair Team
      Spectator

      In Hadoop, a map task takes each input record (from the RecordReader) and generates key-value pairs, which can be completely different from the input pair. The mapper output is not simply written to the local disk as is. Before it is written to the local disk, the output is partitioned on the basis of the key and then sorted.

      The partitioning of the keys of the intermediate map output is controlled by the Partitioner. A hash function of the key is used to derive the partition, so each map output record is assigned to a partition on the basis of its key. Records having the same key go into the same partition (within each mapper). The partitioned output is then written to the local disk.
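
      To make the hash step concrete, here is a minimal sketch in plain Python (not Hadoop code). It follows the formula used by Hadoop's default HashPartitioner, which masks the sign bit of the key's hash code and takes the remainder by the number of reduce tasks. The helper names `java_string_hash` and `get_partition` are made up for this illustration, and the hash only mimics Java's `String.hashCode()` for plain string keys.

```python
def java_string_hash(key: str) -> int:
    """Mimic Java's String.hashCode() as a signed 32-bit value.

    Assumption: keys contain only BMP characters, so ord() matches
    Java's UTF-16 code units.
    """
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap around like a 32-bit int
    return h - 0x100000000 if h >= 0x80000000 else h


def get_partition(key: str, num_reduce_tasks: int) -> int:
    """Pick a partition the way a hash partitioner would.

    Masking with 0x7FFFFFFF clears the sign bit, so the result of the
    modulus is always in the range 0 .. num_reduce_tasks - 1.
    """
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


# The same key always lands in the same partition.
for word in ["hadoop", "hive", "hadoop", "pig"]:
    print(word, "->", get_partition(word, 4))
```

      Because the partition depends only on the key and the number of reducers, every mapper independently sends a given key to the same partition number.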

    • #5774
      DataFlair Team
      Spectator

      In Hadoop, the output of the mapper is stored on the local disk. The partitioner is applied to the mapper's intermediate key-value pairs as they are emitted, before they are written out, so the data on disk is already partitioned by key and all records having the same key go into the same partition.

      By default, partitioning is performed using a hash function of the key, which determines which reducer is responsible for a particular key. Each partition is then sent to that reducer.

      The total number of partitions in Hadoop is the same as the number of reducers.
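
      The map-side behaviour described above can be sketched as a small plain-Python simulation (not Hadoop code). The names `partition_for` and `spill` are hypothetical, and `zlib.crc32` is only a stand-in deterministic hash; Hadoop's default partitioner uses the key's own hash code instead.

```python
import zlib


def partition_for(key: str, num_reducers: int) -> int:
    # Stand-in hash for illustration only; any deterministic hash works here.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def spill(map_output, num_reducers):
    """Group one mapper's (key, value) pairs into num_reducers partitions.

    Each partition is sorted by key, which is the shape the data has
    when it reaches the local disk.
    """
    partitions = [[] for _ in range(num_reducers)]
    for key, value in map_output:
        partitions[partition_for(key, num_reducers)].append((key, value))
    return [sorted(p, key=lambda kv: kv[0]) for p in partitions]


map_output = [("pig", 1), ("hive", 1), ("pig", 1), ("hdfs", 1), ("hive", 1)]
spilled = spill(map_output, num_reducers=3)

for index, partition in enumerate(spilled):
    print("partition", index, partition)
```

      There are always exactly as many partitions as reducers, even if some of them turn out to be empty for a given mapper.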

    • #5776
      DataFlair Team
      Spectator

      In Hadoop, the output of the mapper is stored on the local disk, as it is intermediate output.
      There is no need to store intermediate data on HDFS because:

      • writing data to HDFS is costly and involves replication, which further increases overhead and time.
      • intermediate data is needed only until it is sent to the reducer for further processing to produce the final output, so it does not need to be stored permanently and is kept on the local disk only.

      Now, to the question: the mapper emits key-value pairs, and the partitioner is applied to each pair before the output is written to the local disk, so the data stored on the local disk is already partitioned.
      The partitioner uses a hash function of the key to choose the partition. This ensures that all values related to a key are placed in one partition and sent to the same reducer.
      The number of partitions is equal to the number of reducers, so the data in one partition is sent to one reducer.
      Also, all mappers send the values for a particular key to the same partition number; the data is sorted and sent to the corresponding reducer.
      Partitioning only matters when there are multiple reducers.
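
      As a rough illustration of how values for one key from several mappers meet at a single reducer, here is a plain-Python sketch (not Hadoop code). The function names `partition_for` and `shuffle` and the sample data are invented for this example, and `zlib.crc32` again stands in for the real hash.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_reducers: int) -> int:
    # Stand-in deterministic hash; every mapper uses the same function.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Return one dict per reducer, mapping key -> values from all mappers."""
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for output in mapper_outputs:
        for key, value in output:
            # Partition i of every mapper goes to reducer i.
            reducer_inputs[partition_for(key, num_reducers)][key].append(value)
    return reducer_inputs


mapper_outputs = [
    [("pig", 1), ("hive", 1)],   # output of mapper 0
    [("pig", 1), ("hdfs", 1)],   # output of mapper 1
    [("hive", 1), ("pig", 1)],   # output of mapper 2
]
reducer_inputs = shuffle(mapper_outputs, num_reducers=2)

# A simple "reduce" step: sum the values for each key.
word_counts = {k: sum(v) for r in reducer_inputs for k, v in r.items()}
print(word_counts)
```

      Each key shows up at exactly one reducer, while a single reducer may receive several different keys.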

    • #5778
      DataFlair Team
      Spectator

      The mapper emits key-value pairs, and the partitioner segregates them based on the hash value of the key before the output is stored on the local disk. All records having the same key are stored in the same partition.

      These partitions are then sent to the reducers, one partition per reducer, so the number of partitions is the same as the number of reducers. A partition may hold the records of more than one key, but all records for a given key are in the same partition.
