Output of Mapper or Partitioner written on local disk?
September 20, 2018 at 4:10 pm #5772
As we know, intermediate output is written to local disk (the local file system). Is it the output of the mapper or the output of the partitioner that gets written there?
September 20, 2018 at 4:11 pm #5773
In Hadoop MapReduce, the mapper takes an input record (from the RecordReader) and generates key-value pairs, which can be completely different from the input pair. The mapper output is not simply written to local disk as-is. Before it is written, the output is partitioned on the basis of the key and then sorted.
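To make the ordering concrete, here is a rough sketch of the map-side steps described above: tag each record with a partition, sort by partition and key, and only then hand the result over for writing. It is a simplified model, not Hadoop's actual code. The names `stand_in_partition` and `map_side_spill` are made up for this example, and the CRC32-based hash is only a deterministic stand-in for whatever the configured partitioner does.

```python
import zlib
from typing import Callable, Iterable, List, Tuple

Record = Tuple[str, int]


def stand_in_partition(key: str, num_reducers: int) -> int:
    # Hypothetical stand-in: any deterministic hash works for the illustration.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_side_spill(
    records: Iterable[Record],
    num_reducers: int,
    partition_fn: Callable[[str, int], int] = stand_in_partition,
) -> List[Tuple[int, str, int]]:
    # 1. Tag every map output record with its partition number.
    tagged = [(partition_fn(key, num_reducers), key, value) for key, value in records]
    # 2. Sort by partition first, then by key within each partition.
    tagged.sort(key=lambda t: (t[0], t[1]))
    # 3. This sorted run is what would be written to local disk.
    return tagged


if __name__ == "__main__":
    for row in map_side_spill([("b", 1), ("a", 2), ("b", 3), ("c", 4)], 2):
        print(row)
```

The point of the sketch is only the order of operations: partitioning and sorting both happen before the write, so what lands on disk is already grouped by partition.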
In Hadoop, the partitioning of the keys of the intermediate map output is controlled by the Partitioner. A hash function is used to derive the partition: by default, each map output record is partitioned on the basis of its key, so records with the same key go into the same partition (within each mapper). The output of the partitioner is then written to local disk.
September 20, 2018 at 4:11 pm #5774
In Hadoop, the output of the mapper is stored on local disk. Before this output is sent to the reducers, the partitioner takes the mapper's intermediate output (key-value pairs) and partitions it by key, so that all records with the same key go into the same partition.
By default, partitioning is performed using a hash function, and each partition is sent to the reducer responsible for its keys.
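As a small illustration of the default hash partitioning mentioned above, the sketch below reproduces the arithmetic used by Hadoop's `HashPartitioner`, which is `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. One assumption to flag: the hash here emulates Java's `String.hashCode()` for plain strings, whereas real Hadoop key types such as `Text` define their own `hashCode()`, so the actual partition numbers in a job may differ. The function names are invented for this example.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode() as a signed 32-bit integer (BMP characters)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    # Reinterpret the unsigned 32-bit value as signed.
    return h - 0x100000000 if h >= 0x80000000 else h


def get_partition(key: str, num_reduce_tasks: int) -> int:
    # Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


if __name__ == "__main__":
    for key in ["a", "ab", "a"]:
        print(key, "->", get_partition(key, 3))
```

For example, `"a"` hashes to 97, so with 3 reducers it lands in partition 1 every time it appears, which is what routes all records for that key to the same reducer.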
The total number of partitions in a Hadoop job is the same as the number of reducers.
September 20, 2018 at 4:11 pm #5776
- Writing to HDFS is costly and involves replication, which further increases overhead and time.
- Intermediate data is needed only until it is sent to the reducer for further processing to produce the final output, so it does not need to be stored permanently and is kept on local disk only.
Now to the question: writing to local disk occurs after the mapper stage, and the data is stored on local disk as key-value pairs.
The partitioner works on this data and, using a hash function, groups it so that each key and all of its values end up together. This ensures that all values related to a key are stored in one partition and sent to the same reducer.
The number of partitions is equal to the number of reducers, so the data in one partition is sent to one reducer.
Also, all mappers send the values related to a particular key to the same partition, which is then sorted and sent to a reducer.
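Here is a toy model of that last point, showing how the same key coming from several mappers ends up at one reducer. It assumes a CRC32-based stand-in hash rather than Hadoop's real one, and `partition_for` and `shuffle` are hypothetical names for this sketch only.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_for(key: str, num_reducers: int) -> int:
    # Deterministic stand-in hash, so every mapper agrees on the partition for a key.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(
    mapper_outputs: List[List[Tuple[str, int]]], num_reducers: int
) -> List[Dict[str, List[int]]]:
    # One input bucket per reducer: key -> values collected from all mappers.
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for records in mapper_outputs:
        for key, value in records:
            reducer_inputs[partition_for(key, num_reducers)][key].append(value)
    # Each reducer sees its keys in sorted order.
    return [dict(sorted(bucket.items())) for bucket in reducer_inputs]


if __name__ == "__main__":
    outputs = [[("x", 1), ("y", 2)], [("x", 3)], [("y", 4), ("z", 5)]]
    for reducer_id, data in enumerate(shuffle(outputs, 2)):
        print(reducer_id, data)
```

Note that with three keys and two reducers, at least one reducer receives more than one key: a partition maps to exactly one reducer, but it can hold many keys.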
Partitioners come into play only when we have multiple reducers.
September 20, 2018 at 4:11 pm #5778
The output of the mapper is stored on local disk. The partitioner then takes the output of the mapper (key-value pairs) and segregates the data based on the hash value of the key, so all records with the same key are stored in the same partition.
These partitions are then sent to the reducers, so the number of partitions is the same as the number of reducers, with each partition holding the complete record set for every key assigned to it.