What is the sequence of execution of Mapper, Combiner and Partitioner?

This topic has 4 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 4:02 pm #5722
  
  DataFlair Team
  Spectator
  
  What is the sequence of execution of Mapper, Combiner and Partitioner?
- September 20, 2018 at 4:02 pm #5723
  
  DataFlair Team
  Spectator
  
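  These components are usually listed in the following logical order:
  Mapper -> Combiner -> Partitioner
  In Hadoop's actual implementation, however, the partition of each record is computed as the map emits it, and the Combiner (if one is configured) runs afterwards on the partitioned, sorted data. The physical order is therefore Mapper -> Partitioner -> Combiner.
  Mapper: The input data is first processed by the map tasks, which produce the intermediate output.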
  
  Combiner: The Combiner performs local aggregation on the intermediate output before it is shuffled to the reducers. Its primary goal is to save bandwidth by minimizing the number of key/value pairs that are shuffled across the network and provided as input to the Reducer.
  
  Partitioner: In Hadoop, the Partitioner controls how the keys of the intermediate map output are partitioned. A hash function applied to the key of each map output record is used to derive the partition. Records with the same key go into the same partition (within each mapper), and each partition is then sent to one Reducer. The partition phase takes place between the mapper and the reducer.
  The default Partitioner (HashPartitioner) computes a hash value for the key and assigns the partition by taking that value modulo the number of reduce tasks.
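
To make the default hash partitioning described above concrete, here is a minimal sketch in plain Python rather than the Hadoop API. The function names `java_string_hash` and `hash_partition` are hypothetical. The sketch assumes keys are ASCII strings hashed the way Java's `String.hashCode()` does; real Hadoop key types such as `Text` define their own `hashCode()`, so actual partition numbers may differ.

```python
def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() as a signed 32-bit integer (ASCII keys assumed)."""
    h = 0
    for ch in s:
        # Java computes h = 31 * h + c with 32-bit wrap-around
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as signed
    return h - 0x100000000 if h >= 0x80000000 else h


def hash_partition(key: str, num_reducers: int) -> int:
    """Clear the sign bit of the hash, then take it modulo the number of reducers."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


if __name__ == "__main__":
    for key in ["hadoop", "map", "reduce", "hadoop"]:
        # The same key always lands in the same partition
        print(key, "->", hash_partition(key, 3))
```

Clearing the sign bit before the modulo keeps the partition number non-negative even when the hash itself is negative.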
- September 20, 2018 at 4:03 pm #5724
  
  DataFlair Team
  Spectator
  
  In a MapReduce job the mapper executes first, and its output then passes through the Combiner (if one is configured) and the Partitioner before it reaches the reducer. The three components are described below.
  1. Mapper: The mapper processes its input split and generates the intermediate output. Once a map task has completed on a mapper node, its sorted map output is transferred over the network to the nodes where the reduce tasks run (the reduce tasks fetch it). At the same time, the mapper node might be running other map tasks as well.
  
  2. Combiner: It acts like a mini reducer. Combiners run on the mapper output to reduce the number of key-value pairs. A Combiner is used for optimization and decreases the network load during the shuffle. It often performs the same aggregation as the reducer, which is only safe when that operation is commutative and associative.
  
  3. Partitioner: It decides which key goes to which reducer, by default using a hash function. All records with the same key are sent to the same reducer for the final output computation.
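
The "mini reducer" idea can be illustrated with a small word-count sketch. This is plain Python, not Hadoop code, and the names `map_words` and `combine` are hypothetical. It assumes the aggregation is a sum, which is commutative and associative and therefore safe to apply locally.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit one (word, 1) pair per word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner step: sum the counts per key on the mapper side."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    # Sorted only to make the result deterministic for inspection
    return sorted(totals.items())


if __name__ == "__main__":
    emitted = map_words("to be or not to be")
    combined = combine(emitted)
    print(len(emitted), "pairs before combining")
    print(len(combined), "pairs after combining:", combined)
```

Fewer pairs leave the mapper node after combining, which is where the network saving comes from. Hadoop treats the Combiner as an optimization and may run it zero, one, or several times, so the job must produce the same result either way.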
- September 20, 2018 at 4:03 pm #5725
  
  DataFlair Team
  Spectator
  
  The sequence of execution is
  Mapper -> Partitioner -> Combiner
  
  Maps are the individual tasks that transform input records into intermediate records. Here the user writes custom logic for data processing. Before the map output is written to disk, the thread first divides the data into partitions corresponding to the reducers that they will ultimately be sent to.
  
  A partitioner partitions the key-value pairs of the intermediate map output. By default it uses a hash function on the key; a user-defined condition can be supplied instead and plays the same role. Within each partition, the background thread performs an in-memory sort by key.
  
  The Combiner class is used between the Map class and the Reduce class to reduce the volume of data transferred between Map and Reduce. Usually the output of the map task is large, so the amount of data transferred to the reduce task is high. The Combiner is optional; when configured, it aggregates values that share a key locally, so fewer records reach the Reduce phase. It is run on the output of the sort.
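
The steps described in this reply (partition first, sort by key within each partition, then run the Combiner on the sorted output) can be sketched as a toy spill routine. This is a plain-Python simulation, not Hadoop's implementation. `toy_partition` and `spill` are hypothetical names, keys are assumed to be non-empty strings, and the combiner is assumed to be a sum.

```python
from itertools import groupby


def toy_partition(key, num_reducers):
    """Hypothetical partitioner for this sketch: route by the first character."""
    return ord(key[0]) % num_reducers


def spill(map_output, num_reducers):
    """Simulate one spill: tag with partition, sort, then combine per partition."""
    # 1. Partition: each record is tagged with its target reducer as it is emitted
    tagged = [(toy_partition(k, num_reducers), k, v) for k, v in map_output]
    # 2. Sort: order by partition, then by key within the partition
    tagged.sort(key=lambda rec: (rec[0], rec[1]))
    # 3. Combine: sum the values of each run of equal (partition, key)
    spilled = {p: [] for p in range(num_reducers)}
    for (p, k), group in groupby(tagged, key=lambda rec: (rec[0], rec[1])):
        spilled[p].append((k, sum(v for _, _, v in group)))
    return spilled


if __name__ == "__main__":
    records = [("b", 1), ("a", 1), ("b", 1), ("c", 1)]
    print(spill(records, 2))
```

Because the records are already grouped by partition and sorted by key, the combine step only has to merge adjacent records, and each partition's list is what one reducer would later fetch.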
- September 20, 2018 at 4:03 pm #5727
  
  DataFlair Team
  Spectator
  
  The sequence of execution is as follows:
  Mapper -> Partitioner -> Combiner
  
  Mapper:
  The Mapper generates intermediate output from the key-value pairs it receives as input; the output is produced by custom business logic. The intermediate output is then sent to the Partitioner for further processing.
  
  Partitioner:
  Partitioners decide which Reducer receives each key-value pair coming from the Mapper, using a hash of the key by default or custom logic, before the intermediate output is fed to the Reducer for aggregation/summation.
  
  Combiner:
  The Combiner class is optional, but it helps lower bandwidth consumption by aggregating the Mapper's output locally before it is sent to the Reduce tasks.