What is the sequence of execution of Mapper, Combiner and Partitioner?


    • #5722
      DataFlair Team
      Spectator

      What is the sequence of execution of Mapper, Combiner and Partitioner?

    • #5723
      DataFlair Team
      Spectator

      These components execute in the following order:
      Mapper -> Combiner -> Partitioner
      Mapper: The input data is first processed by the Mappers (map tasks), which produce the intermediate output.

      Combiner: The Combiner optimizes the intermediate output through local aggregation before the shuffle/sort phase. Its primary goal is to save bandwidth by minimizing the number of key/value pairs that are shuffled across the network and passed as input to the Reducer.

      Partitioner: In Hadoop, the Partitioner controls how the keys of the intermediate map output are partitioned. A hash function is used to derive the partition. Each map output record is assigned to a partition on the basis of its key. Records with the same key go into the same partition (within each mapper), and each partition is then sent to a Reducer. The partition phase takes place between the mapper and the reducer.
      The default Partitioner (HashPartitioner) computes a hash value for the key and assigns the partition based on the result.
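
      The hash-based assignment described above can be sketched in a few lines. This is a plain Python simulation, not Hadoop code: `java_string_hash` and `hash_partition` are hypothetical helper names, and the sketch assumes keys that hash like Java's `String.hashCode()`. Hadoop's HashPartitioner applies the same "mask the sign bit, then take the modulus" step to the key's `hashCode()`, but real key types such as `Text` compute their hash differently, so actual partition numbers in a job will not match these.

```python
def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() (for BMP characters) as a signed 32-bit int."""
    h = 0
    for ch in s:
        # Java int arithmetic wraps at 32 bits, so keep only the low 32 bits.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed int.
    return h - 0x100000000 if h >= 0x80000000 else h


def hash_partition(key: str, num_reducers: int) -> int:
    """Pick a partition: clear the sign bit, then take the modulus."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


if __name__ == "__main__":
    for key in ["apple", "banana", "apple", "cherry"]:
        # The same key always lands in the same partition.
        print(key, "->", hash_partition(key, 4))
```

      Because the partition depends only on the key and the number of reducers, every mapper sends a given key to the same reducer without any coordination between mappers.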

    • #5724
      DataFlair Team
      Spectator

      In a MapReduce job, the Mapper executes first, then the Combiner, followed by the Partitioner. The execution sequence is as follows:
      1. Mapper: The mapper processes the input splits and generates the intermediate output. Once a map task completes on a mapper node, its sorted map output is transferred over the network to the reducer node where the reduce task will run. Meanwhile, the mapper node may be running other map tasks as well.

      2. Combiner: It acts like a mini reducer. Combiners run after the mapper to reduce the number of key-value pairs in the mapper output. The Combiner is used for optimization and decreases the network load during the shuffle. It typically performs the same aggregation operation as the reducer.

      3. Partitioner: It decides which key goes to which reducer by using a hash function. All records with the same key are sent to the same reducer for the final output computation.
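
      To make the "mini reducer" idea concrete, here is a minimal word-count sketch in plain Python. It is a simulation only and uses no Hadoop API; `map_words` and `combine` are hypothetical names chosen for illustration. It shows how local aggregation shrinks the number of key-value pairs a single mapper would hand to the shuffle.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner step: sum the values per key on the mapper side."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    # Return pairs sorted by key, as map output is sorted before the shuffle.
    return sorted(totals.items())


if __name__ == "__main__":
    map_output = map_words("to be or not to be")
    combined = combine(map_output)
    print(len(map_output), "pairs before combining")  # 6 pairs
    print(len(combined), "pairs after combining")     # 4 pairs
    print(combined)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

      Summation works here because it is associative and commutative, so applying it once on the mapper side and again in the reducer gives the same final counts.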

    • #5725
      DataFlair Team
      Spectator

      The sequence of execution is
      Mapper -> Partitioner -> Combiner

      Maps are the individual tasks that transform input records into intermediate records. Here the user writes custom logic for data processing. Before the map output is written to disk, the thread first divides the data into partitions corresponding to the reducers that they will ultimately be sent to.

      A partitioner partitions the key-value pairs of the intermediate map output. It partitions the data using a user-defined condition, which works like a hash function. Within each partition, the background thread performs an in-memory sort by key.

      The Combiner class is used between the Map class and the Reduce class to reduce the volume of data transferred between Map and Reduce. Usually, the output of the map task is large, so the amount of data transferred to the reduce task is high. The Combiner is optional, yet it helps by aggregating records that share a key before the Reduce phase, which leaves less data to process. It is run on the output of the sort.
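
      The partition, sort and combine steps described in this post can be sketched as one small function. This is a plain Python simulation under stated assumptions, not Hadoop's implementation: `partition_of` and `spill` are hypothetical names, `zlib.crc32` is used only as a stable stand-in for the key hash, and the combiner is assumed to be a simple sum.

```python
import zlib
from itertools import groupby


def partition_of(key, num_reducers):
    """Stand-in partitioner: a stable hash of the key, modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def spill(map_output, num_reducers):
    """Simulate one spill: partition, sort within partitions, then combine."""
    # 1. Tag every record with the partition (reducer) it belongs to.
    tagged = [(partition_of(k, num_reducers), k, v) for k, v in map_output]
    # 2. Sort by partition first and by key within each partition.
    tagged.sort(key=lambda rec: (rec[0], rec[1]))
    # 3. Run the combiner over each run of equal keys in the sorted output.
    partitions = {p: [] for p in range(num_reducers)}
    for (p, k), group in groupby(tagged, key=lambda rec: (rec[0], rec[1])):
        partitions[p].append((k, sum(v for _, _, v in group)))
    return partitions


if __name__ == "__main__":
    records = [("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)]
    for partition, pairs in spill(records, 2).items():
        print("partition", partition, "->", pairs)
```

      Sorting before combining is what lets the combiner see all values for a key as one contiguous run, so it can aggregate them in a single pass.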

    • #5727
      DataFlair Team
      Spectator

      The sequence of execution is as follows:
      Mapper -> Combiner -> Partitioner

      Mapper:
      The Mapper generates intermediate output from the key-value pairs it receives as input. The output is produced by custom business logic. The intermediate output is then passed to the Partitioner for further processing.

      Partitioner:
      The Partitioner decides which Reducer each key-value pair from the Mapper is sent to, using a hash of the key by default or custom logic, before the intermediate output is fed to the Reducer for aggregation/summation.

      The Combiner class is optional, but it lowers bandwidth consumption by aggregating the Mapper's output locally before it is sent to the Reduce tasks.
