What is a Partitioner in Hadoop MapReduce?


Viewing 2 reply threads
    • #5926
      DataFlair Team
      Spectator

      What is a Partitioner in Hadoop?

    • #5927
      DataFlair Team
      Spectator

      In a MapReduce job, data flows as follows:
      Data from the input file is divided into InputSplits. Each split is read by the RecordReader, converted into key-value pairs one by one, and passed to the user-defined Mapper task.
      The output generated by the Mapper task is called the intermediate output. It is also in the form of (key, value) pairs, and it is then passed on to the Reducer task.

      However, the output generated by a Mapper task is not sorted and contains a mix of different keys.
      We cannot pass this intermediate output to the Reducer task as is: the Reducer's role in a MapReduce job is aggregation, so each Reducer must receive all of the data belonging to its particular keys (so that those keys can be aggregated easily).

      This is where the Partitioner comes into the picture: as part of the sort-and-shuffle phase, the Partitioner determines how the intermediate output of the mappers is distributed among the reducers.

      It redirects the mapper output to the reducers by determining which reducer is responsible for a particular key.
      A hash function is used to derive the partition: each map output record is partitioned on the basis of its key.
      Records with the same key go into the same partition (within each mapper), and each partition is then sent to a reducer.
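The rule described above can be sketched in plain Java. Hadoop's default HashPartitioner computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the class below is a minimal, standalone illustration of that formula (not the actual Hadoop class):

```java
// Standalone sketch of Hadoop's default partitioning rule (HashPartitioner).
public class HashPartitionDemo {
    static int getPartition(String key, int numReduceTasks) {
        // Mask the sign bit so the result is non-negative
        // even when hashCode() returns a negative value.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3;
        // The same key always maps to the same partition,
        // so all records for that key reach the same reducer.
        System.out.println("apple  -> partition " + getPartition("apple", reducers));
        System.out.println("apple  -> partition " + getPartition("apple", reducers));
        System.out.println("banana -> partition " + getPartition("banana", reducers));
    }
}
```

Because the function is deterministic, every record with key "apple" lands in the same partition, which is exactly the guarantee the reducer relies on.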

      Number of partitions:
      The number of partitions in Hadoop is equal to the number of Reducers.
      The data from a single partition is processed by a single Reducer.

      Follow the link for more detail: Partitioner in Hadoop

    • #5928
      DataFlair Team
      Spectator

      The Partitioner phase comes after the mapper phase and before the reducer phase. The Partitioner takes the intermediate key-value pairs produced by the map phase as input, and the data gets partitioned across reducers by a partition function. The default partition function partitions the data according to the hash code of the key. All key-value pairs with the same partition value go to the same reducer. Therefore, the number of partitions is equal to the number of reducers. The partition function has to be defined carefully so that the load is distributed uniformly across the reducers.
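To see why the partition function must be defined carefully, here is a hedged plain-Java sketch of a custom partitioning rule. In a real job this logic would live in a subclass of `org.apache.hadoop.mapreduce.Partitioner` and be registered with `job.setPartitionerClass(...)`; the "US" hot key and the class name here are purely illustrative:

```java
// Sketch (no Hadoop dependency) of a custom partition function that gives
// one hypothetical heavy key its own reducer and hashes the remaining keys,
// so the heavy key does not skew a reducer that also serves other keys.
public class CountryPartitioner {
    static int getPartition(String countryKey, int numPartitions) {
        if (numPartitions == 1) {
            return 0;                      // only one reducer: everything goes there
        }
        if (countryKey.equals("US")) {
            return 0;                      // dedicated partition for the heavy key
        }
        // Hash all other keys across the remaining partitions 1..numPartitions-1.
        return 1 + (countryKey.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1);
    }

    public static void main(String[] args) {
        System.out.println("US -> partition " + getPartition("US", 4));
        System.out.println("IN -> partition " + getPartition("IN", 4));
        System.out.println("FR -> partition " + getPartition("FR", 4));
    }
}
```

A poorly chosen function (for example, one that sends most keys to partition 0) would leave one reducer overloaded while the others sit idle, which is the non-uniform distribution the post warns about.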

      Follow the link for more detail: Partitioner in Hadoop
