What is a Partitioner in Hadoop MapReduce?


Viewing 2 reply threads
    • #5926
      DataFlair Team
      Spectator

      What is a Partitioner in Hadoop?

    • #5927
      DataFlair Team
      Spectator

      In a MapReduce job, data flows as follows:
      Data from the input file is divided into InputSplits. Each split is read by the RecordReader, converted into key-value pairs one by one, and passed to the user-defined Mapper task.
      The output generated by the Mapper task is called the intermediate output. It is also in the form of (key, value) pairs, and it is then passed on to the Reducer task.

      However, the output generated by a Mapper task is not sorted and contains a mix of different keys.
      We cannot pass this intermediate output to the Reducer task as is: the Reducer's role in a MapReduce job is aggregation, so each Reducer must receive all of the data belonging to its particular keys (so that those keys can be aggregated easily).

      This is where the Partitioner comes into the picture: as part of the sort-and-shuffle phase, the Partitioner determines how the intermediate output of the mappers is distributed among the reducers.

      It redirects the mapper output to the reducers by determining which reducer is responsible for a particular key.
      A hash function is used to derive the partition: each map output record is partitioned on the basis of its key.
      Records with the same key go into the same partition (within each mapper), and each partition is then sent to a reducer.
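The rule described above can be sketched in plain Java. Hadoop's default HashPartitioner computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the class below is a minimal, standalone illustration of that formula (not the actual Hadoop class):

```java
// Standalone sketch of Hadoop's default partitioning rule (HashPartitioner).
public class HashPartitionDemo {
    static int getPartition(String key, int numReduceTasks) {
        // Mask the sign bit so the result is non-negative
        // even when hashCode() returns a negative value.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3;
        // The same key always maps to the same partition,
        // so all records for that key reach the same reducer.
        System.out.println("apple  -> partition " + getPartition("apple", reducers));
        System.out.println("apple  -> partition " + getPartition("apple", reducers));
        System.out.println("banana -> partition " + getPartition("banana", reducers));
    }
}
```

Because the function is deterministic, every record with key "apple" lands in the same partition, which is exactly the guarantee the reducer relies on.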

      Number of partitions:
      The number of partitions in Hadoop is equal to the number of Reducers.
      The data from a single partition is processed by a single Reducer.

      Follow the link for more detail: Partitioner in Hadoop

    • #5928
      DataFlair Team
      Spectator

      The Partitioner phase comes after the mapper phase and before the reducer phase. The Partitioner takes the intermediate key-value pairs produced by the map phase as input, and the data gets partitioned across reducers by a partition function. The default partition function partitions the data according to the hash code of the key. All key-value pairs with the same partition value go to the same reducer. Therefore, the number of partitions is equal to the number of reducers. The partition function has to be defined carefully so that the load is distributed uniformly across the reducers.
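To see why the partition function must be defined carefully, here is a hedged plain-Java sketch of a custom partitioning rule. In a real job this logic would live in a subclass of `org.apache.hadoop.mapreduce.Partitioner` and be registered with `job.setPartitionerClass(...)`; the "US" hot key and the class name here are purely illustrative:

```java
// Sketch (no Hadoop dependency) of a custom partition function that gives
// one hypothetical heavy key its own reducer and hashes the remaining keys,
// so the heavy key does not skew a reducer that also serves other keys.
public class CountryPartitioner {
    static int getPartition(String countryKey, int numPartitions) {
        if (numPartitions == 1) {
            return 0;                      // only one reducer: everything goes there
        }
        if (countryKey.equals("US")) {
            return 0;                      // dedicated partition for the heavy key
        }
        // Hash all other keys across the remaining partitions 1..numPartitions-1.
        return 1 + (countryKey.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1);
    }

    public static void main(String[] args) {
        System.out.println("US -> partition " + getPartition("US", 4));
        System.out.println("IN -> partition " + getPartition("IN", 4));
        System.out.println("FR -> partition " + getPartition("FR", 4));
    }
}
```

A poorly chosen function (for example, one that sends most keys to partition 0) would leave one reducer overloaded while the others sit idle, which is the non-uniform distribution the post warns about.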

      Follow the link for more detail: Partitioner in Hadoop
