What is Reducer in Hadoop MapReduce?
September 20, 2018 at 4:57 pm #6009
DataFlair Team
What is Reducer / Reduce / Reduce Task?
What type of processing is done in the Reducer in Hadoop?
What can we do in the Reducer of MapReduce?
September 20, 2018 at 4:57 pm #6012
DataFlair Team
In the MapReduce algorithm, the Reducer is the second phase of processing; it performs the final summation and aggregation.
To understand the Reducer properly, let's take a simple example. Suppose we want to find the sum of employee salaries (the data is provided in a CSV file) grouped by job title (e.g. the total salary of developers, managers, etc.).
Typically, a MapReduce program first creates key-value pairs from the available data. For our example, we take the job title as the key and the salary as the value. Many mappers run to produce these pairs according to the custom business logic. Before this data is given to the Reducer, it is shuffled and sorted; this is done internally by the framework. The data then reaches the Reducer as (key, list of values), i.e. (job title, list of salaries), e.g. (Manager, (12000, 13000, 17000, 13000, ...)). Usually, one Reducer is enough, because in the Reducer we just have to add up all the values to get the total salary per job title. A minimal sketch of such a Reducer is shown below.
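As an illustration only (the class name SalarySumReducer is hypothetical, and it assumes the Mapper emits the job title as a Text key and the salary as an IntWritable value), such a Reducer might look like this:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Receives (job title, [salary, salary, ...]) after shuffle and sort.
public class SalarySumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text jobTitle, Iterable<IntWritable> salaries, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable salary : salaries) {
            total += salary.get();
        }
        // Emit (job title, total salary), e.g. (Manager, 55000).
        context.write(jobTitle, new IntWritable(total));
    }
}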
We can configure the number of reducers required in the driver class. In a few cases, such as plain data parsing, we might not require a Reducer at all, so we can set the number of reducers to zero; in such cases, shuffling and sorting are not performed either. A sketch of a driver class that sets the reducer count is shown below.
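As a minimal sketch (the job name, the SalaryMapper class, and the input/output paths are hypothetical assumptions, not from the original post), the reducer count is set on the Job object in the driver:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SalarySumDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "salary sum");
        job.setJarByClass(SalarySumDriver.class);
        job.setMapperClass(SalaryMapper.class);      // hypothetical Mapper emitting (job title, salary)
        job.setReducerClass(SalarySumReducer.class); // the Reducer sketched above
        job.setNumReduceTasks(4);                    // number of reducers is configured here
        // job.setNumReduceTasks(0);                 // map-only job: shuffle and sort are skipped
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}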
For more details, please visit: Reducer in Hadoop
September 20, 2018 at 4:57 pm #6015
DataFlair Team
The Reducer is a phase in Hadoop that comes after the Mapper phase. The output of the Mapper is given as input to the Reducer, which processes it and produces a new set of output that is stored in HDFS.
The Reducer first processes the intermediate values for a particular key generated by the map function and then generates the output (zero or more key-value pairs).
The user decides the number of reducers; by default, the number of reducers is 1.
Phases of Reducer:
There are 3 phases of the Reducer in Hadoop MapReduce:
1) Shuffling
2) Sorting
3) Reduce
The shuffle and sort phases occur concurrently. In the Shuffling phase, the sorted output from the Mapper is transferred as input to the Reducer. After shuffling and sorting, the reduce task aggregates the key-value pairs. For example, in a word count job, we have to count the number of occurrences of each word.
For example, suppose we have an input file with the following data:
Hi how are you
How is your job
Each line is read as a record by the RecordReader and given as a (key, value) pair to the Mapper: (0, Hi how are you), (15, How is your job). The key is the byte offset of the line in the file (0 is the zeroth position) and the value is the whole line.
The Mapper receives the key-value pair for each line and processes it to create new (key, value) pairs as its output, e.g. (hi, 1), (how, 1), (are, 1), (you, 1), (how, 1), and so on (assuming the Mapper lowercases each word). After shuffling and sorting, this output is given to the Reducer as input, grouped by key: (hi, [1]), (how, [1, 1]), and so on. The Reducer then adds up each list, producing (hi, 1), (how, 2), etc.
The Reducer output is then passed to the RecordWriter, which writes it to the output file. So the flow is: RecordReader -> Mapper -> Reducer -> RecordWriter -> output file. A word-count Mapper and Reducer matching this walkthrough are sketched below.
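As a minimal sketch of this walkthrough (the class names are illustrative, and lowercasing each word is an assumption so that "How" and "how" are counted together):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: receives (byte offset, line) from the RecordReader and emits (word, 1).
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken().toLowerCase()); // assumption: lowercase for grouping
            context.write(word, ONE);                   // e.g. (how, 1)
        }
    }
}

// Reducer: receives (word, [1, 1, ...]) after shuffle and sort, emits (word, total).
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable one : counts) {
            sum += one.get();
        }
        context.write(word, new IntWritable(sum)); // e.g. (how, [1, 1]) -> (how, 2)
    }
}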
For more details, please visit: Reducer in Hadoop