What is Reducer in Hadoop MapReduce?

    • #6009
      DataFlair Team
      Spectator

      What is Reducer / Reduce / Reduce Task?
      What type of processing is done in the Reducer in Hadoop?
      What can we do in Reducer of MapReduce?

    • #6012
      DataFlair Team
      Spectator

      In the MapReduce algorithm, the reducer is the second phase of processing; it is used for final summation and aggregation.

      To understand the reducer properly, let’s take a simple example. Suppose we want to find the summation of employees’ salaries (the data is provided in a CSV file) grouped by job title (e.g. the total salary of developers, of managers, etc.).

      Typically in a MapReduce program, we first create key-value pairs from the available data. For our example, we take the job title as the key and the salary as the value. Many mappers run to map this data according to custom business logic. Before this data reaches the reducer, it is shuffled and sorted; this is done internally by the framework. Then the data is handed to the reducer. Often a single reducer is enough here, because the data arrives at the reducer as (key, list of values), i.e. (job title, list of salaries), e.g. (Manager, [12000, 13000, 17000, 13000, ...]). So in the reducer we only have to add up all the values to get the total salary per job title, as sketched below.
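
      A minimal sketch of such a reducer in Java (the class name SalarySumReducer and the assumption that salaries arrive as IntWritable are illustrative, not from the original example):

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Reducer;

      // Illustrative reducer: sums all salaries seen for one job title.
      // Input:  (job title, list of salaries), e.g. (Manager, [12000, 13000, ...])
      // Output: (job title, total salary)
      public class SalarySumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
          @Override
          protected void reduce(Text jobTitle, Iterable<IntWritable> salaries, Context context)
                  throws IOException, InterruptedException {
              int total = 0;
              for (IntWritable salary : salaries) {
                  total += salary.get();  // add up every salary for this job title
              }
              context.write(jobTitle, new IntWritable(total));
          }
      }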

      We can configure the number of reducers required in the driver class. In a few cases, such as plain data parsing, we may not require a reducer at all, so we can set the number of reducers to zero; in such map-only jobs, sorting and shuffling are not performed either. A driver sketch follows.
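
      A minimal driver sketch showing both settings (the class name DriverSketch and the job name are hypothetical; the rest of the job setup is omitted):

      import java.io.IOException;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.mapreduce.Job;

      public class DriverSketch {
          public static void main(String[] args) throws IOException {
              Job job = Job.getInstance(new Configuration(), "salary summation");
              job.setNumReduceTasks(4);   // run four reduce tasks in parallel
              // For a map-only job (e.g. pure parsing), use zero reducers;
              // shuffle and sort are then skipped entirely:
              // job.setNumReduceTasks(0);
          }
      }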

      For more details, please visit: Reducer in Hadoop

    • #6015
      DataFlair Team
      Spectator

      Reducer is a phase in Hadoop which comes after the Mapper phase. The output of the mapper is given as input to the Reducer, which processes it and produces a new set of output that is stored in HDFS.

      The Reducer first processes the intermediate values for a particular key generated by the map function and then generates the output (zero or more key-value pairs), as illustrated below.
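
      To make the “zero or more” point concrete, here is a hypothetical filtering reducer (the class name ThresholdReducer and the cut-off value are our own illustration, not from the post): it emits one pair for keys whose sum passes a threshold and nothing at all for keys that fail it.

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Reducer;

      public class ThresholdReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
          private static final int THRESHOLD = 100;  // hypothetical cut-off

          @Override
          protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                  throws IOException, InterruptedException {
              int sum = 0;
              for (IntWritable v : values) {
                  sum += v.get();
              }
              if (sum >= THRESHOLD) {
                  context.write(key, new IntWritable(sum));  // one output pair
              }
              // otherwise nothing is written: zero output pairs for this key
          }
      }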

      The user decides the number of reducers; by default, the number of reducers is 1.
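
      Besides calling job.setNumReduceTasks() in the driver, the count can also be changed at submit time, assuming the driver implements Hadoop’s Tool interface so generic options are parsed (the jar and class names below are hypothetical):

      hadoop jar wordcount.jar WordCountDriver -D mapreduce.job.reduces=4 /input /output

      mapreduce.job.reduces is the property name in Hadoop 2.x and later; older releases used mapred.reduce.tasks.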

      Phases of Reducer:

      There are 3 phases of Reducer in Hadoop MapReduce.

      1) Shuffling
      2) Sorting
      3) Reduce

      The shuffle and sort phases occur concurrently. In the shuffling phase, the sorted output of the mappers is transferred to the Reducer as its input; after shuffling and sorting, the reduce task aggregates the key-value pairs.

      For example, in a word-count job we have to count the number of occurrences of each word.
      Suppose we have an input file with the following data:

      Data: Hi how are you
      How is your job

      Here each line is read as a record by the RecordReader and given to the Mapper as a (K,V) pair: (0, Hi how are you), (15, How is your job).
      The key is the byte offset at which the line starts in the file (0 for the first line), and the value is the whole line.
      The Mapper gets this key-value pair for each line and processes it to create another set of (K,V) pairs as its output.
      Here the mapper’s output will be, e.g., (Hi,1), (how,1), (are,1), (you,1) and so on; if the mapper treats “How” and “how” as the same word (say, by lowercasing), it emits (how,1) twice.
      This mapper output is given to the Reducer as input. After shuffle and sort, the Reducer receives each word with its grouped list of values, e.g. (Hi,[1]), (how,[1,1]), and sums each list to produce counts such as (Hi,1), (how,2).
      The reducer output is then passed to the RecordWriter, which writes it to the output file.

      So the flow is: RecordReader -> Mapper -> Reducer -> RecordWriter -> Output file.
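
      A compact sketch of the word-count mapper and reducer described above (class names are illustrative; this follows the standard org.apache.hadoop.mapreduce API):

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;

      public class WordCount {

          // Mapper: the RecordReader hands in (byte offset, line); we emit (word, 1).
          public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
              private static final IntWritable ONE = new IntWritable(1);
              private final Text word = new Text();

              @Override
              protected void map(LongWritable offset, Text line, Context context)
                      throws IOException, InterruptedException {
                  for (String token : line.toString().split("\\s+")) {
                      if (!token.isEmpty()) {
                          word.set(token.toLowerCase());  // treat "How" and "how" alike
                          context.write(word, ONE);       // e.g. (hi, 1), (how, 1), ...
                      }
                  }
              }
          }

          // Reducer: after shuffle/sort we receive (word, [1, 1, ...]) and emit the total.
          public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
              @Override
              protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                      throws IOException, InterruptedException {
                  int sum = 0;
                  for (IntWritable c : counts) {
                      sum += c.get();
                  }
                  context.write(word, new IntWritable(sum));  // e.g. (how, 2)
              }
          }
      }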

      For more details, please visit: Reducer in Hadoop
