Why can't aggregation be done in the Hadoop Mapper?

This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 3:48 pm #5628
  
  DataFlair Team
  Spectator
  
  Why do we need the Reducer to perform aggregation in MapReduce?
  Why can we not perform aggregation in the Mapper?
- September 20, 2018 at 3:49 pm #5630
  
  DataFlair Team
  Spectator
  
  Complete aggregation cannot be performed on the Mapper side. The reasons are:
  1. Aggregation needs all the values for a key grouped together. Map output is sorted locally, but the values for a key are merged across all mappers only on the Reducer side, after the shuffle.
  2. Aggregation requires the output from all the mappers, which is not available during the map phase, because the map tasks run independently on different nodes, where the data blocks are stored.
  3. A Mapper is instantiated per InputSplit. Once the InputSplit is processed, the map task writes its intermediate output to the local disk and finishes.
  Hence, a mapper can at best pre-aggregate the records of its own split; it never holds the data of other splits.
  4. Aggregating in a mapper would require moving the output of all the other mappers, running on different machines, to that mapper, which increases network traffic.
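
The points above can be illustrated with a small sketch. This is plain Python that imitates the data flow, not Hadoop code; the function names `map_task` and `aggregate` and the sample splits are made up for the illustration. It assumes a word count job with two input splits and shows that each map task can only see partial counts.

```python
from collections import defaultdict


def map_task(split):
    """Emit (word, 1) for every word in one input split (hypothetical helper)."""
    return [(word, 1) for line in split for word in line.split()]


def aggregate(pairs):
    """Sum the values per key for whatever pairs are passed in."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Two input splits, each processed by its own map task (sample data).
splits = [
    ["hadoop spark", "hadoop"],
    ["spark hadoop"],
]

map_outputs = [map_task(split) for split in splits]

# Aggregating inside each map task only gives per-split (partial) counts.
partial_counts = [aggregate(output) for output in map_outputs]

# Only after the outputs of all map tasks are brought together and sorted
# (a stand-in for shuffle and sort) can the totals per key be computed.
shuffled = sorted(pair for output in map_outputs for pair in output)
final_counts = aggregate(shuffled)

print(partial_counts)
print(final_counts)
```

Neither entry of `partial_counts` matches `final_counts`, because no single map task has seen every occurrence of a word.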
- September 20, 2018 at 3:49 pm #5631
  
  DataFlair Team
  Spectator
  
  Aggregation produces the final result of the MapReduce job by combining the output of the mappers. Before aggregation, the intermediate output from the mappers must go through shuffling and sorting, which ensure that all the values for the same key go to the same Reducer.
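
How the shuffle routes a key to one Reducer can be sketched as follows. This is a plain-Python imitation under stated assumptions, not the Hadoop implementation: `partition`, `NUM_REDUCERS`, and the sample map outputs are hypothetical, and `zlib.crc32` stands in for the key hash. Hadoop's default partitioner works along similar lines, taking the key's hash modulo the number of reduce tasks.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2  # hypothetical number of reduce tasks


def partition(key, num_reducers=NUM_REDUCERS):
    """Pick a reducer from a deterministic hash of the key (stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


# Intermediate output of two map tasks (sample data).
map_outputs = [
    [("hadoop", 1), ("spark", 1), ("hadoop", 1)],
    [("spark", 1), ("hadoop", 1)],
]

# Shuffle: every pair is sent to the reducer chosen by its key,
# so all values of one key end up at the same reducer.
reducer_inputs = [defaultdict(list) for _ in range(NUM_REDUCERS)]
for output in map_outputs:
    for key, value in output:
        reducer_inputs[partition(key)][key].append(value)

# Sort and reduce: each reducer walks its keys in sorted order and sums them.
results = {}
for grouped in reducer_inputs:
    for key in sorted(grouped):
        results[key] = sum(grouped[key])

print(results)
```

Because the reducer is chosen from the key alone, it does not matter which map task emitted a pair; the same key always lands in the same place.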
  
  We cannot do the final aggregation (addition) in a mapper because a mapper never sees all the values for a key grouped together; that grouping is complete only on the reducer side. A Mapper is initialized once per input split, and its map method is called once for each row of that split. A mapper therefore sees only the rows of its own split and has no record of the values handled by the mappers of other splits, so any sum it computes is partial.
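
The mapper lifecycle can be sketched in plain Python. This is a simplified imitation, not the Hadoop API: the class `SumMapper`, the function `run_map_task`, and the sample splits are hypothetical, with method names chosen to mirror the setup, map, and cleanup steps of a Hadoop Mapper. One mapper object is created per split, and its state does not carry over to the next split.

```python
class SumMapper:
    """Hypothetical mapper that keeps a running sum for its own split."""

    def setup(self):
        # Called once per split, before the first record.
        self.running_total = 0

    def map(self, record):
        # Called once per record of the split.
        self.running_total += record

    def cleanup(self):
        # Called once per split, after the last record: emit the partial sum.
        return ("total", self.running_total)


def run_map_task(split):
    """Run one map task: a fresh mapper instance handles exactly one split."""
    mapper = SumMapper()
    mapper.setup()
    for record in split:
        mapper.map(record)
    return mapper.cleanup()


splits = [[1, 2, 3], [10, 20]]  # sample data, one list per input split

# Each map task yields only the sum of its own split.
partials = [run_map_task(split) for split in splits]

# Combining the partial sums of all map tasks is the reducer's job.
final_total = sum(value for _, value in partials)

print(partials)
print(final_total)
```

The running sum survives from row to row inside one split, but the second map task starts from zero, which is why the partial sums still have to meet in a reducer.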