Why does MapReduce use key-value pairs to process data?


Viewing 1 reply thread
  • Author
    Posts
    • #5218
DataFlair Team
      Spectator

Why is a key-value pair used in MapReduce to process data?
Why are key-value pairs needed in Hadoop MapReduce?

    • #5221
DataFlair Team
      Spectator

MapReduce was originally derived from Google's MapReduce white paper.

When we search for anything on Google, the search engine returns results ranked by the highest page rank. To achieve this, Google developed a software framework, MapReduce, which fits perfectly into its large master-slave server architecture. The computation involves applying a map operation to each input “record” to generate a set of intermediate key-value pairs, and then applying a reduce operation to all the values that share the same key, in order to combine the derived data appropriately.
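The map-then-reduce flow described above can be sketched in a few lines of plain Python (a minimal illustration of the paradigm, not Hadoop's actual Java API), using the classic word-count example:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit an intermediate (key, value) pair for each word."""
    for record in records:
        for word in record.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group values by key, then combine per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

records = ["hadoop map reduce", "map reduce map"]
counts = reduce_phase(map_phase(records))
# counts == {"hadoop": 1, "map": 3, "reduce": 2}
```

The key is what lets the framework route all occurrences of the same word to one reducer, no matter which mapper emitted them.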

The same MapReduce model is implemented in Hadoop, and it deals with structured, unstructured, and semi-structured data. The schema is not static, unlike in an RDBMS. If we had a static schema, we could work directly on columns instead of keys and values.

In data analysis, we apply statistical and/or logical techniques to describe and summarize data, producing condensed output through computations such as aggregation and summation. These fit the MapReduce paradigm of key-value pairs quite well.
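To see how such an aggregation maps onto key-value pairs, here is a small sketch (hypothetical sales data, plain Python rather than Hadoop) that sums values per key, mirroring MapReduce's sort-then-group shuffle step:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical intermediate (key, value) pairs: (region, sale_amount)
sales = [("east", 100), ("west", 50), ("east", 25), ("west", 75)]

# The shuffle phase sorts pairs by key so that equal keys are adjacent
sales.sort(key=itemgetter(0))

# The reduce phase then aggregates the values of each key group
totals = {region: sum(amount for _, amount in group)
          for region, group in groupby(sales, key=itemgetter(0))}
# totals == {"east": 125, "west": 125}
```

Any per-group computation (count, sum, average, max) slots into the same pattern by swapping the aggregation function applied to each key's values.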

      Follow the link to learn more about key-value pairs in Hadoop.
