Why does MapReduce use key-value pairs to process data?


Viewing 1 reply thread
  • Author
    Posts
    • #5218
DataFlair Team
      Spectator

Why is a key-value pair used in MapReduce to process data?
Why are key-value pairs needed in Hadoop MapReduce?

    • #5221
DataFlair Team
      Spectator

MapReduce was originally derived from Google's MapReduce white paper.

When we search for anything on Google, the search engine returns results ranked by the highest page rank. To achieve this, Google developed a software framework, MapReduce, which fits perfectly into its large master-slave server architecture. The computation involves applying a map operation to each input “record” to generate a set of intermediate key-value pairs, and then applying a reduce operation to all the values that share the same key, in order to combine the derived data appropriately.
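The map-then-reduce flow described above can be sketched in a few lines of plain Python (a minimal illustration of the paradigm, not Hadoop's actual Java API), using the classic word-count example:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit an intermediate (key, value) pair for each word."""
    for record in records:
        for word in record.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group values by key, then combine per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

records = ["hadoop map reduce", "map reduce map"]
counts = reduce_phase(map_phase(records))
# counts == {"hadoop": 1, "map": 3, "reduce": 2}
```

The key is what lets the framework route all occurrences of the same word to one reducer, no matter which mapper emitted them.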

The same MapReduce model is implemented in Hadoop, and it deals with structured, unstructured, and semi-structured data. The schema is not static, unlike in an RDBMS. If we had a static schema, we could work directly on columns instead of keys and values.

In data analysis, we apply statistical and/or logical techniques to describe and summarize data, producing condensed output through computations such as aggregation and summation. These fit the MapReduce paradigm of key-value pairs quite well.
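To see how such an aggregation maps onto key-value pairs, here is a small sketch (hypothetical sales data, plain Python rather than Hadoop) that sums values per key, mirroring MapReduce's sort-then-group shuffle step:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical intermediate (key, value) pairs: (region, sale_amount)
sales = [("east", 100), ("west", 50), ("east", 25), ("west", 75)]

# The shuffle phase sorts pairs by key so that equal keys are adjacent
sales.sort(key=itemgetter(0))

# The reduce phase then aggregates the values of each key group
totals = {region: sum(amount for _, amount in group)
          for region, group in groupby(sales, key=itemgetter(0))}
# totals == {"east": 125, "west": 125}
```

Any per-group computation (count, sum, average, max) slots into the same pattern by swapping the aggregation function applied to each key's values.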

      Follow the link to learn more about key-value pairs in Hadoop.
