Why does Hadoop MapReduce use key-value pairs to process data?

    • #6260
      DataFlair Team
      Spectator

      Why are key-value pairs needed in Hadoop MapReduce?

    • #6261
      DataFlair Team
      Spectator

      Hadoop MapReduce is based on the original Google MapReduce paper, which is built around key-value pairs. The main reason for using key-value pairs in processing is that many real-world tasks are expressible in this model, as the Google MapReduce paper shows.

      MapReduce is a programming model for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
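
      As a minimal sketch of that model, here is the classic word count expressed with the Hadoop Java API (the class and field names below are illustrative, not taken from the paper):

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;

      public class WordCount {

        // Map: (byte offset, line of text) -> (word, 1) for every word in the line.
        public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
              if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);   // emit intermediate (word, 1)
              }
            }
          }
        }

        // Reduce: (word, [1, 1, ...]) -> (word, total count).
        public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
          @Override
          protected void reduce(Text key, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
              sum += v.get();   // merge all values associated with the same key
            }
            context.write(key, new IntWritable(sum));
          }
        }
      }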

      Another reason is that the key-value programming model is easy to use, even for programmers without experience in parallel and distributed systems, since it hides the details of parallelization, fault tolerance, locality optimization, and load balancing.

      A third reason is that the MapReduce implementation scales to large clusters comprising thousands of machines. It makes efficient use of these machine resources and is therefore suitable for many large computational problems.

      For more detail, follow: key-value pairs in Hadoop

    • #6262
      DataFlair Team
      Spectator

      Hadoop is based on Google's MapReduce concept of key-value pairs, also called tuples.

      The input to the mapper is a (key1, value1) pair, and its output, called the intermediate output, is also a pair, (key2, value2). The intermediate output becomes the input to the reducer task, whose final output is the pair (key3, value3):

      (key1, value1) -> Mapper -> (key2, value2) -> Reducer -> (key3, value3)

      The computation involves a map operation on each record of the input data to produce intermediate key/value pairs, followed by a reduce operation on the values associated with the same key. This is how parallelism and locality optimization are obtained.
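
      As a sketch of where these key/value types are declared in practice (assuming the word-count TokenMapper and SumReducer classes from the previous reply), a minimal Hadoop driver might look like this:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class WordCountDriver {
        public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "word count");
          job.setJarByClass(WordCountDriver.class);
          job.setMapperClass(WordCount.TokenMapper.class);
          job.setReducerClass(WordCount.SumReducer.class);
          // Declared types of the final (and here also intermediate) pairs:
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }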

      Follow the link for more detail: key-value pairs in Hadoop

    • #6264
      DataFlair Team
      Spectator

      Unlike relational databases, Hadoop is not restricted to structured data; it also deals with semi-structured and unstructured data.
      In a relational database the data is structured, so it is easy to address any piece of data through its columns. Hadoop, however, handles a variety of data, and as a distributed, parallel processing system it needs some way to logically correlate the input supplied by the user.
      Generally, in a MapReduce job, the map phase breaks the data apart in a logical way and the reduce phase aggregates it.
      The output of the map phase has to be given as input to the reducer so that it can aggregate all the data that logically belongs together. Using key-value pairs makes this logical mapping easy.
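
      As a small sketch of that logical mapping (the helper class name InputFormatChoice is hypothetical), this is how a job chooses which (key, value) pairs the framework derives from raw input:

      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
      import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

      public class InputFormatChoice {
        public static void configure(Job job, boolean tabSeparated) {
          if (tabSeparated) {
            // Lines laid out as "key<TAB>value" become (Text, Text) pairs.
            job.setInputFormatClass(KeyValueTextInputFormat.class);
          } else {
            // Default for plain text: key = byte offset of the line
            // (LongWritable), value = the line itself (Text). The framework,
            // not the user, imposes this logical pairing on raw text.
            job.setInputFormatClass(TextInputFormat.class);
          }
        }
      }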

      Follow the link for more detail: key-value pairs in Hadoop
