What is the key-value pair in Hadoop MapReduce?

Viewing 4 reply threads
  • Author
    Posts
    • #5082
      DataFlair Team
      Spectator

      How is a key-value pair generated in Hadoop MapReduce?
      What is the need for key-value pairs in Hadoop?

    • #5083
      DataFlair Team
      Spectator

      In Hadoop, MapReduce is where the actual processing of the input data takes place; all incoming and outgoing data is in key-value pair format.

      There are two phases of operation in MapReduce.
      1. Map function
      2. Reduce function

      The RecordReader works on the input split, converting it into (Key, Value) pairs to be sent to the Map function. The input passed to the Map function is in (Key, Value) format, and the output of the Map function is also (Key, Value) pairs. The Mapper's output is not stored in HDFS; being intermediate output, it is stored only on the local file system.

      The Reduce function also accepts its input as (Key, Value) pairs; this, too, is customized logic that emits its output as (Key, Value) pairs.

      The (Key, Value) pair format of the data allows the programmer to handle a large volume of computation with ease, especially in a distributed file system containing structured, unstructured and semi-structured data.
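      The two phases described above can be sketched as a small simulation in plain Python. This is an illustration of the MapReduce model (here a word count), not Hadoop's actual Java API; the function names `map_fn`, `reduce_fn` and `run_mapreduce` are ours, chosen for clarity.

```python
from collections import defaultdict

# Simulated Map function: for each input (key, value) record
# (byte offset, line of text), emit intermediate (word, 1) pairs.
def map_fn(offset, line):
    for word in line.split():
        yield (word, 1)

# Simulated Reduce function: receives one key with all of its
# values grouped together, and emits a final (key, total) pair.
def reduce_fn(word, counts):
    yield (word, sum(counts))

def run_mapreduce(records):
    # Shuffle/sort phase: group intermediate values by key
    # (Hadoop performs this between the Map and Reduce phases).
    groups = defaultdict(list)
    for offset, line in records:
        for k, v in map_fn(offset, line):
            groups[k].append(v)
    out = {}
    for k in sorted(groups):
        for key, total in reduce_fn(k, groups[k]):
            out[key] = total
    return out

records = [(0, "big data big compute"), (21, "big data")]
print(run_mapreduce(records))  # {'big': 3, 'compute': 1, 'data': 2}
```

      Note that both phases consume and produce (Key, Value) pairs, which is exactly the contract described above.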

      Follow the link for more detail: key-value pairs in Hadoop

    • #5086
      DataFlair Team
      Spectator

      Hadoop deals with large volumes of structured, unstructured and semi-structured data in a distributed system, and the schema of that data may be non-static. To batch-process such data with customized business logic, the MapReduce software framework is used.

      Both the map and reduce phases use Key Value (K, V) pairs as input and output for processing the data.

      Key-value pairs are generated in Hadoop MapReduce (MR) using:

      1. InputSplit – the logical representation of the data; the chunk to be processed by an individual Mapper.

      2. RecordReader – communicates with the InputSplit and converts it into records in the form of key-value pairs suitable for reading by the Mapper (by default, via TextInputFormat).
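      A toy stand-in for what TextInputFormat's record reader does can make this concrete: it turns the raw bytes of a split into (byte offset, line) records, which is the (key, value) shape handed to the Mapper. This is a simplified Python illustration, not Hadoop's real implementation; `line_records` is a name we made up.

```python
# Simplified sketch of a line-oriented record reader: each record is
# (byte offset of the line, line content without the newline).
def line_records(data: str):
    offset = 0
    for line in data.splitlines(keepends=True):
        yield (offset, line.rstrip("\n"))
        offset += len(line)

text = "hello hadoop\nkey value pairs\n"
for key, value in line_records(text):
    print(key, value)
# 0 hello hadoop
# 13 key value pairs
```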

      (K,V) flow in MR:

      1. The Map function processes each key-value pair from the RecordReader (one call per record) and emits some number of key-value pairs.

      2. The Mapper output (intermediate output, stored on local disk) is sent as input to the Reduce function.

      3. The Reduce function processes the values grouped by the same key and emits another set of key-value pairs as output (the final output is stored in HDFS).
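      The (K, V) flow above — Map emits intermediate pairs, the pairs are sorted and grouped by key, Reduce aggregates each group — can be traced step by step in a small Python simulation (again a sketch of the model, not Hadoop's Java API):

```python
from itertools import groupby
from operator import itemgetter

# Step 1: intermediate (key, value) pairs as a Map function might emit them.
intermediate = [("cat", 1), ("dog", 1), ("cat", 1)]

# Step 2: shuffle/sort — pairs are sorted and grouped by key before
# reaching the Reducer (Hadoop does this automatically between phases).
intermediate.sort(key=itemgetter(0))

# Step 3: the Reduce function sees each key once, with all of its values.
final = [(key, sum(v for _, v in group))
         for key, group in groupby(intermediate, key=itemgetter(0))]
print(final)  # [('cat', 2), ('dog', 1)]
```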

    • #5088
      DataFlair Team
      Spectator

      MapReduce works by dividing the processing task into two phases, Map and Reduce, and each phase has key-value pairs as both input and output.

      Now let us see how the key-value pairs are generated. The InputSplit and the RecordReader generate the key-value pairs in Hadoop, using TextInputFormat, which is the default InputFormat. With it, the key is the byte offset of the line and the value is the content of the line.

      This can be changed by using a different or custom InputFormat.
      The (Key, Value) pair format of the data allows the programmer to handle a large volume of computation with ease especially in a distributed file system containing structured, unstructured and semi-structured data.
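      For instance, Hadoop's KeyValueTextInputFormat splits each line at the first tab, so the key becomes the text before the tab and the value the text after it, instead of (byte offset, whole line). A toy Python equivalent (our own illustration, not Hadoop code):

```python
# Sketch of tab-separated key-value records, in the spirit of
# KeyValueTextInputFormat: key = text before the first separator,
# value = text after it.
def keyvalue_records(data: str, sep: str = "\t"):
    for line in data.splitlines():
        key, _, value = line.partition(sep)
        yield (key, value)

text = "user1\tclicked\nuser2\tpurchased\n"
print(list(keyvalue_records(text)))
# [('user1', 'clicked'), ('user2', 'purchased')]
```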

    • #5089
      DataFlair Team
      Spectator

      MapReduce is the data-processing component of Hadoop, which processes large amounts of structured, semi-structured or unstructured data. This data processing happens in two stages:

      1) Map phase
      2) Reduce phase

      Map phase:

      Data is first divided into input splits, which are logical partitions of the data, each processed by an individual Mapper. Each Mapper processes one split at a time, and the RecordReader communicates with the input split and converts the data into key-value pairs.

      Reduce phase:

      The Mapper produces intermediate key-value pairs, and these are read by the Reducer. Data corresponding to the same key is grouped together, and after some processing the final output is produced in the form of key-value pairs.
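      The grouping step in the Reduce phase can be sketched with the classic "maximum temperature per year" example: intermediate (year, temperature) pairs from the Mappers are grouped by key, and the Reducer emits one (year, max) pair per key. This is a Python sketch of the model, with made-up data, not Hadoop's Java API.

```python
from collections import defaultdict

# Intermediate (year, temperature) pairs as emitted by the Mappers.
intermediate = [(1950, 22), (1949, 111), (1950, 0), (1949, 78), (1950, -11)]

# Group values by key, as the shuffle phase does before the Reducer runs.
grouped = defaultdict(list)
for year, temp in intermediate:
    grouped[year].append(temp)

# Reduce: one output (key, value) pair per key.
final = {year: max(temps) for year, temps in grouped.items()}
print(final)  # {1950: 22, 1949: 111}
```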

      But why is there a concept of key-value pairs in MapReduce?

      This is because MapReduce is mainly used for data analysis, and it is much easier to analyse data once it is converted into key-value pairs. In Hadoop, when the schema is static we can work directly on the columns instead of keys and values; but when the schema is not static, we work on keys and values. Keys and values are not intrinsic properties of the data; they are chosen by the user analyzing the data.
