Forums › Apache Hadoop › What is the key-value pair in MapReduce?
- This topic has 2 replies, 1 voice, and was last updated 6 years, 4 months ago by DataFlair Team.
September 20, 2018 at 3:21 pm #5487
DataFlair Team
Spectator
How is a key-value pair generated in Hadoop MapReduce?
What is the need for key-value pairs in Hadoop?
Why are key-value pairs used in MapReduce to process the data?
September 20, 2018 at 3:21 pm #5488
DataFlair Team
Spectator
RecordReader is the class that actually loads the data from the source and converts it into <Key, Value> pairs.
The Mapper receives one <Key, Value> pair at a time until the input split is consumed.
Hadoop deals with structured, unstructured and semi-structured data. If the schema is static we can work directly on the columns instead of on keys and values, but if the schema is not static we work on keys and values. Keys and values are not intrinsic properties of the data; they are chosen by the user analyzing it. So to do the analysis we have to specify what we are looking for (the key) and what its value is (the value).
Key is the field on which data has to be grouped and aggregated on the reducer side.
Value is the field that has to be handled by each individual reduce call.
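The grouping-and-aggregation role of the key can be sketched without Hadoop at all. Below is a minimal Python simulation of the map → shuffle → reduce flow described above; the sample records and the word-count logic are illustrative assumptions, not Hadoop API calls:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input records as a RecordReader would hand them to the
# mapper: <byte offset, line of text>. (Sample data is made up.)
records = [(0, "deer bear river"), (16, "car car river"), (30, "deer car bear")]

# Map phase: emit a <word, 1> pair for every word; the word is the key
# the data will be grouped on.
mapped = [(word, 1) for _, line in records for word in line.split()]

# Shuffle/sort phase: bring all values for the same key together.
mapped.sort(key=itemgetter(0))

# Reduce phase: all values for one key are aggregated in one reduce call.
counts = {key: sum(v for _, v in group)
          for key, group in groupby(mapped, key=itemgetter(0))}
print(counts)  # {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}
```

Note how the choice of key (the word) is what determines the grouping on the reduce side, exactly as described above.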
September 20, 2018 at 3:22 pm #5489
DataFlair Team
Spectator
Hadoop deals with structured, unstructured and semi-structured data. If the schema is static we can work directly on the columns instead of on keys and values.
But if the schema is not static we work on <Key, Value> pairs. Keys and values are not intrinsic properties of the data; they are chosen by the user analyzing it. So, to do any analysis we have to specify what we are looking for (the key) and what its worth is (the value).
Key – the field/text/object on which the data has to be grouped and aggregated on the reducer side.
Value – the field/text/object that is handled by each individual reduce method.
MapReduce works on key-value pairs. Before the data is passed to the mapper, it must first be converted into key-value pairs, because the mapper only understands data as key-value pairs.
InputSplit
InputSplit is the logical representation of the data: it describes the chunk of data that an individual Mapper will process.
RecordReader
RecordReader communicates with the InputSplit and converts the split into records, i.e. key-value pairs suitable for reading by the mapper.
By default, TextInputFormat is used to convert the data into key-value pairs: the line's byte offset becomes the key, and the content of the line becomes the value (as Text).
RecordReader keeps reading from the InputSplit until the entire split has been consumed.
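The default <byte offset, line text> behaviour described above can be sketched with a toy stand-in in Python. This is not the Hadoop API, just an illustration of what the default record reader emits for a text split (the sample split is made up):

```python
def text_input_records(split_data: str):
    """Toy stand-in (not the Hadoop API) for TextInputFormat's record
    reader: yields <byte offset, line text> pairs, one per line."""
    offset = 0
    for line in split_data.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))

# A made-up three-line "split".
split = "first line\nsecond line\nthird\n"
pairs = list(text_input_records(split))
print(pairs)  # [(0, 'first line'), (11, 'second line'), (23, 'third')]
```

Each pair is what the mapper receives as its (key, value) arguments: the key is where the line starts in the file, and the value is the line itself.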