Forums › Apache Hadoop › What is the key-value pair in MapReduce?
- This topic has 2 replies, 1 voice, and was last updated 6 years, 4 months ago by DataFlair Team.
September 20, 2018 at 3:21 pm #5487
DataFlair Team
Spectator
How is a key-value pair generated in Hadoop MapReduce?
What is the need for key-value pairs in Hadoop?
Why are key-value pairs used in MapReduce to process the data?
September 20, 2018 at 3:21 pm #5488
DataFlair Team
Spectator
RecordReader is the class that actually loads the data from the source and converts it into <Key, Value> pairs.
The Mapper receives one <Key, Value> pair at a time until the input split is consumed.
Hadoop deals with structured, unstructured and semi-structured data. If the schema is static we can work directly on the columns instead of on keys and values, but if the schema is not static we work on keys and values. Keys and values are not intrinsic properties of the data; they are chosen by the user analyzing it. So to do the analysis we have to specify what we are looking for (the key) and what its value is (the value).
Key is the field on which data has to be grouped and aggregated on the reducer side.
Value is the field that has to be handled by each individual reduce call.
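The grouping-and-aggregation role of the key can be sketched without Hadoop at all. Below is a minimal Python simulation of the map → shuffle → reduce flow described above; the sample records and the word-count logic are illustrative assumptions, not Hadoop API calls:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input records as a RecordReader would hand them to the
# mapper: <byte offset, line of text>. (Sample data is made up.)
records = [(0, "deer bear river"), (16, "car car river"), (30, "deer car bear")]

# Map phase: emit a <word, 1> pair for every word; the word is the key
# the data will be grouped on.
mapped = [(word, 1) for _, line in records for word in line.split()]

# Shuffle/sort phase: bring all values for the same key together.
mapped.sort(key=itemgetter(0))

# Reduce phase: all values for one key are aggregated in one reduce call.
counts = {key: sum(v for _, v in group)
          for key, group in groupby(mapped, key=itemgetter(0))}
print(counts)  # {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}
```

Note how the choice of key (the word) is what determines the grouping on the reduce side, exactly as described above.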
September 20, 2018 at 3:22 pm #5489
DataFlair Team
Spectator
Hadoop deals with structured, unstructured and semi-structured data. If the schema is static we can work directly on the columns instead of on keys and values.
But if the schema is not static we work on <Key, Value> pairs. Keys and values are not intrinsic properties of the data; they are chosen by the user analyzing it. So, to do any analysis we have to specify what we are looking for (the key) and what its worth is (the value).
Key – the field/text/object on which the data has to be grouped and aggregated on the reducer side.
Value – the field/text/object that is handled by each individual reduce method.
MapReduce works on key-value pairs. Before the data is passed to the mapper, it must first be converted into key-value pairs, because the mapper only understands data as key-value pairs.
InputSplit
InputSplit is the logical representation of the data: it describes the chunk of data that an individual Mapper will process.
RecordReader
RecordReader communicates with the InputSplit and converts the split into records, i.e. key-value pairs suitable for reading by the mapper.
By default, TextInputFormat is used to convert the data into key-value pairs: the line's byte offset becomes the key, and the content of the line becomes the value (as Text).
RecordReader keeps reading from the InputSplit until the entire split has been consumed.
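The default <byte offset, line text> behaviour described above can be sketched with a toy stand-in in Python. This is not the Hadoop API, just an illustration of what the default record reader emits for a text split (the sample split is made up):

```python
def text_input_records(split_data: str):
    """Toy stand-in (not the Hadoop API) for TextInputFormat's record
    reader: yields <byte offset, line text> pairs, one per line."""
    offset = 0
    for line in split_data.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))

# A made-up three-line "split".
split = "first line\nsecond line\nthird\n"
pairs = list(text_input_records(split))
print(pairs)  # [(0, 'first line'), (11, 'second line'), (23, 'third')]
```

Each pair is what the mapper receives as its (key, value) arguments: the key is where the line starts in the file, and the value is the line itself.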