This topic contains 2 replies, has 1 voice, and was last updated by  dfbdteam3 1 year, 6 months ago.

Viewing 3 posts - 1 through 3 (of 3 total)
  • Author
    Posts
  • #5487

    dfbdteam3
    Moderator

    How is a key-value pair generated in Hadoop MapReduce?
    What is the need for key-value pairs in Hadoop?
    Why are key-value pairs used in MapReduce to process data?

    #5488

    dfbdteam3
    Moderator

    RecordReader is the class that actually loads the data from the source and converts it into <Key, Value> pairs.

    The Mapper receives one <Key, Value> pair at a time until the entire input split is consumed.

    Hadoop deals with structured, unstructured and semi-structured data. If the schema is static, we can work directly on the columns instead of on keys and values; if the schema is not static, we work on keys and values. Keys and values are not intrinsic properties of the data — they are chosen by the user analyzing it. So to do the analysis we must specify what we are looking for (the key) and what its value is (the value).

    Key is the field on which the data is grouped and aggregated on the reducer side.
    Value is the field that is handled by each individual reduce call.
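    For illustration, the grouping and aggregation that happens on the reducer side can be sketched in plain Python (this simulates Hadoop's shuffle-and-reduce step; it is not the Java API, and the names here are illustrative):

    ```python
    from collections import defaultdict

    # Group mapper output by key, as Hadoop's shuffle/sort phase does,
    # assuming mapper output pairs like ('word', 1).
    def group_by_key(pairs):
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return dict(groups)

    # Aggregate all values seen for one key, as a reduce call does.
    def reduce_fn(key, values):
        return (key, sum(values))

    mapped = [("be", 1), ("to", 1), ("be", 1)]
    results = [reduce_fn(k, vs) for k, vs in group_by_key(mapped).items()]
    # results contains ('be', 2) and ('to', 1)
    ```

    The key decides which reduce call each value is routed to; the values are what that call actually aggregates.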

    #5489

    dfbdteam3
    Moderator

    Hadoop deals with structured, unstructured and semi-structured data. If the schema is static, we can work directly on the columns instead of on keys and values.
    But if the schema is not static, we work on <Key, Value> pairs. Keys and values are not intrinsic properties of the data — they are chosen by the user analyzing it. So, to do any analysis, we must specify what we are looking for (the key) and what its value is (the value).

    Key – the field/text/object on which the data is grouped and aggregated on the reducer side.
    Value – the field/text/object that is handled by each individual reduce method.
    MapReduce works on key-value pairs: before data is passed to the mapper, it must first be converted into key-value pairs, because the mapper only understands data in that form.
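    As a rough sketch (plain Python, not Hadoop's Java API), a mapper that receives one key-value pair at a time and emits new pairs might look like this, using the classic word-count example:

    ```python
    # A minimal word-count mapper sketch, assuming input pairs of
    # (byte offset, line of text) as the default TextInputFormat produces.
    def map_fn(key, value):
        # key: byte offset of the line (unused here)
        # value: the line's content
        for word in value.split():
            # Emit one (word, 1) pair per occurrence.
            yield (word, 1)

    pairs = list(map_fn(0, "to be or not to be"))
    # pairs holds [('to', 1), ('be', 1), ('or', 1), ('not', 1), ('to', 1), ('be', 1)]
    ```

    Note that the mapper is free to emit keys of a different type than it received — here it turns an offset key into word keys.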

    InputSplit

    InputSplit is the logical representation of the data: the data to be processed by an individual Mapper is represented by one InputSplit.
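    Conceptually, splitting is just dividing the file's byte range into chunks. A hypothetical sketch (real Hadoop also considers HDFS block boundaries and the InputFormat's isSplitable check, which this ignores):

    ```python
    # Compute logical (offset, length) splits for a file of a given length.
    # Purely illustrative; function name and logic are assumptions, not
    # Hadoop's actual implementation.
    def compute_splits(file_length, split_size):
        splits = []
        offset = 0
        while offset < file_length:
            length = min(split_size, file_length - offset)
            splits.append((offset, length))  # one split per Mapper
            offset += length
        return splits

    compute_splits(300, 128)
    # three splits: (0, 128), (128, 128), (256, 44)
    ```

    No data is moved at this point — a split only records which byte range a Mapper should process.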

    RecordReader

    RecordReader communicates with the InputSplit and converts the split into records — key-value pairs suitable for reading by the mapper.
    By default, Hadoop uses TextInputFormat, whose RecordReader converts the data into key-value pairs: the byte offset of each line is the key, and the content of the line is the value (as Text).
    The RecordReader keeps reading from the InputSplit until the entire split has been consumed.


