What is the key-value pair in Hadoop MapReduce?

Viewing 4 reply threads
  • Author
    Posts
    • #5082
      DataFlair Team
      Spectator

      How is a key-value pair generated in Hadoop MapReduce?
      What is the need for key-value pairs in Hadoop?

    • #5083
      DataFlair Team
      Spectator

      In Hadoop, MapReduce is where the actual processing of the input data takes place; all incoming and outgoing data is in key-value pair format.

      There are two phases of operation in MapReduce.
      1. Map function
      2. Reduce function

      The RecordReader works on the input split, converting it into (Key, Value) pairs to be sent to the Map function. The input passed to the Map function is in (Key, Value) format, and the output of the Map function is also (Key, Value) pairs. The Mapper's output is not stored in HDFS; being intermediate output, it is stored only on the local file system.

      The Reduce function also accepts its input as (Key, Value) pairs; this, too, is customized logic that emits its output as (Key, Value) pairs.

      The (Key, Value) pair format of the data allows the programmer to handle a large volume of computation with ease, especially in a distributed file system containing structured, unstructured and semi-structured data.
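      The two phases described above can be sketched as a small simulation in plain Python. This is an illustration of the MapReduce model (here a word count), not Hadoop's actual Java API; the function names `map_fn`, `reduce_fn` and `run_mapreduce` are ours, chosen for clarity.

```python
from collections import defaultdict

# Simulated Map function: for each input (key, value) record
# (byte offset, line of text), emit intermediate (word, 1) pairs.
def map_fn(offset, line):
    for word in line.split():
        yield (word, 1)

# Simulated Reduce function: receives one key with all of its
# values grouped together, and emits a final (key, total) pair.
def reduce_fn(word, counts):
    yield (word, sum(counts))

def run_mapreduce(records):
    # Shuffle/sort phase: group intermediate values by key
    # (Hadoop performs this between the Map and Reduce phases).
    groups = defaultdict(list)
    for offset, line in records:
        for k, v in map_fn(offset, line):
            groups[k].append(v)
    out = {}
    for k in sorted(groups):
        for key, total in reduce_fn(k, groups[k]):
            out[key] = total
    return out

records = [(0, "big data big compute"), (21, "big data")]
print(run_mapreduce(records))  # {'big': 3, 'compute': 1, 'data': 2}
```

      Note that both phases consume and produce (Key, Value) pairs, which is exactly the contract described above.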

      Follow the link for more detail: key-value pairs in Hadoop

    • #5086
      DataFlair Team
      Spectator

      Hadoop deals with large volumes of structured, unstructured and semi-structured data in a distributed system, and the schema of that data may be non-static. To batch-process such data with customized business logic, the MapReduce software framework is used.

      Both the map and reduce phases use Key Value (K, V) pairs as input and output for processing the data.

      Key-value pairs are generated in Hadoop MapReduce (MR) using:

      1. InputSplit – the logical representation of the data; the chunk to be processed by an individual Mapper.

      2. RecordReader – communicates with the InputSplit and converts it into records in the form of key-value pairs suitable for reading by the Mapper (by default, via TextInputFormat).
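      A toy stand-in for what TextInputFormat's record reader does can make this concrete: it turns the raw bytes of a split into (byte offset, line) records, which is the (key, value) shape handed to the Mapper. This is a simplified Python illustration, not Hadoop's real implementation; `line_records` is a name we made up.

```python
# Simplified sketch of a line-oriented record reader: each record is
# (byte offset of the line, line content without the newline).
def line_records(data: str):
    offset = 0
    for line in data.splitlines(keepends=True):
        yield (offset, line.rstrip("\n"))
        offset += len(line)

text = "hello hadoop\nkey value pairs\n"
for key, value in line_records(text):
    print(key, value)
# 0 hello hadoop
# 13 key value pairs
```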

      (K,V) flow in MR:

      1. The Map function processes each key-value pair from the RecordReader (one call per record) and emits some number of key-value pairs.

      2. The Mapper output (intermediate output, stored on local disk) is sent as input to the Reduce function.

      3. The Reduce function processes the values grouped by the same key and emits another set of key-value pairs as output (the final output is stored in HDFS).
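      The (K, V) flow above — Map emits intermediate pairs, the pairs are sorted and grouped by key, Reduce aggregates each group — can be traced step by step in a small Python simulation (again a sketch of the model, not Hadoop's Java API):

```python
from itertools import groupby
from operator import itemgetter

# Step 1: intermediate (key, value) pairs as a Map function might emit them.
intermediate = [("cat", 1), ("dog", 1), ("cat", 1)]

# Step 2: shuffle/sort — pairs are sorted and grouped by key before
# reaching the Reducer (Hadoop does this automatically between phases).
intermediate.sort(key=itemgetter(0))

# Step 3: the Reduce function sees each key once, with all of its values.
final = [(key, sum(v for _, v in group))
         for key, group in groupby(intermediate, key=itemgetter(0))]
print(final)  # [('cat', 2), ('dog', 1)]
```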

    • #5088
      DataFlair Team
      Spectator

      MapReduce works by dividing the processing task into two phases, Map and Reduce, and each phase has key-value pairs as both input and output.

      Now let us see how the key-value pairs are generated. The InputSplit and the RecordReader generate the key-value pairs in Hadoop, using TextInputFormat, which is the default InputFormat. With it, the key is the byte offset of the line and the value is the content of the line.

      This can be changed by using a different or custom InputFormat.
      The (Key, Value) pair format of the data allows the programmer to handle a large volume of computation with ease especially in a distributed file system containing structured, unstructured and semi-structured data.
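      For instance, Hadoop's KeyValueTextInputFormat splits each line at the first tab, so the key becomes the text before the tab and the value the text after it, instead of (byte offset, whole line). A toy Python equivalent (our own illustration, not Hadoop code):

```python
# Sketch of tab-separated key-value records, in the spirit of
# KeyValueTextInputFormat: key = text before the first separator,
# value = text after it.
def keyvalue_records(data: str, sep: str = "\t"):
    for line in data.splitlines():
        key, _, value = line.partition(sep)
        yield (key, value)

text = "user1\tclicked\nuser2\tpurchased\n"
print(list(keyvalue_records(text)))
# [('user1', 'clicked'), ('user2', 'purchased')]
```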

    • #5089
      DataFlair Team
      Spectator

      MapReduce is the data-processing component of Hadoop, which processes large amounts of structured, semi-structured or unstructured data. This data processing happens in two stages:

      1) Map phase
      2) Reduce phase

      Map phase:

      Data is first divided into input splits, which are logical partitions of the data, each processed by an individual Mapper. Each Mapper processes one split at a time, and the RecordReader communicates with the input split and converts the data into key-value pairs.

      Reduce phase:

      The Mapper produces intermediate key-value pairs, and these are read by the Reducer. Data corresponding to the same key is grouped together, and after some processing the final output is produced in the form of key-value pairs.
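      The grouping step in the Reduce phase can be sketched with the classic "maximum temperature per year" example: intermediate (year, temperature) pairs from the Mappers are grouped by key, and the Reducer emits one (year, max) pair per key. This is a Python sketch of the model, with made-up data, not Hadoop's Java API.

```python
from collections import defaultdict

# Intermediate (year, temperature) pairs as emitted by the Mappers.
intermediate = [(1950, 22), (1949, 111), (1950, 0), (1949, 78), (1950, -11)]

# Group values by key, as the shuffle phase does before the Reducer runs.
grouped = defaultdict(list)
for year, temp in intermediate:
    grouped[year].append(temp)

# Reduce: one output (key, value) pair per key.
final = {year: max(temps) for year, temps in grouped.items()}
print(final)  # {1950: 22, 1949: 111}
```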

      But why is there a concept of key-value pairs in MapReduce?

      This is because MapReduce is mainly used for data analysis, and it is much easier to analyse data once it is converted into key-value pairs. In Hadoop, when the schema is static we can work directly on the columns instead of keys and values; but when the schema is not static, we work on keys and values. Keys and values are not intrinsic properties of the data; they are chosen by the user analyzing the data.
