What is paired RDD in Apache Spark?

  • Author
    Posts
    • #5897
      DataFlair Team
      Spectator

      Explain the term paired RDD in Apache Spark.
      What do you understand by paired RDD in Spark?

    • #5898
      DataFlair Team
      Spectator

      A Pair RDD is a special type of RDD in Apache Spark that extends a normal RDD with its own set of transformations. The elements of a Pair RDD are key-value pairs, which is particularly helpful when the user needs to perform the same operation on the values of each key.

      For example: reduceByKey(), aggregateByKey(), foldByKey(), sortByKey(), etc.

      val file = sc.textFile("/path/to/file")
      val words = file.flatMap(line => line.split(" ")) // words is a normal RDD of String
      val tuple = words.map(word => (word, 1)) // tuple is a Pair RDD of (String, Int)
      val wc = tuple.reduceByKey((a, b) => a + b) // sums the counts for each word, i.e. each key
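      Conceptually, reduceByKey behaves like grouping the pairs by key and then folding each group's values together. A minimal sketch of that semantics using plain Scala collections, so no Spark cluster is needed (the `wordCount` helper here is hypothetical, not part of Spark's API):

      ```scala
      object PairSemantics {
        // Mimics the word-count pipeline above on ordinary collections:
        // flatMap -> map to (key, 1) -> group by key -> sum values per key.
        def wordCount(lines: Seq[String]): Map[String, Int] =
          lines
            .flatMap(_.split(" "))                  // like words: one entry per word
            .map(word => (word, 1))                 // like tuple: (key, value) pairs
            .groupBy(_._1)                          // collect all pairs sharing a key
            .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // fold per key

        def main(args: Array[String]): Unit = {
          println(wordCount(Seq("spark is fast", "spark is simple")))
        }
      }
      ```

      The difference in Spark is that reduceByKey combines values for each key locally on every partition before shuffling, so it avoids materializing all the per-key values the way groupBy does here.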
