What is paired RDD in Apache Spark?

  • Author
    Posts
    • #5897
      DataFlair Team
      Spectator

      Explain the term paired RDD in Apache Spark.
      What do you understand by paired RDD in Spark?

    • #5898
      DataFlair Team
      Spectator

      A Pair RDD is a special type of RDD in Apache Spark that extends a normal RDD with its own set of transformations. The elements of a Pair RDD are key-value pairs, which is particularly helpful when the user needs to perform the same operation on the values of each key.

      For example: reduceByKey(), aggregateByKey(), foldByKey(), sortByKey(), etc.

      val file = sc.textFile("/path/to/file")
      val words = file.flatMap(line => line.split(" ")) // words is a normal RDD of String
      val tuple = words.map(word => (word, 1)) // tuple is a Pair RDD of (String, Int)
      val wc = tuple.reduceByKey((a, b) => a + b) // sums the counts for each word, i.e. each key
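      Conceptually, reduceByKey behaves like grouping the pairs by key and then folding each group's values together. A minimal sketch of that semantics using plain Scala collections, so no Spark cluster is needed (the `wordCount` helper here is hypothetical, not part of Spark's API):

      ```scala
      object PairSemantics {
        // Mimics the word-count pipeline above on ordinary collections:
        // flatMap -> map to (key, 1) -> group by key -> sum values per key.
        def wordCount(lines: Seq[String]): Map[String, Int] =
          lines
            .flatMap(_.split(" "))                  // like words: one entry per word
            .map(word => (word, 1))                 // like tuple: (key, value) pairs
            .groupBy(_._1)                          // collect all pairs sharing a key
            .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // fold per key

        def main(args: Array[String]): Unit = {
          println(wordCount(Seq("spark is fast", "spark is simple")))
        }
      }
      ```

      The difference in Spark is that reduceByKey combines values for each key locally on every partition before shuffling, so it avoids materializing all the per-key values the way groupBy does here.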
