Explain join() operation

    • #5101
      DataFlair Team
      Spectator

      Explain join() operation

    • #5104
      DataFlair Team
      Spectator

      > join() is a transformation.
      > It is defined in org.apache.spark.rdd.PairRDDFunctions

      def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]

      Return an RDD containing all pairs of elements with matching keys in this and other.
      Each pair of elements will be returned as a (k, (v1, v2)) tuple, where (k, v1) is in this and (k, v2) is in other. Performs a hash join across the cluster.
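      To see what the hash join computes, here is a minimal local sketch in plain Scala (not Spark): build a hash table from one side keyed by K, then probe it with each pair from the other side. The object and method names are ours, purely for illustration; Spark does the same thing per partition, after shuffling both RDDs by key.

      ```scala
      // Local model of join() semantics: a hash join on key-value pairs.
      // `left` plays the role of `this`, `right` of `other` (names are ours).
      object HashJoinSketch {
        def hashJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] = {
          // Build a hash table from the right side: key -> all its values.
          val table: Map[K, Seq[W]] =
            right.groupBy(_._1).map { case (k, ps) => k -> ps.map(_._2) }
          // Probe with each left pair; emit one output pair per matching value,
          // so keys repeated on both sides produce a cross product of matches.
          for {
            (k, v) <- left
            w <- table.getOrElse(k, Seq.empty)
          } yield (k, (v, w))
        }

        def main(args: Array[String]): Unit = {
          val out = hashJoin(Seq(("m", 55), ("e", 57)), Seq(("m", 60), ("m", 65)))
          println(out) // List((m,(55,60)), (m,(55,65))) -- "e" has no match, so it is dropped
        }
      }
      ```

      Note that keys present on only one side ("e" above) disappear entirely; that is what makes this an inner join.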

      From:
      http://data-flair.training/blogs/rdd-transformations-actions-apis-apache-spark/#213_Join

      It joins two datasets. When called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key. Outer joins are supported through leftOuterJoin, rightOuterJoin, and fullOuterJoin.

      Example1:

      val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))
      val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))
      val joinrdd = rdd1.join(rdd2)
      joinrdd.collect

      Output:

      Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))

      Example2:

      val myrdd1 = sc.parallelize(Seq((1,2),(3,4),(3,6)))
      val myrdd2 = sc.parallelize(Seq((3,9)))
      val myjoinedrdd = myrdd1.join(myrdd2)
      myjoinedrdd.collect

      Output:
      Array[(Int, (Int, Int))] = Array((3,(4,9)), (3,(6,9)))
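      The outer-join variants mentioned above keep unmatched keys instead of dropping them, wrapping the possibly-missing side in an Option. The following is a hedged local sketch of leftOuterJoin semantics in plain Scala (our own names, not the Spark API), mirroring the (K, (V, Option[W])) shape Spark returns:

      ```scala
      // Local model of leftOuterJoin() semantics: every left pair survives;
      // the right value is Some(w) on a match, None otherwise.
      object OuterJoinSketch {
        def leftOuterJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, Option[W]))] = {
          val table: Map[K, Seq[W]] =
            right.groupBy(_._1).map { case (k, ps) => k -> ps.map(_._2) }
          left.flatMap { case (k, v) =>
            table.get(k) match {
              case Some(ws) => ws.map(w => (k, (v, Some(w): Option[W])))
              case None     => Seq((k, (v, None: Option[W])))
            }
          }
        }

        def main(args: Array[String]): Unit = {
          val out = leftOuterJoin(Seq(("m", 55), ("e", 57)), Seq(("m", 60)))
          println(out) // List((m,(55,Some(60))), (e,(57,None)))
        }
      }
      ```

      rightOuterJoin is the mirror image (Option on the left side), and fullOuterJoin keeps keys from both sides, with Option on each.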
