Free Online Certification Courses – Learn Today. Lead Tomorrow. › Forums › Apache Spark › Expain leftOuterJoin() and rightOuterJoin() operation.
- This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
-
AuthorPosts
-
-
September 20, 2018 at 2:19 pm #5124DataFlair TeamSpectator
Expain leftOuterJoin() and rightOuterJoin() operation.
-
September 20, 2018 at 2:19 pm #5126DataFlair TeamSpectator
> Both leftOuterJoin() and rightOuterJoin() are transformation.
> Both in package org.apache.spark.rdd.PairRDDFunctionsleftOuterJoin() :
def leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]
Perform a left outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (v, Some(w))) for w in other, or the pair (k, (v, None)) if no elements in other have key k. Hash-partitions the output using the existing partitioner/parallelism level.
leftOuterJoin() performs a join between two RDDs where the keys must be present in first RDD
Example :
val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54))) val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64))) val leftjoinrdd = rdd1.leftOuterJoin(rdd2) leftjoinrdd.collect
Output :
Array[(String, (Int, Option[Int]))] = Array((s,(59,Some(61))), (s,(59,Some(62))), (s,(54,Some(61))), (s,(54,Some(62))), (e,(57,None)), (e,(58,None)), (m,(55,Some(60))), (m,(55,Some(65))), (m,(56,Some(60))), (m,(56,Some(65))))rightOuterJoin():
def rightOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (Option[V], W))]Perform a right outer join of this and other. For each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), w)) for v in this, or the pair (k, (None, w)) if no elements in this have key k. Hash-partitions the resulting RDD using the existing partitioner/parallelism level.
It performs the join between two RDDs where the key must be present in other RDD
Example:
val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54))) val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64))) val rightjoinrdd = rdd1.rightOuterJoin(rdd2) rightjoinrdd.collect
Array[(String, (Option[Int], Int))] = Array((s,(Some(59),61)), (s,(Some(59),62)), (s,(Some(54),61)), (s,(Some(54),62)), (h,(None,63)), (h,(None,64)), (m,(Some(55),60)), (m,(Some(55),65)), (m,(Some(56),60)), (m,(Some(56),65)))
For more Transformations tour to Transformation and Action in Spark
-
-
AuthorPosts
- You must be logged in to reply to this topic.