This topic contains 1 reply, has 1 voice, and was last updated by  dfbdteam5 10 months ago.

Viewing 2 posts - 1 through 2 (of 2 total)
  • Author
    Posts
  • #5124

    dfbdteam5
    Moderator

    Explain the leftOuterJoin() and rightOuterJoin() operations.

    #5126

    dfbdteam5
    Moderator

    > Both leftOuterJoin() and rightOuterJoin() are transformations.
    > Both are defined in the class org.apache.spark.rdd.PairRDDFunctions, so they are available on RDDs of key-value pairs.

    leftOuterJoin() :

    def leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]

    Perform a left outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (v, Some(w))) for w in other, or the pair (k, (v, None)) if no elements in other have key k. Hash-partitions the output using the existing partitioner/parallelism level.

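    leftOuterJoin() performs a join between two RDDs in which every key of the first RDD is kept. Keys that appear only in the first RDD are paired with None, and keys that appear only in the second RDD are dropped.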

    Example :

    val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))
    val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))
    val leftjoinrdd = rdd1.leftOuterJoin(rdd2)
    leftjoinrdd.collect

    Output :
    Array[(String, (Int, Option[Int]))] = Array((s,(59,Some(61))), (s,(59,Some(62))), (s,(54,Some(61))), (s,(54,Some(62))), (e,(57,None)), (e,(58,None)), (m,(55,Some(60))), (m,(55,Some(65))), (m,(56,Some(60))), (m,(56,Some(65))))
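
The rule described above can be modelled without a cluster. The following is only a minimal sketch on plain Scala collections, not Spark's implementation: the object `LeftOuterJoinSketch` and its helper `leftOuterJoin` are hypothetical names introduced for illustration, and the data is the same as in the example above. Spark does not guarantee the order of the collected pairs, so only the set of pairs is comparable.

```scala
object LeftOuterJoinSketch {
  // Hypothetical helper (not a Spark API): left outer join on in-memory sequences.
  def leftOuterJoin[K, V, W](thisSide: Seq[(K, V)], other: Seq[(K, W)]): Seq[(K, (V, Option[W]))] = {
    // Index the other side by key: key -> all values seen for that key.
    val otherByKey: Map[K, Seq[W]] =
      other.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

    thisSide.flatMap { case (k, v) =>
      // One Some(w) per matching value, or a single None when the key is missing.
      val matches: Seq[Option[W]] = otherByKey.get(k) match {
        case Some(ws) => ws.map(w => Some(w))
        case None     => Seq(None)
      }
      matches.map(m => (k, (v, m)))
    }
  }

  // Same data as rdd1 and rdd2 in the example above.
  val rdd1Data = Seq(("m", 55), ("m", 56), ("e", 57), ("e", 58), ("s", 59), ("s", 54))
  val rdd2Data = Seq(("m", 60), ("m", 65), ("s", 61), ("s", 62), ("h", 63), ("h", 64))

  val joined: Seq[(String, (Int, Option[Int]))] = leftOuterJoin(rdd1Data, rdd2Data)
}
```

Key "e" exists only on the left side, so it is kept with None. Key "h" exists only on the right side, so it does not appear in `joined` at all.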

    rightOuterJoin() :

    def rightOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (Option[V], W))]

    Perform a right outer join of this and other. For each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), w)) for v in this, or the pair (k, (None, w)) if no elements in this have key k. Hash-partitions the resulting RDD using the existing partitioner/parallelism level.

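    rightOuterJoin() performs a join between two RDDs in which every key of the other (second) RDD is kept. Keys that appear only in the other RDD are paired with None, and keys that appear only in this (first) RDD are dropped.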

    Example:

    val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))
    val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))
    val rightjoinrdd = rdd1.rightOuterJoin(rdd2)
    rightjoinrdd.collect


    Output :
    Array[(String, (Option[Int], Int))] = Array((s,(Some(59),61)), (s,(Some(59),62)), (s,(Some(54),61)), (s,(Some(54),62)), (h,(None,63)), (h,(None,64)), (m,(Some(55),60)), (m,(Some(55),65)), (m,(Some(56),60)), (m,(Some(56),65)))
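
The right outer join can be modelled the same way, again as a rough sketch on plain Scala collections rather than Spark's own code. The object `RightOuterJoinSketch`, the helper `rightOuterJoin`, and the placeholder default `-1` are all assumptions made for illustration. The last step shows one possible way to unwrap the Option side with `getOrElse` once the join is done.

```scala
object RightOuterJoinSketch {
  // Hypothetical helper (not a Spark API): right outer join on in-memory sequences.
  def rightOuterJoin[K, V, W](thisSide: Seq[(K, V)], other: Seq[(K, W)]): Seq[(K, (Option[V], W))] = {
    // Index this side by key: key -> all values seen for that key.
    val thisByKey: Map[K, Seq[V]] =
      thisSide.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

    other.flatMap { case (k, w) =>
      // One Some(v) per matching value, or a single None when the key is missing.
      val matches: Seq[Option[V]] = thisByKey.get(k) match {
        case Some(vs) => vs.map(v => Some(v))
        case None     => Seq(None)
      }
      matches.map(m => (k, (m, w)))
    }
  }

  // Same data as rdd1 and rdd2 in the example above.
  val rdd1Data = Seq(("m", 55), ("m", 56), ("e", 57), ("e", 58), ("s", 59), ("s", 54))
  val rdd2Data = Seq(("m", 60), ("m", 65), ("s", 61), ("s", 62), ("h", 63), ("h", 64))

  val joined: Seq[(String, (Option[Int], Int))] = rightOuterJoin(rdd1Data, rdd2Data)

  // Replace a missing left value with -1 (an arbitrary placeholder chosen for this sketch).
  val withDefaults: Seq[(String, (Int, Int))] =
    joined.map { case (k, (vOpt, w)) => (k, (vOpt.getOrElse(-1), w)) }
}
```

Here key "h" exists only on the right side, so it is kept with None on the left. Key "e" exists only on the left side and is dropped.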

    For more transformations, refer to Transformation and Action in Spark.
