Expain leftOuterJoin() and rightOuterJoin() operation.

Viewing 1 reply thread
  • Author
    Posts
    • #5124
      DataFlair Team
      Moderator

      Expain leftOuterJoin() and rightOuterJoin() operation.

    • #5126
      DataFlair Team
      Moderator

      > Both leftOuterJoin() and rightOuterJoin() are transformation.
      > Both in package org.apache.spark.rdd.PairRDDFunctions

      leftOuterJoin() :

      def leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]

      Perform a left outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (v, Some(w))) for w in other, or the pair (k, (v, None)) if no elements in other have key k. Hash-partitions the output using the existing partitioner/parallelism level.

      leftOuterJoin() performs a join between two RDDs where the keys must be present in first RDD

      Example :

      val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))
      val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))
      val leftjoinrdd = rdd1.leftOuterJoin(rdd2)
      leftjoinrdd.collect

      Output :
      Array[(String, (Int, Option[Int]))] = Array((s,(59,Some(61))), (s,(59,Some(62))), (s,(54,Some(61))), (s,(54,Some(62))), (e,(57,None)), (e,(58,None)), (m,(55,Some(60))), (m,(55,Some(65))), (m,(56,Some(60))), (m,(56,Some(65))))

      rightOuterJoin():
      def rightOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (Option[V], W))]

      Perform a right outer join of this and other. For each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), w)) for v in this, or the pair (k, (None, w)) if no elements in this have key k. Hash-partitions the resulting RDD using the existing partitioner/parallelism level.

      It performs the join between two RDDs where the key must be present in other RDD

      Example:

      val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))
      val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))
      val rightjoinrdd = rdd1.rightOuterJoin(rdd2)
      rightjoinrdd.collect


      Array[(String, (Option[Int], Int))] = Array((s,(Some(59),61)), (s,(Some(59),62)), (s,(Some(54),61)), (s,(Some(54),62)), (h,(None,63)), (h,(None,64)), (m,(Some(55),60)), (m,(Some(55),65)), (m,(Some(56),60)), (m,(Some(56),65)))

      For more Transformations tour to Transformation and Action in Spark

Viewing 1 reply thread
  • You must be logged in to reply to this topic.