Explain fullOuterJoin() operation in Apache Spark.


    • #5139
      DataFlair Team
      Moderator

      > fullOuterJoin() is a transformation.
      > It is defined in the package org.apache.spark.rdd.PairRDDFunctions

      def fullOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (Option[V], Option[W]))]

      Performs a full outer join of this RDD and other.
      For each element (k, v) in this RDD, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for w in other,
      or the pair (k, (Some(v), None)) if no elements in other have key k.
      Similarly, for each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for v in this RDD,
      or the pair (k, (None, Some(w))) if no elements in this RDD have key k.
      Hash-partitions the resulting RDD using the existing partitioner/parallelism level.

      Example:

      val frdd1 = sc.parallelize(Seq(("Spark",35),("Hive",23),("Spark",45),("HBase",89)))
      val frdd2 = sc.parallelize(Seq(("Spark",74),("Flume",12),("Hive",14),("Kafka",25)))
      val fullouterjoinrdd = frdd1.fullOuterJoin(frdd2)
      fullouterjoinrdd.collect

      Output:
      Array[(String, (Option[Int], Option[Int]))] = Array((Spark,(Some(35),Some(74))), (Spark,(Some(45),Some(74))), (Kafka,(None,Some(25))), (Flume,(None,Some(12))), (Hive,(Some(23),Some(14))), (HBase,(Some(89),None)))
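      Because both sides of each result pair are wrapped in Option, downstream code has to handle the None cases explicitly. A minimal sketch (reusing fullouterjoinrdd from the example above; the default value 0 is an illustrative choice, not part of the API):

      // Substitute 0 for a missing side and sum the two values per key.
      // Assumes `fullouterjoinrdd` from the example above.
      val summed = fullouterjoinrdd.map { case (key, (left, right)) =>
        (key, left.getOrElse(0) + right.getOrElse(0))
      }
      summed.collect
      // e.g. (Spark,109), (Spark,119), (Kafka,25), (Flume,12), (Hive,37), (HBase,89)
      // (element order in the collected array may vary across runs)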

      For more transformations, refer to Transformation and action in Apache Spark.
