Explain fullOuterJoin() operation in Apache Spark.


    • #5134
      DataFlair Team
      Spectator

      Explain fullOuterJoin() operation in Apache Spark.

    • #5139
      DataFlair Team
      Spectator

      > It is a transformation.
      > It is defined in org.apache.spark.rdd.PairRDDFunctions

      def fullOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (Option[V], Option[W]))]

      Performs a full outer join of this RDD and other.
      For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for w in other,
      or the pair (k, (Some(v), None)) if no elements in other have key k.
      Similarly, for each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for v in this,
      or the pair (k, (None, Some(w))) if no elements in this have key k.
      Hash-partitions the resulting RDD using the existing partitioner/parallelism level.
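      To make the (Option[V], Option[W]) result shape concrete, here is a plain-Scala sketch that emulates the same semantics on ordinary Seq collections. This is not Spark code and carries none of Spark's partitioning behavior; the function and value names are illustrative only.

      ```scala
      // Plain-Scala emulation of fullOuterJoin semantics (illustrative, not Spark code).
      def fullOuterJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (Option[V], Option[W]))] = {
        val leftKeys  = left.map(_._1).toSet
        val rightKeys = right.map(_._1).toSet
        // Keys present on both sides: one result pair per matching (v, w) combination.
        val matched = for {
          (k, v)  <- left
          (k2, w) <- right
          if k == k2
        } yield (k, (Option(v), Option(w)))
        // Keys present on only one side: the missing side becomes None.
        val leftOnly  = left.collect  { case (k, v) if !rightKeys(k) => (k, (Option(v), Option.empty[W])) }
        val rightOnly = right.collect { case (k, w) if !leftKeys(k)  => (k, (Option.empty[V], Option(w))) }
        matched ++ leftOnly ++ rightOnly
      }

      val demo = fullOuterJoin(
        Seq(("Spark", 35), ("Hive", 23), ("Spark", 45), ("HBase", 89)),
        Seq(("Spark", 74), ("Flume", 12), ("Hive", 14), ("Kafka", 25))
      )
      ```

      As with the real RDD operation, a key matched on both sides yields (Some, Some), while unmatched keys yield None on the missing side.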

      Example:

      val frdd1 = sc.parallelize(Seq(("Spark",35),("Hive",23),("Spark",45),("HBase",89)))
      val frdd2 = sc.parallelize(Seq(("Spark",74),("Flume",12),("Hive",14),("Kafka",25)))
      val fullouterjoinrdd = frdd1.fullOuterJoin(frdd2)
      fullouterjoinrdd.collect

      Output:
      Array[(String, (Option[Int], Option[Int]))] = Array((Spark,(Some(35),Some(74))), (Spark,(Some(45),Some(74))), (Kafka,(None,Some(25))), (Flume,(None,Some(12))), (Hive,(Some(23),Some(14))), (HBase,(Some(89),None)))
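      In practice the Option values usually need unwrapping before further processing. A small plain-Scala sketch, using the joined pairs from the output above as an ordinary collection (the default value 0 and the names joined/totals are illustrative assumptions):

      ```scala
      // The joined pairs from the output above, as a plain Scala collection.
      val joined: Seq[(String, (Option[Int], Option[Int]))] = Seq(
        ("Spark", (Some(35), Some(74))), ("Spark", (Some(45), Some(74))),
        ("Kafka", (None, Some(25))), ("Flume", (None, Some(12))),
        ("Hive", (Some(23), Some(14))), ("HBase", (Some(89), None))
      )

      // Replace a missing side with the default 0, then sum both sides per pair.
      val totals = joined.map { case (k, (l, r)) => (k, l.getOrElse(0) + r.getOrElse(0)) }
      ```

      The same getOrElse pattern applies inside an RDD map over the fullOuterJoin result.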

      For more transformations, refer to Transformation and Action in Apache Spark.
