September 20, 2018 at 2:21 pm #5134 by DataFlair Team (Spectator)
Explain fullOuterJoin() operation in Apache Spark.
September 20, 2018 at 2:22 pm #5139 by DataFlair Team (Spectator)
> fullOuterJoin() is a transformation.
> It is defined in the package org.apache.spark.rdd.PairRDDFunctions:
def fullOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (Option[V], Option[W]))]
It performs a full outer join of this RDD and other.
For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for w in other,
or the pair (k, (Some(v), None)) if no elements in other have key k.
Similarly, for each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for v in this,
or the pair (k, (None, Some(w))) if no elements in this have key k.
Hash-partitions the resulting RDD using the existing partitioner/parallelism level.
Example:
val frdd1 = sc.parallelize(Seq(("Spark",35),("Hive",23),("Spark",45),("HBase",89)))
val frdd2 = sc.parallelize(Seq(("Spark",74),("Flume",12),("Hive",14),("Kafka",25)))
val fullouterjoinrdd = frdd1.fullOuterJoin(frdd2)
fullouterjoinrdd.collect
Output:
Array[(String, (Option[Int], Option[Int]))] = Array((Spark,(Some(35),Some(74))), (Spark,(Some(45),Some(74))), (Kafka,(None,Some(25))), (Flume,(None,Some(12))), (Hive,(Some(23),Some(14))), (HBase,(Some(89),None)))

For more transformations, refer to Transformation and Action in Apache Spark.
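To see why the output above contains two Spark rows and None entries for HBase, Flume, and Kafka, the same full-outer-join semantics can be sketched without a Spark cluster. This is a minimal plain-Python illustration (not Spark's implementation): lists of key-value pairs stand in for pair RDDs, and Python's None stands in for Scala's Option/None on the missing side.

```python
def full_outer_join(left, right):
    """Full outer join of two lists of (key, value) pairs.

    Mirrors RDD.fullOuterJoin semantics: every key from either side
    appears in the result; keys present on both sides produce one row
    per (v, w) combination; a key missing on one side yields None there.
    """
    keys = {k for k, _ in left} | {k for k, _ in right}
    result = []
    for k in sorted(keys):
        # All values for this key on each side; [None] if the side lacks it.
        lvals = [v for key, v in left if key == k] or [None]
        rvals = [w for key, w in right if key == k] or [None]
        for v in lvals:
            for w in rvals:
                result.append((k, (v, w)))
    return result

frdd1 = [("Spark", 35), ("Hive", 23), ("Spark", 45), ("HBase", 89)]
frdd2 = [("Spark", 74), ("Flume", 12), ("Hive", 14), ("Kafka", 25)]
print(full_outer_join(frdd1, frdd2))
# [('Flume', (None, 12)), ('HBase', (89, None)), ('Hive', (23, 14)),
#  ('Kafka', (None, 25)), ('Spark', (35, 74)), ('Spark', (45, 74))]
```

The duplicate Spark key on the left joins against the single Spark value on the right, giving two rows, exactly as in the Spark output above (Spark does not guarantee any particular row order; this sketch sorts by key only for readability).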