Explain fullOuterJoin() operation in Apache Spark.


    • #5139
      DataFlair Team
      Moderator

      > fullOuterJoin() is a transformation.
      > It is defined in the package org.apache.spark.rdd.PairRDDFunctions

      def fullOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (Option[V], Option[W]))]

      Performs a full outer join of this RDD and other.
      For each element (k, v) in this RDD, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for w in other,
      or the pair (k, (Some(v), None)) if no elements in other have key k.
      Similarly, for each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for v in this RDD,
      or the pair (k, (None, Some(w))) if no elements in this RDD have key k.
      Hash-partitions the resulting RDD using the existing partitioner/parallelism level.

      Example:

      val frdd1 = sc.parallelize(Seq(("Spark",35),("Hive",23),("Spark",45),("HBase",89)))
      val frdd2 = sc.parallelize(Seq(("Spark",74),("Flume",12),("Hive",14),("Kafka",25)))
      val fullouterjoinrdd = frdd1.fullOuterJoin(frdd2)
      fullouterjoinrdd.collect

      Output:
      Array[(String, (Option[Int], Option[Int]))] = Array((Spark,(Some(35),Some(74))), (Spark,(Some(45),Some(74))), (Kafka,(None,Some(25))), (Flume,(None,Some(12))), (Hive,(Some(23),Some(14))), (HBase,(Some(89),None)))
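      Because both sides of each result pair are wrapped in Option, downstream code has to handle the None cases explicitly. A minimal sketch (reusing fullouterjoinrdd from the example above; the default value 0 is an illustrative choice, not part of the API):

      // Substitute 0 for a missing side and sum the two values per key.
      // Assumes `fullouterjoinrdd` from the example above.
      val summed = fullouterjoinrdd.map { case (key, (left, right)) =>
        (key, left.getOrElse(0) + right.getOrElse(0))
      }
      summed.collect
      // e.g. (Spark,109), (Spark,119), (Kafka,25), (Flume,12), (Hive,37), (HBase,89)
      // (element order in the collected array may vary across runs)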

      For more transformations, refer to Transformation and action in Apache Spark.
