> repartition() is a transformation.
> This function changes the number of partitions to the value passed in the numPartitions parameter (numPartitions: Int).
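> It is a method of org.apache.spark.rdd.RDD. ShuffledRDD is a class, not a package; it inherits the method, and the signature below is as listed for ShuffledRDD, whose elements are (K, C) pairs.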
def repartition(numPartitions: Int)(implicit ord: Ordering[(K, C)] = null): RDD[(K, C)]
Return a new RDD that has exactly numPartitions partitions.
Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data.
If you are decreasing the number of partitions in this RDD, consider using coalesce, which can avoid performing a shuffle.
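
To make the difference between repartition and coalesce concrete, here is a minimal sketch that compares partition counts. It assumes spark-core is on the classpath and runs in local mode. The object name RepartitionSketch, the helper partitionCounts, the app name, and the sample data are illustrative choices, not part of the Spark API.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RepartitionSketch {
  // Hypothetical helper: returns the partition counts of
  // (original, repartition(8), repartition(2), coalesce(2)).
  def partitionCounts(sc: SparkContext): (Int, Int, Int, Int) = {
    // Start from an RDD with 4 partitions.
    val rdd = sc.parallelize(1 to 100, numSlices = 4)

    // repartition always shuffles, so it can raise or lower the count.
    val wider = rdd.repartition(8)
    val narrower = rdd.repartition(2)

    // coalesce defaults to shuffle = false and merges existing partitions,
    // which is why it is the suggested choice when only decreasing.
    val merged = rdd.coalesce(2)

    (rdd.getNumPartitions, wider.getNumPartitions,
      narrower.getNumPartitions, merged.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    // Local mode with 2 threads; master and app name are assumptions.
    val conf = new SparkConf().setAppName("repartition-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(partitionCounts(sc)) // prints (4,8,2,2)
    } finally {
      sc.stop()
    }
  }
}
```

Both repartition(2) and coalesce(2) end with two partitions, but only the former redistributes every record across the cluster. Since repartition is a transformation, no shuffle actually runs here: getNumPartitions only inspects the lineage, and the work would start once an action such as count() is called.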