> collect() returns all the elements of an RDD, so care must be taken when using it: all of your data must fit on a single machine.
> Hence it is recommended to use collect() only for development purposes or in unit tests.
Example:
val rdd1 = sc.parallelize(List(10,20,30,40))
rdd1.collect()
Output:
Array[Int] = Array(10,20,30,40)
Source:
http://data-flair.training/blogs/rdd-transformations-actions-apis-apache-spark/#32_Collect
collect() returns all the elements present in an RDD to the driver program as an array. It does not print anything itself; in the Spark shell the returned array is echoed back to the console, which is why it is often used when debugging programs.
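
As a rough illustration of the memory concern, here is a minimal standalone sketch that contrasts collect() with take(n), which brings only the first n elements to the driver. The object name `CollectSketch`, the `local[2]` master, and the app name are assumptions made so the example can run outside the Spark shell; they are not from the quoted source.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name; any name works.
object CollectSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master, so the sketch runs without a cluster.
    val conf = new SparkConf().setMaster("local[2]").setAppName("collect-sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd1 = sc.parallelize(List(10, 20, 30, 40))

      // collect() copies every element of the RDD into a local Array on the driver.
      val all: Array[Int] = rdd1.collect()
      println(all.mkString(", "))

      // take(n) copies only the first n elements, so driver memory use is bounded by n.
      val firstTwo: Array[Int] = rdd1.take(2)
      println(firstTwo.mkString(", "))
    } finally {
      // Always release the context, even if an action fails.
      sc.stop()
    }
  }
}
```

On a four-element RDD the two calls cost about the same; the difference matters once the RDD is too large to fit in the driver's memory, where take(n) is usually the safer way to peek at the data.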