Explain the action collect()

      DataFlair Team

      > collect() is an action that returns all the elements of an RDD to the driver program.
      > Because the entire dataset is brought to one place, care must be taken when using it: all of your data must fit in the memory of a single machine.
      > Hence it is recommended to use collect() for development purposes or in unit testing, on small datasets.

      Ex.
      val rdd1 = sc.parallelize(List(10,20,30,40))
      rdd1.collect()


      Output:
      Array[Int] = Array(10, 20, 30, 40)

      From :
      http://data-flair.training/blogs/rdd-transformations-actions-apis-apache-spark/#32_Collect

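      In short, collect() returns all the data / elements present in an RDD to the driver in the form of an array. It does not print anything by itself; in the Spark shell the returned array is echoed to the console, which is why it is commonly used when debugging programs.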
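
For anyone who wants to try this outside the Spark shell, here is a minimal sketch of the same example as a standalone program. The object name `CollectExample`, the helper `run()`, the app name, and the `local[*]` master are illustrative assumptions, not part of the original post. It also calls `take(n)`, another RDD action, only to contrast a bounded fetch with collect(), which copies everything to the driver.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; only parallelize/collect come from the post above.
object CollectExample {

  // Runs the example and returns (all elements, first two elements).
  def run(): (Array[Int], Array[Int]) = {
    // Local session for experimentation; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("collect-example")
      .master("local[*]")
      .getOrCreate()
    try {
      val sc = spark.sparkContext
      val rdd1 = sc.parallelize(List(10, 20, 30, 40))

      // collect() is an action: it runs the job and copies every element
      // of the RDD into an Array held in the driver's memory.
      val all: Array[Int] = rdd1.collect()

      // take(n) copies at most n elements, so the driver-side size is bounded.
      val firstTwo: Array[Int] = rdd1.take(2)

      (all, firstTwo)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (all, firstTwo) = run()
    println(all.mkString(", "))      // 10, 20, 30, 40
    println(firstTwo.mkString(", ")) // 10, 20
  }
}
```

On a four-element RDD the difference does not matter, but on a large dataset collect() can exhaust driver memory, which is the reason for the warning quoted above.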
