What does RDD stand for in Apache Spark?

Viewing 1 reply thread
  • Author
    Posts
    • #5417
      DataFlair Team
      Spectator

      What does RDD mean?

    • #5419
      DataFlair Team
      Spectator

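      RDD stands for Resilient Distributed Dataset. An RDD is an immutable, fault-tolerant collection of records that is partitioned across the nodes of a cluster so that it can be processed in parallel. RDDs are evaluated lazily: Spark defers computation until a result is actually required, which lets it avoid unnecessary work and contributes to its speed.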

      RDDs support two kinds of operations: A. Transformations B. Actions.

      Transformation: Applying a transformation to an RDD creates a new RDD; the original RDD is left unchanged. The new RDD is not computed until an action is performed on it. Some examples of transformations are map(), flatMap(), and filter().
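
      The lazy behaviour described above can be illustrated with a small plain-Python sketch. This is only a toy model, not Spark's API: the `ToyRDD` class and its `from_list`, `flat_map`, and `collect` methods are hypothetical names chosen for illustration. Each transformation returns a new object that merely remembers what to do, and nothing runs until `collect()` (standing in for an action) is called.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical; not part of Spark)."""

    def __init__(self, compute):
        # compute: zero-argument function that returns an iterator of records
        self._compute = compute

    @classmethod
    def from_list(cls, records):
        data = tuple(records)  # immutable snapshot of the input
        return cls(lambda: iter(data))

    # Transformations: each returns a NEW ToyRDD and computes nothing yet.
    def map(self, f):
        return ToyRDD(lambda: (f(x) for x in self._compute()))

    def filter(self, pred):
        return ToyRDD(lambda: (x for x in self._compute() if pred(x)))

    def flat_map(self, f):
        return ToyRDD(lambda: (y for x in self._compute() for y in f(x)))

    # Stand-in for an action: this is the point where work actually happens.
    def collect(self):
        return list(self._compute())


calls = []  # records every invocation of double()

def double(x):
    calls.append(x)
    return x * 2

base = ToyRDD.from_list([1, 2, 3, 4])
doubled = base.map(double)             # new ToyRDD, nothing computed
big = doubled.filter(lambda x: x > 4)  # another new ToyRDD, still nothing computed

calls_before_action = len(calls)  # double() has not been called yet
result = big.collect()            # triggers the whole chain
calls_after_action = len(calls)   # double() ran once per input record
```

      In this sketch `base` is never modified; `doubled` and `big` are separate objects, which mirrors the immutability of RDDs.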

      Action: An action is an operation that triggers the computation of an RDD and returns a result to the driver program or writes it to storage. Only when an action is performed does Spark allocate memory and compute the RDD. Some examples of actions are count(), top(), and saveAsTextFile().
      Spark records every transformation used to build an RDD in what is called a lineage graph, which lets it recompute lost data after a failure.
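
      To make the lineage idea concrete, here is a minimal plain-Python sketch. It is a toy model under stated assumptions, not how Spark is implemented: the `LineageRDD` class and its `lineage`, `lose_data`, and `collect` methods are hypothetical. Every transformation appends a step to the recorded lineage, and if the materialized result is lost it is rebuilt by replaying those steps from the source.

```python
class LineageRDD:
    """Toy model (hypothetical; not part of Spark) of a dataset that remembers its lineage."""

    def __init__(self, source, steps=()):
        self.source = tuple(source)  # original input records
        self.steps = tuple(steps)    # lineage: ordered (kind, function) pairs
        self._cache = None           # materialized result; may be lost

    # Transformations only extend the lineage and return a new object.
    def map(self, f):
        return LineageRDD(self.source, self.steps + (("map", f),))

    def filter(self, pred):
        return LineageRDD(self.source, self.steps + (("filter", pred),))

    def lineage(self):
        return ["source"] + [kind for kind, _ in self.steps]

    def _recompute(self):
        # Replay every recorded step, starting from the source records.
        records = list(self.source)
        for kind, f in self.steps:
            if kind == "map":
                records = [f(x) for x in records]
            else:
                records = [x for x in records if f(x)]
        return records

    # Stand-ins for actions: they force computation.
    def collect(self):
        if self._cache is None:
            self._cache = self._recompute()
        return list(self._cache)

    def count(self):
        return len(self.collect())

    def lose_data(self):
        # Simulate a failure that wipes the computed result.
        self._cache = None


squares = LineageRDD(range(1, 6)).map(lambda x: x * x)
evens = squares.filter(lambda x: x % 2 == 0)

first = evens.collect()   # computed from the lineage
evens.lose_data()         # simulated failure
second = evens.collect()  # rebuilt by replaying the same lineage
```

      Because the source and the recorded steps are immutable, replaying them yields the same result, which is why `first` and `second` match in this sketch.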

      For more details, refer to: Apache Spark RDD
