What does RDD stand for in Apache Spark?

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

Viewing 1 reply thread

Author

Posts
- September 20, 2018 at 3:06 pm #5417
  
  DataFlair Team
  Spectator
  
  What does RDD mean?
- September 20, 2018 at 3:06 pm #5419
  
  DataFlair Team
  Spectator
  
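  RDD stands for Resilient Distributed Dataset: an immutable, fault-tolerant collection of records that is partitioned across the nodes of a cluster so it can be processed in parallel. RDD transformations are evaluated lazily, which lets Spark plan a chain of operations as a whole and skip computing results that are never used. This is one reason Spark operations run fast.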
  
  RDDs support two kinds of operations: A. transformations and B. actions.
  
  Transformation: Applying a transformation to an RDD creates a new RDD, which is not computed until an action is performed on it. Examples of transformations are map(), flatMap(), and filter().
  
  Action: An action is an operation that returns a result to the driver program or writes data to external storage. Performing an action on an RDD is what triggers Spark to allocate memory and actually compute the RDD. Examples of actions are count(), top(), and saveAsTextFile().

  Spark records every transformation used to create an RDD in what is called the lineage graph. If part of an RDD is lost to a failure, Spark uses this lineage to recompute it, which is what makes RDDs fault tolerant.
  
  For more details, refer to: Apache Spark RDD
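The answer describes an RDD as a distributed, immutable collection of records. As a rough pure-Python illustration only (no Spark involved), the hypothetical helper `split_into_partitions` below deals records into partitions and stores them as tuples so they cannot be modified in place; each partition is then summed on its own, the way separate workers would handle separate partitions. The round-robin split is an assumption made for simplicity; Spark uses its own partitioning schemes.

```python
from typing import List, Tuple


def split_into_partitions(records: List[int], num_partitions: int) -> Tuple[Tuple[int, ...], ...]:
    """Hypothetical helper: deal records round-robin into immutable partitions."""
    buckets: List[List[int]] = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        buckets[i % num_partitions].append(record)
    # Tuples cannot be changed in place, loosely mirroring RDD immutability.
    return tuple(tuple(bucket) for bucket in buckets)


partitions = split_into_partitions([1, 2, 3, 4, 5, 6, 7], 3)
# Each partition could go to a different worker; here they are processed one by one.
partial_sums = [sum(p) for p in partitions]
total = sum(partial_sums)
print(partitions)  # ((1, 4, 7), (2, 5), (3, 6))
print(total)       # 28
```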
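To illustrate the difference between transformations and actions described above, here is a minimal sketch of lazy evaluation, assuming a hypothetical `ToyRDD` class. It is not the Spark API, and `flat_map` only stands in for `flatMap()`. Transformations merely record a step and return a new object, while `collect()` (an action-like method) is the only place the recorded steps actually run.

```python
from typing import Callable, Iterable, List, Optional

Step = Callable[[Iterable[int]], Iterable[int]]


class ToyRDD:
    """Toy model of lazy transformations; NOT the Spark API."""

    def __init__(self, source: Iterable[int], steps: Optional[List[Step]] = None):
        self._source = list(source)
        # Recorded but not-yet-executed transformations, in order.
        self._steps: List[Step] = steps if steps is not None else []

    def _with_step(self, step: Step) -> "ToyRDD":
        # Return a NEW ToyRDD; this one is left unchanged.
        return ToyRDD(self._source, self._steps + [step])

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        return self._with_step(lambda xs: (f(x) for x in xs))

    def filter(self, pred: Callable[[int], bool]) -> "ToyRDD":
        return self._with_step(lambda xs: (x for x in xs if pred(x)))

    def flat_map(self, f: Callable[[int], Iterable[int]]) -> "ToyRDD":
        # Stands in for flatMap(): each record may produce zero or more records.
        return self._with_step(lambda xs: (y for x in xs for y in f(x)))

    def collect(self) -> List[int]:
        # Action-like method: only here are the recorded steps executed.
        data: Iterable[int] = self._source
        for step in self._steps:
            data = step(data)
        return list(data)


calls: List[int] = []


def square(x: int) -> int:
    calls.append(x)  # record every time the function really runs
    return x * x


base = ToyRDD([1, 2, 3, 4])
squared_evens = base.filter(lambda x: x % 2 == 0).map(square)
print(calls)                    # [] : nothing has been computed yet
print(squared_evens.collect())  # [4, 16]
print(calls)                    # [2, 4]
```

In real PySpark, assuming a SparkContext named `sc`, the equivalent chain would be `sc.parallelize([1, 2, 3, 4]).filter(lambda x: x % 2 == 0).map(lambda x: x * x).collect()`.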
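The lineage idea from the answer can be modeled the same way. The sketch below uses a hypothetical `ToyLineageRDD`, not Spark's implementation: it keeps the input partitions plus the ordered list of transformations, computes partitions only when an action (`collect()` or `count()`) runs, and rebuilds a deliberately dropped partition by replaying the lineage. Keeping computed partitions in a dictionary loosely resembles a cached RDD; by default Spark does not keep computed partitions around.

```python
from typing import Callable, Dict, List, Tuple

Transform = Callable[[int], int]


class ToyLineageRDD:
    """Toy model, NOT Spark's implementation: rebuilds lost partitions from lineage."""

    def __init__(self, source_partitions: List[Tuple[int, ...]], lineage: List[Transform]):
        self.source_partitions = source_partitions    # stands in for stable input data
        self.lineage = lineage                        # recorded transformations, in order
        self.materialized: Dict[int, List[int]] = {}  # partition index -> computed records
        self.compute_calls: List[int] = []            # which partitions were (re)computed

    def _compute_partition(self, i: int) -> List[int]:
        # Replay every recorded transformation on the original input partition.
        self.compute_calls.append(i)
        records = list(self.source_partitions[i])
        for transform in self.lineage:
            records = [transform(r) for r in records]
        return records

    def collect(self) -> List[int]:
        # Action: computes only the partitions that are not already materialized.
        result: List[int] = []
        for i in range(len(self.source_partitions)):
            if i not in self.materialized:
                self.materialized[i] = self._compute_partition(i)
            result.extend(self.materialized[i])
        return result

    def count(self) -> int:
        return len(self.collect())

    def lose_partition(self, i: int) -> None:
        # Simulate a failure that wipes out one computed partition.
        self.materialized.pop(i, None)


rdd = ToyLineageRDD([(1, 2), (3, 4), (5,)], [lambda x: x + 1, lambda x: x * 10])
print(rdd.collect())      # [20, 30, 40, 50, 60]
rdd.lose_partition(1)
print(rdd.collect())      # [20, 30, 40, 50, 60], partition 1 rebuilt from lineage
print(rdd.compute_calls)  # [0, 1, 2, 1]
```

Only the lost partition is recomputed, which is the point of tracking lineage per partition rather than recomputing the whole dataset.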