How is RDD in Apache Spark different from Distributed Storage Management?

This topic has 1 reply, 1 voice, and was last updated 7 years, 10 months ago by DataFlair Team.


Author

Posts
- September 20, 2018 at 4:46 pm #5929
  
  DataFlair Team
  Spectator
  
  Differentiate an RDD in Apache Spark from Distributed Storage Management, and compare the two.
- September 20, 2018 at 4:46 pm #5931
  
  DataFlair Team
  Spectator
  
  Some of the differences between an RDD and Distributed Storage are as follows:
  
  Resilient Distributed Dataset (RDD) is the primary data abstraction of the Apache Spark framework.
  Distributed Storage is simply a file system that spans multiple nodes.
  
  RDD data is computed on demand and is held in memory only when the RDD is explicitly cached or persisted.
  Distributed Storage keeps data on persistent storage such as disk.
  
  An RDD can recompute lost partitions from its lineage (the sequence of transformations that produced it) in the case of failure or data loss.
  If data is lost from a Distributed Storage system, it is gone for good unless the system keeps replicas.
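To make the caching and recovery points concrete, here is a minimal toy model in plain Python. It is not Spark's or any distributed file system's API: `ToyRDD`, `ReplicatedStore` and all of their methods are hypothetical names chosen for illustration. In real PySpark the corresponding ideas appear as transformations such as `rdd.map(...)`, `rdd.cache()` and actions such as `rdd.collect()`. The model assumes the source partitions stay readable (for example, because they live in distributed storage), which is what makes recomputation from lineage possible.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy models for illustration only: this is not Spark or HDFS code.


class ToyRDD:
    """Records a source and a chain of transformations (its lineage).

    Partitions are computed on demand; a computed partition is kept in
    memory only if cache() was called, and a lost cached partition is
    rebuilt by replaying the lineage.
    """

    def __init__(self, source: List[List[int]],
                 transforms: Optional[List[Callable[[int], int]]] = None):
        self.source = source                      # assumed readable, e.g. from distributed storage
        self.transforms = transforms or []
        self.cached = False
        self.memory: Dict[int, List[int]] = {}    # in-memory partitions (filled only when cached)
        self.compute_count = 0                    # how often a partition was (re)computed

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # A transformation only extends the lineage; no data is computed here.
        return ToyRDD(self.source, self.transforms + [fn])

    def cache(self) -> "ToyRDD":
        self.cached = True
        return self

    def _compute(self, i: int) -> List[int]:
        # Replay every transformation on source partition i.
        self.compute_count += 1
        data = list(self.source[i])
        for fn in self.transforms:
            data = [fn(x) for x in data]
        return data

    def partition(self, i: int) -> List[int]:
        if i in self.memory:
            return self.memory[i]
        data = self._compute(i)
        if self.cached:
            self.memory[i] = data
        return data

    def collect(self) -> List[int]:
        # An action: materialize every partition.
        return [x for i in range(len(self.source)) for x in self.partition(i)]

    def lose_partition(self, i: int) -> None:
        # Simulate a failure that wipes one cached partition.
        self.memory.pop(i, None)


class ReplicatedStore:
    """Keeps each block on `replication` nodes; blocks cannot be recomputed."""

    def __init__(self, num_nodes: int, replication: int):
        self.nodes: List[Dict[str, bytes]] = [{} for _ in range(num_nodes)]
        self.replication = replication

    def put(self, key: str, value: bytes) -> None:
        start = sum(key.encode()) % len(self.nodes)   # deterministic placement
        for r in range(self.replication):
            self.nodes[(start + r) % len(self.nodes)][key] = value

    def locations(self, key: str) -> List[int]:
        return [n for n, node in enumerate(self.nodes) if key in node]

    def fail_node(self, n: int) -> None:
        self.nodes[n].clear()

    def get(self, key: str) -> Optional[bytes]:
        for node in self.nodes:
            if key in node:
                return node[key]
        return None   # every copy is gone: the block is lost for good
```

For example, calling `collect()` twice on an uncached `ToyRDD([[1, 2], [3, 4]]).map(lambda x: x + 1)` recomputes both partitions each time; after `cache()` a repeated call is served from memory, and after `lose_partition(1)` only that partition is recomputed from lineage. A `ReplicatedStore` with `replication=1` returns `None` once the single node holding a block fails, while with `replication=2` the block survives one node failure. For simplicity, a mapped `ToyRDD` always replays its lineage from the source, whereas Spark can read an already cached parent RDD instead.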