How is RDD in Apache Spark different from Distributed Storage Management?
Differentiate an RDD in Apache Spark from Distributed Storage Management.
Compare an RDD in Apache Spark with Distributed Storage Management.
Some of the differences between an RDD and Distributed Storage are as follows:
A Resilient Distributed Dataset (RDD) is the primary data abstraction of the Apache Spark framework.
Distributed Storage is simply a file system which works on multiple nodes.
RDDs are processed in memory, and a computed RDD is kept for reuse only when it is explicitly cached or persisted.
Distributed Storage stores data in persistent storage.
An RDD can recompute its lost partitions from its lineage (the recorded chain of transformations) after a failure or data loss.
If data is lost from the Distributed Storage system, it is gone forever (unless there is an internal replication system).
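
To make the recomputation point concrete, here is a minimal toy sketch in plain Python. It is not Spark's API: `ToyRDD`, `from_storage`, `drop_cached_data` and `compute_count` are hypothetical names made up for illustration, and a dict stands in for the distributed storage. The idea it shows is that the dataset remembers the recipe that produced it, so losing the in-memory copy only costs a recomputation.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API).

    It stores a recipe for producing its data (the lineage),
    not only the data itself.
    """

    def __init__(self, compute: Callable[[], List[int]]) -> None:
        self._compute = compute                  # lineage: how to rebuild the data
        self._want_cache = False                 # set by cache()
        self._cached: Optional[List[int]] = None
        self.compute_count = 0                   # how often the recipe has run

    @classmethod
    def from_storage(cls, storage: Dict[str, List[int]], key: str) -> "ToyRDD":
        # The persistent copy lives in storage; the ToyRDD only knows how to read it.
        return cls(lambda: list(storage[key]))

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # A transformation extends the lineage instead of computing right away.
        return ToyRDD(lambda: [fn(x) for x in self.collect()])

    def cache(self) -> "ToyRDD":
        # Ask for the result to be kept in memory after the first computation.
        self._want_cache = True
        return self

    def collect(self) -> List[int]:
        if self._cached is not None:
            return self._cached
        self.compute_count += 1
        data = self._compute()
        if self._want_cache:
            self._cached = data
        return data

    def drop_cached_data(self) -> None:
        # Simulates losing the in-memory copy, e.g. when a node fails.
        self._cached = None


storage = {"numbers": [1, 2, 3]}                 # stands in for distributed storage
doubled = ToyRDD.from_storage(storage, "numbers").map(lambda x: x * 2).cache()

first = doubled.collect()       # computed from lineage, then cached
second = doubled.collect()      # served from the cache, no recomputation
doubled.drop_cached_data()      # the in-memory copy is lost
recovered = doubled.collect()   # rebuilt by replaying the lineage
```

In this sketch the recovery works only because the source data is still readable in `storage`. If the entry were deleted from the dict, nothing could rebuild it, which mirrors the point above about storage without replication.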