How is RDD in Apache Spark different from Distributed Storage Management?

    • #5929
      DataFlair Team
      Moderator

      Differentiate an RDD in Apache Spark from Distributed Storage Management.
      Compare an RDD in Apache Spark with Distributed Storage Management.

    • #5931
      DataFlair Team
      Moderator

      Some of the differences between an RDD and Distributed Storage are as follows:

      Resilient Distributed Dataset (RDD) is the primary data abstraction in the Apache Spark framework: an immutable collection of records partitioned across the nodes of a cluster.
      Distributed storage is a file system that spans multiple nodes, such as HDFS.
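
      To make the storage side concrete, here is a toy sketch of how a distributed file system might spread one file across several nodes. The function `place_blocks`, the 256-byte block size and the round-robin placement are assumptions for illustration only. Real systems such as HDFS use much larger blocks (128 MB by default) and rack-aware placement.

```python
from typing import Dict, List


def place_blocks(data: bytes, block_size: int, nodes: List[str],
                 replication: int) -> Dict[int, List[str]]:
    """Toy placement policy (hypothetical): split `data` into fixed-size
    blocks and assign each block to `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    num_blocks = (len(data) + block_size - 1) // block_size  # ceiling division
    placement: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        # Round-robin: shift the starting node per block to spread the load.
        placement[block_id] = [nodes[(block_id + r) % len(nodes)]
                               for r in range(replication)]
    return placement


layout = place_blocks(b"x" * 1000, block_size=256,
                      nodes=["node1", "node2", "node3", "node4"],
                      replication=3)
for block_id, holders in layout.items():
    print(block_id, holders)
# 0 ['node1', 'node2', 'node3']
# 1 ['node2', 'node3', 'node4']
# 2 ['node3', 'node4', 'node1']
# 3 ['node4', 'node1', 'node2']
```

      Each block is physically written to three different nodes. An RDD, by contrast, is not a set of files on disk but a description of data that Spark computes when an action needs it.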

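      RDDs hold data in memory while it is being processed, but their partitions are not kept after an action completes unless the RDD is explicitly cached or persisted.
      Distributed storage keeps data on persistent media (disks), so it outlives the job that wrote it.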
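
      The point about caching can be illustrated with a toy model of lazy evaluation. `ToyRDD` below is a hypothetical, greatly simplified stand-in written for this example, not Spark's API: `map` only records a transformation, `collect` triggers the computation, and `cache` keeps the result after the first action. A counter records how often the mapped function actually runs.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ToyRDD:
    """Hypothetical stand-in for an RDD: it stores a recipe for computing
    its data rather than the data itself."""

    def __init__(self, compute: Callable[[], List[int]]):
        self._compute = compute
        self._cache_enabled = False
        self._cached: Optional[List[int]] = None

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Transformation: nothing runs yet, the recipe is only extended.
        return ToyRDD(lambda: [fn(x) for x in self._materialize()])

    def cache(self) -> "ToyRDD":
        # Mark for caching; the data is stored on the first action.
        self._cache_enabled = True
        return self

    def _materialize(self) -> List[int]:
        if self._cached is not None:
            return self._cached
        data = self._compute()
        if self._cache_enabled:
            self._cached = data
        return data

    def collect(self) -> List[int]:
        # Action: triggers the computation (or reads the cache).
        return list(self._materialize())


def make_counting_square() -> Tuple[Callable[[int], int], Dict[str, int]]:
    counter = {"calls": 0}

    def square(x: int) -> int:
        counter["calls"] += 1
        return x * x

    return square, counter


square, counter = make_counting_square()

uncached = ToyRDD(lambda: [1, 2, 3]).map(square)
uncached.collect()
uncached.collect()
uncached_calls = counter["calls"]  # recomputed on every action

counter["calls"] = 0
cached = ToyRDD(lambda: [1, 2, 3]).map(square).cache()
cached.collect()
cached.collect()
cached_calls = counter["calls"]  # second action reads the cache

print(uncached_calls, cached_calls)  # 6 3
```

      Without `cache()`, the second `collect()` recomputes every element; with it, the second action reads the stored result. In real Spark the corresponding calls are `rdd.cache()` or `rdd.persist(...)` with a chosen storage level.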

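      An RDD can recompute lost partitions from its lineage (the sequence of transformations that produced it) in the case of failure or data loss.
      If data is lost from a distributed storage system, it is gone for good unless the system replicates it internally; HDFS, for example, keeps three replicas of each block by default.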
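
      The fault-tolerance contrast can be sketched the same way. In this hypothetical toy model (none of these names are Spark or HDFS APIs), a lost in-memory partition is rebuilt by re-running its lineage, while a block held by a single storage node is unrecoverable once that node fails. Note that recomputation assumes the original input is still readable; in practice an RDD's lineage usually starts from data in reliable storage such as HDFS.

```python
from typing import Dict, List, Optional

# --- RDD side (hypothetical toy model) ---
# Source partitions: the input the lineage starts from.
source_partitions: Dict[int, List[int]] = {0: [1, 2], 1: [3, 4]}


def compute_partition(pid: int) -> List[int]:
    """Re-run the lineage for one partition: read the source, apply a map."""
    return [v * 10 for v in source_partitions[pid]]


# Partitions currently held in executor memory.
in_memory: Dict[int, List[int]] = {pid: compute_partition(pid)
                                   for pid in source_partitions}
del in_memory[1]  # simulate an executor crash that loses partition 1


def get_partition(pid: int) -> List[int]:
    if pid not in in_memory:
        in_memory[pid] = compute_partition(pid)  # recover from lineage
    return in_memory[pid]


recovered = get_partition(1)


# --- Storage side (toy model) ---
def read_block(cluster: Dict[str, Dict[str, bytes]],
               block_id: str) -> Optional[bytes]:
    """Return the block from any surviving node that holds a copy."""
    for blocks in cluster.values():
        if block_id in blocks:
            return blocks[block_id]
    return None


single_copy = {"node1": {"blk_0": b"abc"}, "node2": {}}
del single_copy["node1"]  # node1's disk fails, no other copy exists
lost = read_block(single_copy, "blk_0")

replicated = {"node1": {"blk_0": b"abc"}, "node2": {"blk_0": b"abc"}}
del replicated["node1"]  # same failure, but node2 holds a replica
survived = read_block(replicated, "blk_0")

print(recovered, lost, survived)  # [30, 40] None b'abc'
```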
