What is the difference between DSM and RDD?

    • #6414
      DataFlair Team
      Spectator

      What is the difference between DSM (distributed shared memory) and RDD?

    • #6415
      DataFlair Team
      Spectator

      RDD and DSM (distributed shared memory) differ on several features:

      i. Read

      RDD – The read operation in RDD is either coarse-grained or fine-grained. Coarse-grained means we operate on the whole dataset at once, not on an individual element of it. Fine-grained means we operate on an individual element of the dataset.
      DSM – The read operation in distributed shared memory is fine-grained.

      ii. Write

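      RDD – The write operation in RDD is coarse-grained: data is written only through transformations applied to the whole dataset.
      DSM – The write operation in distributed shared memory is fine-grained: individual memory locations can be updated.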
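The read/write granularity above can be illustrated with a minimal sketch in plain Python. It does not use Spark; `ToyRDD` and `shared_memory` are hypothetical stand-ins chosen only to show the contrast between a coarse-grained write (one function applied to the whole dataset, yielding a new dataset) and a fine-grained write (updating one location in place).

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: immutable and transformed only as a whole."""

    def __init__(self, items):
        self._items = tuple(items)  # immutable snapshot of the data

    def map(self, fn):
        # Coarse-grained write: fn is applied to every element and the
        # result is a brand-new dataset; the original is left untouched.
        return ToyRDD(fn(x) for x in self._items)

    def lookup(self, index):
        # Fine-grained read: a single element can still be read.
        return self._items[index]

    def collect(self):
        # Coarse-grained read: the whole dataset is returned.
        return list(self._items)


# DSM-style memory (assumed here to be a plain mutable list):
# any single location can be written in place.
shared_memory = [1, 2, 3, 4]
shared_memory[2] = 30  # fine-grained write

numbers = ToyRDD([1, 2, 3, 4])
doubled = numbers.map(lambda x: x * 2)  # coarse-grained write

print(numbers.collect())
print(doubled.collect())
print(shared_memory)
```

Note that `ToyRDD` deliberately offers no method for changing a single element; the only way to "write" is to derive a new dataset.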

      iii. Consistency

      RDD – Consistency is trivial for RDD because it is immutable in nature. We cannot alter the content of an RDD once it is created; any change produces a new RDD. Hence, the level of consistency is very high.
      DSM – The system guarantees that if the programmer follows the rules, the memory will be consistent. Also, the results of memory operations will be predictable.

      iv. Fault-Recovery Mechanism

      RDD – Lost data can be recovered at any moment in Spark RDD by using the lineage graph. Each transformation forms a new RDD and, because RDDs are immutable, a lost partition can be recomputed from its parent data.
      DSM – Fault tolerance is achieved by a checkpointing technique which allows applications to roll back to a recent checkpoint rather than restarting.
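The two recovery strategies above can be sketched in plain Python. This is only an illustration under stated assumptions, not how Spark or any DSM system is implemented: `LineageDataset` is a hypothetical class that remembers its source partitions and the chain of functions applied to them, and the DSM side is modelled as a dictionary copied at a checkpoint.

```python
import copy


class LineageDataset:
    """Hypothetical dataset that records how it was derived (its lineage)."""

    def __init__(self, source_partitions, transforms=()):
        self._source = source_partitions      # assumed to be durable input
        self._transforms = tuple(transforms)  # lineage: functions applied in order
        self._cache = {}                      # computed partitions (may be lost)

    def map(self, fn):
        # Each transformation yields a new dataset with a longer lineage.
        return LineageDataset(self._source, self._transforms + (fn,))

    def compute_partition(self, i):
        if i not in self._cache:
            data = list(self._source[i])
            for fn in self._transforms:       # replay the lineage
                data = [fn(x) for x in data]
            self._cache[i] = data
        return self._cache[i]

    def lose_partition(self, i):
        # Simulate a node failure that drops one computed partition.
        self._cache.pop(i, None)


ds = LineageDataset([(1, 2), (3, 4)]).map(lambda x: x + 1).map(lambda x: x * 10)
before = ds.compute_partition(1)
ds.lose_partition(1)
recovered = ds.compute_partition(1)  # rebuilt from lineage, only this partition

# DSM-style recovery: roll the whole memory back to the last checkpoint.
memory = {"counter": 1}
checkpoint = copy.deepcopy(memory)
memory["counter"] = 99               # work done after the checkpoint
memory = copy.deepcopy(checkpoint)   # failure: that work is rolled back

print(before, recovered, memory)
```

In the lineage case only the lost partition is recomputed, while the checkpoint rollback discards everything written since the checkpoint.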

      v. Straggler Mitigation

      Stragglers, in general, are tasks that take more time to complete than their peers. This can happen for many reasons, such as load imbalance, blocked I/O, or garbage collection.
      The problem with stragglers arises when a parallel computation is followed by a synchronization step such as a reduction: all the parallel tasks have to wait for the slowest one.

      RDD – Stragglers can be mitigated in RDDs by running backup tasks.
      DSM – Straggler mitigation is quite difficult to achieve.
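The backup-task idea can be sketched with Python's standard `concurrent.futures` module. This is a toy illustration, not Spark's scheduler: `task`, `run_with_backup`, and the `patience` threshold are hypothetical names, and the delays simply simulate a slow and a fast worker. Duplicating a task is safe here because it is deterministic and does not modify its input, which is the property immutable RDDs provide; two copies of a DSM task would write to the same memory locations and could interfere with each other.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def task(partition, delay):
    """Deterministic work on an immutable partition; delay simulates a slow node."""
    time.sleep(delay)
    return sum(partition)


def run_with_backup(partition, primary_delay, backup_delay, patience):
    """Run a task and launch a backup copy if it has not finished within `patience` seconds."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary = pool.submit(task, partition, primary_delay)
        done, _ = wait([primary], timeout=patience)
        if done:
            return primary.result(), "primary"
        # The primary looks like a straggler: start a backup copy of the same task.
        backup = pool.submit(task, partition, backup_delay)
        done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
        winner = done.pop()
        # Leaving the `with` block still waits for the slower copy to finish;
        # a real scheduler would cancel or kill it instead.
        return winner.result(), ("primary" if winner is primary else "backup")


result, winner = run_with_backup((1, 2, 3), primary_delay=0.5, backup_delay=0.01, patience=0.05)
print(result, winner)
```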

      vi. Behavior if not enough RAM

      RDD – If there is not enough space to store an RDD in RAM, the partitions that do not fit are shifted to disk, so the job still completes with performance close to that of disk-based data flow systems.
      DSM – If the RAM runs out of storage, performance decreases in this type of system.
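The spill-to-disk behavior can be sketched as follows. This is a toy model in plain Python, not Spark's block manager: `SpillingStore` and its `max_in_memory` budget are hypothetical, and the budget counts partitions rather than bytes to keep the example short. In Spark itself, whether partitions that do not fit are written to disk or recomputed when needed depends on the storage level chosen for the RDD.

```python
import os
import pickle
import tempfile


class SpillingStore:
    """Hypothetical partition store that keeps a few partitions in RAM and spills the rest."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory  # toy budget: number of partitions, not bytes
        self.memory = {}                    # partitions held in RAM
        self.on_disk = {}                   # partition key -> file path
        self.spill_dir = tempfile.mkdtemp()

    def put(self, key, partition):
        if len(self.memory) < self.max_in_memory:
            self.memory[key] = partition
        else:
            # Not enough room in RAM: write the partition to disk instead.
            path = os.path.join(self.spill_dir, f"{key}.pkl")
            with open(path, "wb") as fh:
                pickle.dump(partition, fh)
            self.on_disk[key] = path

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        with open(self.on_disk[key], "rb") as fh:  # slower path: read back from disk
            return pickle.load(fh)


store = SpillingStore(max_in_memory=1)
store.put(0, [1, 2])
store.put(1, [3, 4])  # exceeds the budget, so this partition is spilled
print(store.get(0), store.get(1), sorted(store.on_disk))
```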

      To read more about the differences, follow the link: RDD vs DSM (Distributed Shared Memory)
