What is the difference between DSM and RDD?
September 20, 2018 at 9:49 pm, #6414, DataFlair Team (Moderator)
What are the differences between DSM (Distributed Shared Memory) and RDD?
September 20, 2018 at 9:49 pm, #6415, DataFlair Team (Moderator)
RDD (Resilient Distributed Dataset) and DSM (Distributed Shared Memory) differ on the following features:
i. Read
RDD – Reads on an RDD can be either coarse-grained or fine-grained. Coarse-grained means an operation applies to the whole dataset rather than to an individual element; fine-grained means an operation can target an individual element.
DSM – Reads in distributed shared memory are fine-grained.
ii. Write
RDD – Writes on an RDD are coarse-grained: a transformation is applied to the dataset as a whole.
DSM – Writes in a distributed shared memory system are fine-grained: individual memory locations can be updated.
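To make the coarse-grained vs fine-grained distinction concrete, here is a minimal sketch in plain Python. It does not use Spark or any real DSM library; the helper names `coarse_grained_write` and `fine_grained_write` are hypothetical and only model the two styles.

```python
from typing import Callable, List, Tuple


def coarse_grained_write(dataset: Tuple[int, ...],
                         fn: Callable[[int], int]) -> Tuple[int, ...]:
    """RDD-style (toy model): apply one operation to every element
    and return a NEW dataset, leaving the input untouched."""
    return tuple(fn(x) for x in dataset)


def fine_grained_write(memory: List[int], index: int, value: int) -> None:
    """DSM-style (toy model): overwrite a single location in place."""
    memory[index] = value


# Coarse-grained: the whole dataset is transformed at once.
rdd_like = (1, 2, 3, 4)
doubled = coarse_grained_write(rdd_like, lambda x: x * 2)

# Fine-grained: exactly one cell of shared memory is changed.
dsm_like = [1, 2, 3, 4]
fine_grained_write(dsm_like, 2, 99)
```

In the first case the original tuple is still available afterwards, which is what makes the recovery described under fault tolerance possible. In the second case the old value at index 2 is gone.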
iii. Consistency
RDD – Consistency is trivial because an RDD is immutable: its contents cannot be altered once it is created, and a transformation produces a new RDD instead of modifying the existing one. Hence, the level of consistency is very high.
DSM – The system guarantees that memory will be consistent if the programmer follows the rules, so the results of memory operations are predictable.
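As a rough illustration of "following the rules" on shared mutable memory, the sketch below uses Python threads and a lock. The lock is only an assumed stand-in for whatever synchronization rules a particular DSM system imposes, and `run_shared_counter` is a hypothetical helper, not part of any DSM API.

```python
import threading


def run_shared_counter(num_threads: int = 4, increments: int = 1000) -> int:
    """Several workers update one shared location (toy model of DSM).
    Every update takes the lock first, which is the 'rule' here."""
    shared = {"counter": 0}
    lock = threading.Lock()

    def worker() -> None:
        for _ in range(increments):
            with lock:  # the rule: never touch shared state without the lock
                shared["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]


total = run_shared_counter()
```

Because every update is guarded, the final count is predictable. With an immutable dataset this kind of coordination is not needed, since no worker can change what another worker reads.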
iv. Fault-Recovery Mechanism
RDD – Lost data can be recovered at any moment by using the lineage graph. Each transformation forms a new RDD and records how it was derived from its parent, and because RDDs are immutable, a lost partition can simply be recomputed from that lineage.
DSM – Fault tolerance is achieved by checkpointing, which allows applications to roll back to a recent checkpoint rather than restart from the beginning.
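The lineage idea can be sketched with a small toy class. This is not Spark's implementation: `ToyRDD`, its `compute` method, and the in-memory `source` list (standing in for stable storage) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Toy dataset that remembers its parent and the function that derived it."""

    def __init__(self,
                 source: Optional[List[List[int]]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[int], int]] = None) -> None:
        self.source = source      # only set on the root; stands in for stable storage
        self.parent = parent      # lineage: where this dataset came from
        self.fn = fn              # lineage: how it was derived
        self.cache: Dict[int, List[int]] = {}  # computed partitions (can be lost)

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # A transformation never modifies self; it returns a new ToyRDD.
        return ToyRDD(parent=self, fn=fn)

    def compute(self, i: int) -> List[int]:
        """Return partition i, recomputing it from the lineage if it is missing."""
        if i in self.cache:
            return self.cache[i]
        if self.parent is None:
            result = list(self.source[i])
        else:
            result = [self.fn(x) for x in self.parent.compute(i)]
        self.cache[i] = result
        return result


base = ToyRDD(source=[[1, 2], [3, 4]])
derived = base.map(lambda x: x * x).map(lambda x: x + 1)

before = derived.compute(1)

# Simulate losing the computed partitions on a failed node.
derived.cache.clear()
derived.parent.cache.clear()

after = derived.compute(1)  # rebuilt by replaying the recorded transformations
```

Only the lost partition is recomputed, and nothing has to be rolled back, which is the main contrast with checkpoint-based recovery.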
v. Straggler Mitigation
Stragglers, in general, are tasks that take more time to complete than their peers. This can happen for many reasons, such as load imbalance, blocked I/O, or garbage collection.
The problem with stragglers arises when a parallel computation is followed by a synchronization step such as a reduction: all the parallel tasks must wait for the slowest one to finish.
RDD – Stragglers can be mitigated by running backup tasks.
DSM – Straggler mitigation is quite difficult to achieve.
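Backup tasks work because a task over immutable input can safely be run twice. Below is a minimal sketch of the idea using Python's standard `concurrent.futures`; `task` and `run_with_backup` are hypothetical names, and as a simplification the backup copy is launched immediately rather than after a slow task has been detected. In Spark itself this behaviour is switched on through the `spark.speculation` setting.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Tuple


def task(label: str, delay: float, value: int) -> Tuple[str, int]:
    """Pretend task: sleeps to simulate a slow or fast node, then squares value."""
    time.sleep(delay)
    return label, value * value


def run_with_backup(value: int, primary_delay: float,
                    backup_delay: float) -> Tuple[str, int]:
    """Run two copies of the same task and keep whichever finishes first."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary = pool.submit(task, "primary", primary_delay, value)
        backup = pool.submit(task, "backup", backup_delay, value)
        done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
        # Both copies compute the same result, so either one is acceptable.
        return next(iter(done)).result()


# The primary copy is the straggler here; the backup copy wins.
winner, result = run_with_backup(7, primary_delay=0.5, backup_delay=0.01)
```

With fine-grained writes to shared memory, two copies of the same task could interfere with each other's updates, which is one way to see why the same trick is hard to apply in a DSM system.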
vi. Behavior if not enough RAM
RDD – If there is not enough RAM to hold an RDD, the partitions that do not fit can be spilled to disk.
DSM – If the RAM runs out, performance decreases in this type of system.
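A toy version of the spill-to-disk behaviour is sketched below. `SpillingCache` is a hypothetical class written for this answer, not a Spark API; it counts partitions instead of bytes to keep the example short. In Spark the corresponding choice is made through storage levels such as `MEMORY_AND_DISK`.

```python
import os
import pickle
import tempfile
from typing import Dict, List


class SpillingCache:
    """Keeps up to max_in_memory partitions in RAM and writes the rest to disk."""

    def __init__(self, max_in_memory: int) -> None:
        self.max_in_memory = max_in_memory
        self.memory: Dict[int, List[int]] = {}
        self.on_disk: Dict[int, str] = {}
        self.spill_dir = tempfile.mkdtemp(prefix="toy_spill_")

    def put(self, key: int, partition: List[int]) -> None:
        if len(self.memory) < self.max_in_memory:
            self.memory[key] = partition
        else:
            # No room left in memory: serialize the partition to a file.
            path = os.path.join(self.spill_dir, f"part_{key}.pkl")
            with open(path, "wb") as f:
                pickle.dump(partition, f)
            self.on_disk[key] = path

    def get(self, key: int) -> List[int]:
        if key in self.memory:
            return self.memory[key]
        # Slower path: read the spilled partition back from disk.
        with open(self.on_disk[key], "rb") as f:
            return pickle.load(f)


cache = SpillingCache(max_in_memory=2)
for i in range(3):
    cache.put(i, [i] * 4)

spilled = cache.get(2)  # served from disk, but still available
```

The job keeps working when memory is short, only more slowly for the spilled partitions.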
To read more about the differences, follow the link: RDD vs DSM (Distributed Shared Memory)