Explain Traits, properties and features of RDDs in Apache Spark


    • #4937
      DataFlair Team
      Spectator

      I need a brief explanation of RDDs in Apache Spark. Why are RDDs used to process data? What are the major features/characteristics of an RDD (Resilient Distributed Dataset)?

    • #4968
      DataFlair Team
      Spectator

      Properties/Traits of RDD:

      • Immutable (read-only; it cannot be changed or modified): Data is safe to share across processes. An RDD can be created or retrieved at any time, which makes caching, sharing and replication easy. Immutability is a way to reach consistency in computations.
      • Partitioned: The partition is the basic unit of parallelism in an RDD. Each partition is a logical division of the data/records.
      • Coarse-grained operations: Operations such as map, filter or group-by are applied to all elements of the dataset rather than to individual records.
      • Actions/Transformations: All computations on RDDs are either actions or transformations. A transformation lazily defines a new RDD from an existing one; an action triggers the computation and returns a result.
      • Fault tolerant: As the "Resilient" in the name suggests, an RDD can recover lost data. It records the coarse-grained transformations used to build it (the lineage graph), so lost partitions can be recomputed with low overhead.
      • Cacheable: It can hold data in persistent storage (memory/disk) so that the data can be retrieved more quickly the next time it is requested.
      • Persistence: You can choose which storage is used, either in-memory or on-disk.
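
      The traits listed above can be sketched with a small toy model. This is plain Python, not Spark's implementation or API: the class `ToyRDD` and its helpers `lineage`, `lose_cache` and `computations` are hypothetical names made up for illustration. The method names `parallelize`, `map`, `filter`, `cache`, `collect` and `count` only mirror the real RDD operations of the same name. It is a minimal sketch, assuming a single process and data that fits in memory.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, NOT Spark's API)."""

    def __init__(self, source=None, parent=None, fn=None, name="source"):
        self._source = source    # root data: a list of partitions (tuples)
        self._parent = parent    # lineage link to the dataset we derive from
        self._fn = fn            # coarse-grained function applied per partition
        self._name = name
        self._want_cache = False
        self._cache = None
        self.computations = 0    # how many times this node was (re)computed

    @classmethod
    def parallelize(cls, data, num_partitions):
        items = list(data)
        size = -(-len(items) // num_partitions)  # ceiling division
        parts = [tuple(items[i * size:(i + 1) * size])
                 for i in range(num_partitions)]
        return cls(source=parts)

    # Transformations: lazy, and they return a NEW dataset (immutability).
    def map(self, f):
        return ToyRDD(parent=self, name="map",
                      fn=lambda part: tuple(f(x) for x in part))

    def filter(self, pred):
        return ToyRDD(parent=self, name="filter",
                      fn=lambda part: tuple(x for x in part if pred(x)))

    def cache(self):
        self._want_cache = True  # only a marker; nothing is computed yet
        return self

    def _partitions(self):
        if self._cache is not None:
            return self._cache
        if self._parent is None:
            parts = self._source
        else:
            self.computations += 1
            parts = [self._fn(p) for p in self._parent._partitions()]
        if self._want_cache:
            self._cache = parts
        return parts

    # Actions: trigger the computation and return a plain result.
    def collect(self):
        return [x for part in self._partitions() for x in part]

    def count(self):
        return sum(len(part) for part in self._partitions())

    def lineage(self):
        names, node = [], self
        while node is not None:
            names.append(node._name)
            node = node._parent
        return " <- ".join(names)

    def lose_cache(self):
        self._cache = None       # simulate losing the cached partitions


numbers = ToyRDD.parallelize(range(1, 7), num_partitions=3)
evens_squared = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()

first = evens_squared.collect()      # action: computes and fills the cache
total = evens_squared.count()        # served from the cache, no recomputation
evens_squared.lose_cache()
recovered = evens_squared.collect()  # rebuilt by replaying the lineage

print(first, total, recovered)       # [4, 16, 36] 3 [4, 16, 36]
print(evens_squared.lineage())       # map <- filter <- source
```

      Nothing runs until `collect` is called, `numbers` itself is never modified, and after the simulated cache loss the result is rebuilt from the recorded chain of transformations. In Spark itself, the lineage of an RDD can be inspected with `toDebugString`.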

      You can also refer to the blog below for a more detailed description: Features of RDD.
