What are the advantages of datasets in Apache Spark?

    • #6393
      DataFlair Team
      Spectator

      What are the benefits of Datasets in Apache Spark?
      Describe the features of Datasets in Apache Spark.

    • #6394
      DataFlair Team
      Spectator

      A Dataset is built on top of the RDD (Resilient Distributed Dataset) and offers many benefits:
      Fault tolerance, inherited from the underlying RDDs.
      Distributed storage and computation
      Data caching in memory/disk for better performance
      Lazy evaluation: transformations run only when an action needs their result, which gives better resource utilization and better overall performance at the cluster level.
      Multiple language support (Java, Scala, Python): two or more languages can even be used in the same project. Note that the typed Dataset API itself is available in Scala and Java; Python works through the DataFrame API.
      Rich set of libraries: common operations (grouping, averaging, map, reduce, filter, etc.) are much simpler to perform, so the developer writes far fewer lines of code.
      DataFrame abstraction: SQL experts can query the data for analysis.
      Dataset immutability
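      Compile-time safety: the typed Dataset API reports both syntax and analysis errors at compile time, a DataFrame reports analysis errors (such as a misspelled column name) only at runtime, and Spark SQL query strings report all errors at runtime.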
      Catalyst optimizer: it generates optimized logical and physical plans.
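
A few of the points above (typed access, lazy evaluation, caching, and the Catalyst plans) can be sketched in Scala. This is only a minimal illustration, not a reference implementation: the `Employee` case class, the `DatasetSketch` object, the `wellPaid` helper, and the sample rows are all hypothetical, and it assumes a Spark 2.x or later dependency on the classpath and a local master.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Employee(name: String, dept: String, salary: Double)

object DatasetSketch {
  // Typed transformation: field access is checked by the Scala compiler,
  // so a typo such as `e.salry` would fail at compile time.
  def wellPaid(ds: Dataset[Employee], threshold: Double): Dataset[Employee] =
    ds.filter(e => e.salary > threshold)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-sketch")
      .master("local[*]") // assumption: run locally for the demo
      .getOrCreate()
    import spark.implicits._ // provides encoders for case classes

    val employees: Dataset[Employee] = Seq(
      Employee("Ann", "eng", 120.0),
      Employee("Bob", "eng", 100.0),
      Employee("Cid", "ops", 80.0)
    ).toDS()

    // Lazy: filter and cache only describe work; nothing runs yet.
    val result = wellPaid(employees, 90.0).cache()

    // Prints the logical and physical plans produced by Catalyst.
    result.explain(true)

    // collect() is an action, so the job runs here and the cache is filled.
    val names = result.map(_.name).collect()
    println(names.sorted.mkString(", "))

    spark.stop()
  }
}
```

Keeping the filter in a small function that takes and returns `Dataset[Employee]` makes the typed part easy to reuse; the same logic written against a DataFrame with a string column name would only fail at runtime if the name were misspelled.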

      Upgrade your knowledge of RDD with the help of this link: Spark RDD – Introduction, Features & Operations of RDD
