What are the advantages of datasets in Apache Spark?

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

Viewing 1 reply thread

Author

Posts
- September 20, 2018 at 9:40 pm #6393
  
  DataFlair Team
  Spectator
  
  What are the benefits of Datasets in Apache Spark?
  Describe the features of Datasets in Apache Spark.
- September 20, 2018 at 9:40 pm #6394
  
  DataFlair Team
  Spectator
  
  A Dataset is built on top of the RDD (Resilient Distributed Dataset), so it shares the RDD's benefits and adds several of its own:
  Fault tolerance: lost partitions can be recomputed from their lineage.
  Distributed storage and computation
  Data caching in memory/disk for better performance
  Lazy evaluation: transformations run only when an action is called, which gives better resource utilization and so better overall performance at the cluster level.
  Multiple language support: Spark offers APIs in Java, Scala, and Python, and more than one language can be used in the same project. Note that the typed Dataset API itself is available only in Scala and Java; Python users work with DataFrames.
  Rich library: common operations (grouping, averaging, map, reduce, filter, etc.) are built in, so the developer writes far fewer lines of code.
  The concept of DataFrame: SQL experts can query the data for analysis. A DataFrame is a Dataset of Row objects.
  Dataset immutability
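  Compile-time type safety: a Dataset reports both syntax and analysis errors at compile time, a DataFrame reports syntax errors at compile time but analysis errors only at runtime, and Spark SQL queries report both kinds of error only at runtime.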
  Catalyst optimizer: it generates optimized logical and physical plans for Dataset and DataFrame queries.
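  
  To make a few of these points concrete, here is a minimal Scala sketch, assuming Spark SQL is on the classpath and a local master is acceptable. The `Person` case class, the `DatasetSketch` object, and the sample rows are made up for illustration; only the Spark calls (`toDS`, `filter`, `cache`, `explain`, `map`, `collect`) are real API.
  
  ```scala
  import org.apache.spark.sql.{Dataset, SparkSession}

  // Hypothetical record type, used only for this illustration.
  case class Person(name: String, age: Int)

  object DatasetSketch {
    // Returns the sorted names of the sample people older than minAge.
    def namesOlderThan(spark: SparkSession, minAge: Int): Array[String] = {
      import spark.implicits._ // encoders for case classes and toDS()

      // Made-up sample data turned into a typed Dataset.
      val people: Dataset[Person] =
        Seq(Person("Ann", 34), Person("Bob", 19), Person("Cy", 52)).toDS()

      // Lazy transformation: nothing is computed yet.
      // The lambda is type-checked, so a typo such as _.agee fails at compile time.
      val older: Dataset[Person] = people.filter(_.age > minAge)

      older.cache()       // mark for in-memory caching once it is computed
      older.explain(true) // print the logical and physical plans from Catalyst

      // collect() is an action, so this is where the job actually runs.
      older.map(_.name).collect().sorted
    }

    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("dataset-sketch")
        .master("local[*]") // assumption: run locally for the demo
        .getOrCreate()
      try println(namesOlderThan(spark, 30).mkString(", "))
      finally spark.stop()
    }
  }
  ```
  
  The same filter written against a DataFrame with a column name string, e.g. `df.filter("agee > 30")`, would compile and only fail when the query is analyzed at runtime, which is the difference the type-safety point above describes.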
  
  Upgrade your knowledge of RDD with the help of this link: Spark RDD – Introduction, Features & Operations of RDD