What are the advantages of datasets in Apache Spark?
September 20, 2018 at 9:40 pm, #6393, DataFlair Team (Spectator)
What are the benefits of Datasets in Apache Spark?
Describe the features of Datasets in Apache Spark.
September 20, 2018 at 9:40 pm, #6394, DataFlair Team (Spectator)
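Datasets build on RDDs (Resilient Distributed Datasets), so they share the RDD benefits and add a few of their own:
Fault tolerance: lost partitions can be recomputed from their lineage.
Distributed storage and computation.
Data caching in memory or on disk for better performance.
Lazy evaluation: transformations run only when an action is called, which gives better resource utilization and better overall performance at the cluster level.
Multiple language support (Java, Scala, Python); two or more languages can even be used in the same project. Note that the typed Dataset API is available in Scala and Java, while Python works with DataFrames.
Rich library of operations: common operations (grouping, average, map, reduce, filter, etc.) are much simpler to perform, so the developer has to write far fewer lines of code.
The DataFrame concept: SQL experts can use the data for analysis.
Immutability: a Dataset is never modified in place; each transformation returns a new Dataset.
Error detection: typed Datasets catch both syntax and analysis errors at compile time. DataFrames catch syntax errors at compile time but analysis errors (such as a misspelled column name) only at runtime. Spark SQL query strings surface both kinds of error only at runtime.
Catalyst optimizer: it generates optimized logical and physical plans for Dataset/DataFrame and Spark SQL queries.
Upgrade your knowledge of RDDs with the help of this link: Spark RDD – Introduction, Features & Operations of RDD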
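Several of the points above (lazy evaluation, caching, compile-time checks, and Catalyst plans) may be easier to see in code. The following is only a minimal sketch, assuming a Spark 2.x or later `spark-sql` dependency on the classpath and a local master. The `Employee` case class, its sample rows, and the `DatasetBenefitsSketch` object are hypothetical names invented for this illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Employee(name: String, dept: String, salary: Double)

object DatasetBenefitsSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; a real job would normally
    // receive its master from spark-submit.
    val spark = SparkSession.builder()
      .appName("dataset-benefits-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // encoders for case classes, plus toDS()

    val employees: Dataset[Employee] = Seq(
      Employee("Asha", "eng", 120.0),
      Employee("Ben", "eng", 100.0),
      Employee("Chen", "ops", 90.0)
    ).toDS()

    // Lazy evaluation: filter is a transformation, so nothing runs yet.
    // Immutability: `employees` is unchanged; `engineers` is a new Dataset.
    val engineers = employees.filter(_.dept == "eng")

    // Caching: marks the result for reuse; it is materialized by the
    // first action and reused by later ones.
    engineers.cache()

    // Actions trigger execution.
    println(engineers.count()) // prints 2

    // Compile-time safety: field access is checked by the Scala compiler,
    // so a typo such as `_.nmae` would fail to compile.
    val names: Array[String] = engineers.map(_.name).collect()
    println(names.mkString(", "))

    // DataFrame-style call: columns are referenced by string, so a
    // misspelled column name would only fail at runtime.
    val avgByDept = employees.groupBy("dept").avg("salary")

    // Catalyst: print the logical plans and the physical plan it chose.
    avgByDept.explain(true)

    spark.stop()
  }
}
```

The typed `filter` and `map` calls take ordinary Scala lambdas, which is where the compile-time checking comes from. The string-based `groupBy("dept")` trades that checking for a more SQL-like style.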