What is Dataset in spark?


Author

Posts
- September 20, 2018 at 9:47 pm #6408
  
  DataFlair Team
  Spectator
  
  Define Dataset in Apache Spark.
- September 20, 2018 at 9:47 pm #6409
  
  DataFlair Team
  Spectator
  
  A Dataset is a strongly typed, immutable, distributed collection of objects that are mapped to a relational schema.
  At the core of the Dataset API is an Encoder, which converts between JVM objects and Spark's tabular representation.
  That tabular representation is stored in Spark's internal binary format, which lets Spark carry out many operations directly on the serialized data and improves memory utilization. Spark can generate encoders automatically for a wide variety of types, including primitive types (e.g. String, Integer, Long) and Scala case classes. The API also offers many functional transformations, such as map, flatMap, and filter.
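
  A minimal Scala sketch of these ideas, assuming Spark 2.x or 3.x with `spark-sql` on the classpath and a local master. `Person`, `DatasetSketch`, `samplePeople`, and `adultNameParts` are hypothetical names chosen for illustration. `import spark.implicits._` brings the automatically derived encoders into scope (for the case class and for `String`), and the pipeline uses the typed `filter`, `map`, and `flatMap` transformations mentioned above.

  ```scala
  import org.apache.spark.sql.{Dataset, SparkSession}

  // Hypothetical record type. Because it is a case class, Spark can derive
  // an Encoder[Person] for it; each field becomes a column in the schema.
  case class Person(name: String, age: Long)

  object DatasetSketch {
    // Turns hypothetical sample data into a typed Dataset[Person].
    def samplePeople(spark: SparkSession): Dataset[Person] = {
      import spark.implicits._ // implicit Encoders for case classes and primitives
      Seq(Person("Ann Lee", 34), Person("Bob Ray", 17)).toDS()
    }

    // Applies typed transformations and collects the result to the driver.
    def adultNameParts(spark: SparkSession, people: Dataset[Person]): Array[String] = {
      import spark.implicits._ // provides the Encoder[String] needed by map/flatMap

      // Typed lambdas: p.age and p.name are checked by the Scala compiler.
      val adults: Dataset[Person] = people.filter(p => p.age >= 18)
      val names: Dataset[String] = adults.map(p => p.name)
      val parts: Dataset[String] = names.flatMap(n => n.split(" ").toSeq)

      parts.collect() // decoded back into plain JVM String objects
    }

    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("DatasetSketch")
        .master("local[*]") // assumption: local mode, just for trying the sketch
        .getOrCreate()
      try {
        val people = samplePeople(spark)
        people.printSchema() // prints the columns derived from the case-class fields
        println(adultNameParts(spark, people).mkString(", "))
      } finally {
        spark.stop()
      }
    }
  }
  ```

  Because the lambdas are compiled against `Person`, a misspelled field such as `p.agee` fails at compile time, whereas an equivalent untyped DataFrame condition on a column name is only checked when Spark analyzes the query.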
  
  For a complete introduction to Datasets, see: Complete Introduction to Apache Spark Dataset