What is a Dataset in Spark?

    • #6408
      DataFlair Team
      Spectator

      Define Dataset in Apache Spark.

    • #6409
      DataFlair Team
      Spectator

      A Dataset is an immutable, strongly typed collection of objects that are mapped to a relational schema.
      At the core of the Dataset API is the encoder, which converts between JVM objects and a tabular representation. The tabular representation is stored in Spark's internal binary format, which allows operations to be carried out on serialized data and improves memory utilization. Spark can automatically generate encoders for a wide variety of types, including primitive types (e.g. String, Integer, Long) and Scala case classes. The Dataset API also offers many functional transformations (e.g. map, flatMap, filter).
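
      As a rough illustration of the points above, here is a minimal Scala sketch. It assumes Spark SQL is on the classpath and runs in local mode. The `Person` case class, the `DatasetSketch` object, and the `adultNames` helper are hypothetical names made up for this example. It shows an encoder derived implicitly for a case class, an encoder passed explicitly for `String`, and the typed `filter` and `map` transformations.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Hypothetical record type, used only for this illustration.
// Defined at the top level so Spark can derive an encoder for it.
case class Person(name: String, age: Long)

object DatasetSketch {

  // Typed transformations: the lambdas receive Person objects, not generic rows.
  // The String encoder is passed explicitly to show where encoders come into play.
  def adultNames(people: Dataset[Person]): Dataset[String] =
    people
      .filter(p => p.age >= 18)
      .map(p => p.name)(Encoders.STRING)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-sketch")
      .master("local[*]") // assumption: local run, no cluster
      .getOrCreate()

    // Brings implicit encoders for common types and case classes into scope.
    import spark.implicits._

    // The encoder for Person is generated automatically from the case class.
    val people: Dataset[Person] =
      Seq(Person("Alice", 34L), Person("Bob", 12L)).toDS()

    // The case class fields become the columns of the relational schema.
    people.printSchema()

    adultNames(people).show()

    spark.stop()
  }
}
```

      Because the Dataset is typed, a misspelled field such as `p.agee` would be rejected by the compiler rather than failing at run time.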

      To learn more about Datasets, see: Complete Introduction to Apache Spark Dataset
