What is SparkSession in Apache Spark?

    • #5634
      DataFlair Team
      Spectator

      What is the need for SparkSession in Spark?
      What are the responsibilities of SparkSession?

    • #5635
      DataFlair Team
      Spectator

      Since Apache Spark 2.0, SparkSession has been the unified entry point for Spark applications.

      Prior to 2.0, SparkContext was the entry point for Spark jobs. The RDD was the main API then, created and manipulated through the SparkContext. Every other API required its own context: SQLContext for SQL, StreamingContext for streaming, and HiveContext for Hive.
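
      As a minimal sketch of that pre-2.0 style in Scala (the app name, local master, and people.json path are illustrative placeholders, not part of the original answer):

      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.sql.SQLContext

      // Spark 1.x style: a separate context per API
      val conf = new SparkConf().setAppName("Pre20Sketch").setMaster("local[*]")
      val sc = new SparkContext(conf)              // entry point for RDDs
      val sqlContext = new SQLContext(sc)          // separate context just for SQL/DataFrames
      // val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)  // yet another for Hive
      // val ssc = new org.apache.spark.streaming.StreamingContext(sc, Seconds(1))  // and for streaming

      val rdd = sc.parallelize(Seq(1, 2, 3))        // RDDs come from SparkContext
      val df = sqlContext.read.json("people.json")  // hypothetical input path; DataFrames via SQLContext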

      From 2.0 onward, the Dataset API and the DataFrame API (a DataFrame is simply a Dataset of Rows) stand alongside the RDD as Spark's standard units of data abstraction. Most user-defined code is written and evaluated against the Dataset and DataFrame APIs as well as RDDs.

      So a new entry point built around these APIs was needed, and that is why SparkSession was introduced. SparkSession also exposes the functionality previously spread across the separate contexts: SparkContext, SQLContext, StreamingContext, and HiveContext.
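
      Here is a minimal sketch of the unified entry point in Scala; the app name, input path, and the enableHiveSupport() call are illustrative assumptions, not prescribed by the answer above:

      import org.apache.spark.sql.SparkSession

      // Spark 2.0+: one unified entry point
      val spark = SparkSession.builder()
        .appName("SparkSessionSketch")
        .master("local[*]")
        .enableHiveSupport()   // optional; requires Hive classes on the classpath
        .getOrCreate()

      import spark.implicits._

      // DataFrame and Dataset APIs hang directly off the session
      val df = spark.read.json("people.json")   // hypothetical input path
      val ds = Seq(1, 2, 3).toDS()

      // The underlying SparkContext is still reachable for raw RDD work
      val rdd = spark.sparkContext.parallelize(Seq("a", "b", "c"))

      spark.stop()

      Note that SparkSession.builder().getOrCreate() reuses an existing session if one is already running or creates a new one otherwise, so the same pattern works in standalone applications and interactive shells alike.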
