What are the abstractions of Apache Spark?


    • #6384
      DataFlair Team
      Spectator

      List the abstractions of Apache Spark.
      How many abstractions does Apache Spark provide?

    • #6385
      DataFlair Team
      Spectator

      RDD (Resilient Distributed Dataset) is the core abstraction in Apache Spark. It is an immutable, fault-tolerant,
      distributed collection of statically typed objects that is usually kept in memory. The RDD API offers simple operations such as map, reduce, and filter that can be composed in arbitrary ways.
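      For illustration, here is a minimal Scala sketch (assuming a spark-shell session, where sc is the already-created SparkContext) that composes filter, map, and reduce:

        // Distribute a local collection as an RDD of integers.
        val nums = sc.parallelize(1 to 10)
        // Transformations are lazy: keep the even numbers, then square them.
        val evens = nums.filter(_ % 2 == 0)
        val squares = evens.map(n => n * n)
        // reduce is an action: it triggers the computation and returns 220.
        val total = squares.reduce(_ + _)
        println(total)

      Note that no RDD is ever modified in place; each transformation returns a new, immutable RDD, which is what makes Spark's lineage-based fault tolerance possible.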

      The DataFrame abstraction is built on top of RDDs and adds “named” columns. A Spark DataFrame therefore has rows of named columns, similar to a relational database table and to DataFrames in R and Python (pandas).
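      As a minimal sketch (again assuming a spark-shell session, where spark is the already-created SparkSession), the named columns let you query a DataFrame much like a table:

        import spark.implicits._  // already in scope in spark-shell

        // Build a DataFrame with two named columns from a local collection.
        val people = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")
        // Refer to columns by name, as in SQL.
        people.filter($"age" > 26).select("name").show()
        // +-----+
        // | name|
        // +-----+
        // |Alice|
        // +-----+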

      Apart from RDDs and DataFrames, there are more specialized data abstractions that work on top of them. The Streaming APIs, for example, were introduced to process real-time streaming data from sources such as Flume and Kafka. They expose a stream as a unified, continuous DataFrame abstraction that data engineers can use for interactive as well as batch queries. GraphFrame is another example of a specialized abstraction: it lets developers analyze social networks and other graphs alongside Excel-like two-dimensional data.
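      As a hedged sketch of the streaming abstraction (assuming a SparkSession named spark, the spark-sql-kafka connector on the classpath, and a Kafka broker at localhost:9092 with a topic named "events"; the broker address and topic name are illustrative assumptions), a stream is exposed as a continuous DataFrame and queried with the same operations:

        // Read from Kafka as an unbounded (streaming) DataFrame.
        val stream = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  // assumed broker
          .option("subscribe", "events")                        // assumed topic
          .load()

        // Ordinary DataFrame operations apply: count records per Kafka key.
        val counts = stream
          .selectExpr("CAST(key AS STRING) AS key")
          .groupBy("key")
          .count()

        // Print the running counts to the console as the stream advances.
        counts.writeStream
          .outputMode("complete")
          .format("console")
          .start()
          .awaitTermination()

      Similarly, a GraphFrame (from the separate graphframes package) pairs a vertex DataFrame with an edge DataFrame, so graph queries and DataFrame queries share one representation; the vertices and edges below are made-up illustrative data:

        import org.graphframes.GraphFrame
        import spark.implicits._

        val vertices = Seq(("a", "Alice"), ("b", "Bob")).toDF("id", "name")
        val edges    = Seq(("a", "b", "follows")).toDF("src", "dst", "relationship")
        val graph    = GraphFrame(vertices, edges)
        graph.inDegrees.show()  // how many incoming edges each vertex has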

      To learn more about Spark’s core abstractions, follow the links below:

      1. Spark RDD

      2. DataFrame
