Forums › Apache Spark › What are the abstractions of Apache Spark?
This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 9:36 pm #6384 – DataFlair Team (Spectator)
List the abstractions of Apache Spark. How many abstractions does Apache Spark provide?
September 20, 2018 at 9:37 pm #6385 – DataFlair Team (Spectator)
RDD (Resilient Distributed Dataset) is the core abstraction in Apache Spark. It is an immutable, fault-tolerant, distributed collection of statically typed objects that is usually kept in memory. The RDD API offers simple operations such as map, reduce, and filter that can be composed in arbitrary ways.
The DataFrame abstraction is built on top of RDDs and adds named columns. A Spark DataFrame therefore has rows of named columns, similar to a relational database table or a DataFrame in R or Python (pandas).
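The two abstractions above can be sketched in Scala. This is a minimal, illustrative sketch, not an official example: it assumes a running SparkSession named `spark` (as in `spark-shell`), and the names `nums` and `people` are made up for the demonstration.

```scala
// Assumes `spark` is an existing SparkSession, e.g. inside spark-shell.
import spark.implicits._

// RDD: an immutable distributed collection of typed objects.
val nums = spark.sparkContext.parallelize(Seq(1, 2, 3, 4, 5))

// RDD operations compose freely; transformations (filter, map) are lazy
// and only run when an action (reduce) is invoked.
val sumOfEvenSquares = nums.filter(_ % 2 == 0).map(n => n * n).reduce(_ + _)

// DataFrame: rows with named columns, like a relational table.
val people = Seq(("Alice", 34), ("Bob", 28)).toDF("name", "age")
people.filter($"age" > 30).show()
```

Note how the RDD version works with plain typed objects while the DataFrame version refers to columns by name, which is what lets Spark apply relational-style optimizations to DataFrame queries.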
Apart from RDDs and DataFrames, there are more specialized data abstractions that build on top of these. For example, the Streaming APIs were introduced to process real-time streaming data from sources such as Flume and Kafka; they present data engineers with a unified, continuous DataFrame abstraction that can serve both interactive and batch queries. GraphFrame is another example of a specialized data abstraction: it lets developers analyze social networks and other graphs alongside Excel-like two-dimensional data.
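To illustrate the "streams as a continuous DataFrame" idea, here is a hedged Structured Streaming sketch in Scala. It again assumes a SparkSession named `spark`, and the Kafka broker address `localhost:9092` and topic name `events` are placeholder assumptions, not values from the original post.

```scala
// Assumes `spark` is an existing SparkSession and a Kafka broker is
// reachable at localhost:9092 with a topic named "events" (illustrative).
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()

// The stream is exposed as an unbounded DataFrame, so the same
// DataFrame operations apply to streaming data as to batch data.
val counts = events.groupBy("key").count()

// Continuously print updated counts to the console.
val query = counts.writeStream
  .outputMode("complete")
  .format("console")
  .start()
```

The key point is that `events` and `counts` are ordinary DataFrames: the same groupBy/count query could run unchanged on a static table, which is what the answer means by a unified abstraction for streaming and batch.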
To learn more about Spark’s core abstraction, follow the link.