What is DStream in Spark Streaming?

DataFlair Team

To understand DStreams better, let's begin with a brief introduction to Spark Streaming.

Introduction to Spark Streaming
Spark Streaming, added to Apache Spark in 2013, is an extension of the core Spark API that provides scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources such as Kafka, Apache Flume, Amazon Kinesis, or TCP sockets, and processed with complex algorithms expressed through high-level functions such as map, reduce, join, and window. Finally, the processed data can be pushed out to filesystems, databases, and live dashboards.
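As a quick illustration, here is a minimal word-count sketch in Scala showing all three stages: ingesting lines from a TCP socket, processing them with flatMap/map/reduceByKey, and pushing each batch's result to the console. The host and port (localhost:9999) are assumptions for a local test, e.g. fed by `nc -lk 9999`.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    // Local two-thread master: one thread receives data, one processes it.
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc  = new StreamingContext(conf, Seconds(5)) // 5-second batch interval

    // Ingest: a DStream[String] of lines from a TCP socket (assumed localhost:9999).
    val lines = ssc.socketTextStream("localhost", 9999)

    // Process: high-level functions such as flatMap, map, and reduceByKey.
    val words      = lines.flatMap(_.split(" "))
    val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)

    // Push out: print each batch's counts (a stand-in for a real sink).
    wordCounts.print()

    ssc.start()             // start receiving and processing
    ssc.awaitTermination()  // block until the job is stopped
  }
}
```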

A Discretized Stream, or Spark DStream for short, is the key abstraction of Spark Streaming. It represents a stream of data divided into small batches. DStreams are built on Spark RDDs, Spark's core data abstraction, which also allows Spark Streaming to integrate with other Apache Spark components such as Spark SQL and Spark MLlib.
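To illustrate that integration, here is one hedged sketch (not the only way to do it) that converts each batch's RDD into a DataFrame and queries it with Spark SQL. It assumes a `words: DStream[String]` like the one built in the word-count example above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.dstream.DStream

// Sketch: query each micro-batch with Spark SQL. `words` is assumed to be
// a DStream[String] such as the one produced in the word-count example.
def countWithSql(words: DStream[String]): Unit = {
  words.foreachRDD { rdd =>
    // Reuse (or lazily create) a SparkSession from the RDD's configuration.
    val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
    import spark.implicits._

    val df = rdd.toDF("word")              // RDD[String] -> DataFrame
    df.createOrReplaceTempView("words")
    spark.sql("SELECT word, COUNT(*) AS total FROM words GROUP BY word").show()
  }
}
```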

DStream
As discussed above, a Spark DStream (Discretized Stream) is the basic abstraction of Spark Streaming: a continuous stream of data. It can receive input from sources such as Kafka, Flume, Kinesis, or TCP sockets, or it can be a data stream produced by transforming another DStream. At its core, a DStream is a continuous series of RDDs (Spark's core abstraction), where each RDD contains the data from one batch interval.
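A small sketch of that per-interval structure, again assuming a local socket source: each batch interval yields exactly one RDD, which `foreachRDD` exposes together with its batch time.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("DStreamBatches")
val ssc  = new StreamingContext(conf, Seconds(2)) // one RDD every 2 seconds

val lines = ssc.socketTextStream("localhost", 9999)

// Each batch interval produces exactly one RDD holding that interval's data.
lines.foreachRDD { (rdd, time) =>
  println(s"Batch at $time holds ${rdd.count()} records")
}

ssc.start()
ssc.awaitTermination()
```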

Any operation applied to a DStream is applied to all of its underlying RDDs. The DStream hides these details and instead provides the developer with a convenient high-level API. As a result, Spark DStreams make working with streaming data much easier.
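To make that concrete: a transformation declared on the DStream is applied to each underlying RDD, batch by batch, and `transform` exposes the same per-RDD application explicitly. (The `lines` DStream is assumed from the sketches above.)

```scala
// Both produce the same stream: the DStream-level map is applied to every
// underlying RDD, which `transform` simply makes explicit.
val upperViaDStream = lines.map(_.toUpperCase)
val upperViaRdds    = lines.transform(rdd => rdd.map(_.toUpperCase))
```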

To learn more about DStreams, see: Apache Spark DStream (Discretized Streams)
