- This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 2:55 pm #5348 DataFlair Team (Spectator)
What is a DStream?
-
September 20, 2018 at 2:56 pm #5351 DataFlair Team (Spectator)
A Discretized Stream (DStream), the fundamental abstraction in Spark Streaming, is a continuous sequence of RDDs of the same type representing a continuous stream of data. DStreams can be created from live data (such as data from TCP sockets, Kafka, or Flume) using a StreamingContext, or they can be generated by transforming existing DStreams with operations such as map, window, and reduceByKeyAndWindow. While a Spark Streaming program is running, each DStream periodically generates an RDD, either from live data or by transforming the RDD generated by its parent DStream.
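To make the "sequence of RDDs" idea concrete, here is a toy, pure-Python model. It is not the Spark API: `ToyDStream`, `get_or_compute`, and the plain lists standing in for RDDs are all hypothetical simplifications. It illustrates two points from the paragraph above: an input stream yields one batch per batch time, and a derived stream (built with map or filter) computes its batch for time t from its parent's batch for the same time.

```python
from typing import Callable, Dict, List

Batch = List[int]  # hypothetical stand-in for one RDD: the records of a single batch


class ToyDStream:
    """Hypothetical, simplified model of a DStream (not the Spark API)."""

    def __init__(self, compute: Callable[[int], Batch]) -> None:
        self._compute = compute
        self._generated: Dict[int, Batch] = {}  # batch time -> generated batch

    def get_or_compute(self, time: int) -> Batch:
        # One batch per batch time, computed on first request and then cached.
        if time not in self._generated:
            self._generated[time] = self._compute(time)
        return self._generated[time]

    def map(self, f: Callable[[int], int]) -> "ToyDStream":
        # Child's batch at time t = f applied to each record of the parent's batch at t.
        return ToyDStream(lambda t: [f(x) for x in self.get_or_compute(t)])

    def filter(self, p: Callable[[int], bool]) -> "ToyDStream":
        # Child's batch at time t keeps only the parent's records at t that satisfy p.
        return ToyDStream(lambda t: [x for x in self.get_or_compute(t) if p(x)])


# Hypothetical input: records that "arrived" during batches 0, 1 and 2.
arrivals = {0: [1, 2, 3], 1: [4, 5], 2: [6]}
source = ToyDStream(lambda t: list(arrivals.get(t, [])))

odd_times_ten = source.filter(lambda x: x % 2 == 1).map(lambda x: x * 10)
for t in range(3):
    print(t, odd_times_ten.get_or_compute(t))
# 0 [10, 30]
# 1 [50]
# 2 []
```

Caching each generated batch by time loosely mirrors the idea that a DStream produces one RDD per interval rather than recomputing it on every request.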
The DStream class contains the basic operations available on all DStreams, such as map, filter, and window. In addition, PairDStreamFunctions contains operations available only on DStreams of key-value pairs, such as groupByKeyAndWindow and join. Through implicit conversions, these operations are available on any DStream of pairs (e.g., DStream[(Int, Int)]). Internally, a DStream is characterized by a few basic properties:
- a list of other DStreams that the DStream depends on
- a time interval at which the DStream generates an RDD
- a function that is used to generate an RDD after each time interval
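The three internal properties can be sketched with a hypothetical `ToyPairDStream` class, again a pure-Python stand-in rather than Spark's `DStream` or `PairDStreamFunctions`. Each derived stream records its parents in `dependencies`, keeps the parent's `slide_duration`, and supplies a `compute` function for a given batch time. `reduce_by_key` and `join` are illustrative per-batch key-value operations; the names and behavior are assumptions for this sketch only.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Record = Tuple[str, Any]  # a key-value pair


@dataclass
class ToyPairDStream:
    """Hypothetical model of a DStream's internal properties (not Spark's class)."""
    slide_duration: int                     # interval at which a batch is produced
    compute: Callable[[int], List[Record]]  # builds the batch for a given time
    dependencies: List["ToyPairDStream"] = field(default_factory=list)  # parents

    def reduce_by_key(self, f: Callable[[Any, Any], Any]) -> "ToyPairDStream":
        def compute(t: int) -> List[Record]:
            # Combine all values that share a key within the batch at time t.
            acc: Dict[str, Any] = {}
            for k, v in self.compute(t):
                acc[k] = f(acc[k], v) if k in acc else v
            return sorted(acc.items())
        # A derived stream keeps its parent's interval and lists the parent as a dependency.
        return ToyPairDStream(self.slide_duration, compute, [self])

    def join(self, other: "ToyPairDStream") -> "ToyPairDStream":
        if self.slide_duration != other.slide_duration:
            raise ValueError("joined streams must share a slide duration")

        def compute(t: int) -> List[Record]:
            # Per-batch inner join: pair records from both streams' batches at time t.
            right: Dict[str, List[Any]] = defaultdict(list)
            for k, w in other.compute(t):
                right[k].append(w)
            return [(k, (v, w)) for k, v in self.compute(t) for w in right[k]]

        return ToyPairDStream(self.slide_duration, compute, [self, other])


# Hypothetical input streams with a batch interval of 1 time unit.
clicks = ToyPairDStream(1, lambda t: [("a", 1), ("b", 1), ("a", 1)] if t == 0 else [("b", 1)])
views = ToyPairDStream(1, lambda t: [("a", 100), ("b", 200)])

joined = clicks.reduce_by_key(lambda x, y: x + y).join(views)
print(joined.compute(0))         # [('a', (2, 100)), ('b', (1, 200))]
print(joined.compute(1))         # [('b', (1, 200))]
print(len(joined.dependencies))  # 2
```

Following `dependencies` from `joined` back to `clicks` and `views` gives the lineage graph that, in this toy model, determines how each batch is rebuilt.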
A Discretized Stream is a sequence of Resilient Distributed Datasets (RDDs) that represents a stream of data. DStreams can be created from various sources such as Apache Kafka, HDFS, and Apache Flume.
For more detail on Spark Streaming, refer to Spark Streaming.
-
September 20, 2018 at 2:56 pm #5352 DataFlair Team (Spectator)
> Just as Spark Core is built on the concept of RDDs, Spark Streaming provides an abstraction called DStreams, or discretized streams.
> A DStream is a sequence of data arriving over time.
> Each DStream is represented as a sequence of RDDs, one arriving at each configured time step (the batch interval).
> DStreams can be created from various input sources such as TCP sockets, Kafka, Flume, HDFS, etc.
> DStreams offer two types of operations: transformations, which generate a new DStream, and output operations, which write data to an external system.
> Since a DStream is built from RDDs, one can apply the basic RDD operations to a DStream, in addition to new time-related operations such as sliding windows. For further study, refer to Spark Streaming-DStream.
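The discretization and sliding-window ideas in the list above can be sketched in plain Python. This is a hypothetical illustration, not Spark code: `discretize` and `sliding_window` are invented helpers. Spark Streaming expresses the window length and slide interval as durations that must be multiples of the batch interval, which is why this sketch counts both in whole batches.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, payload)


def discretize(events: List[Event], batch_interval: float) -> Dict[int, List[str]]:
    """Assign each event to batch i where i*interval <= time < (i+1)*interval."""
    batches: Dict[int, List[str]] = {}
    for ts, payload in events:
        batches.setdefault(int(ts // batch_interval), []).append(payload)
    return batches


def sliding_window(batches: Dict[int, List[str]], num_batches: int,
                   window_batches: int, slide_batches: int) -> Dict[int, List[str]]:
    """Every `slide_batches` batches, emit the records of the last `window_batches` batches.

    Both lengths are counted in batch intervals; windows near the start are partial.
    """
    windows: Dict[int, List[str]] = {}
    for end in range(slide_batches - 1, num_batches, slide_batches):
        start = max(0, end - window_batches + 1)
        windows[end] = [p for i in range(start, end + 1) for p in batches.get(i, [])]
    return windows


events = [(0.2, "a"), (0.7, "b"), (1.1, "c"), (2.5, "d"), (3.9, "e")]
batches = discretize(events, batch_interval=1.0)
print(batches)  # {0: ['a', 'b'], 1: ['c'], 2: ['d'], 3: ['e']}
print(sliding_window(batches, num_batches=4, window_batches=3, slide_batches=2))
# {1: ['a', 'b', 'c'], 3: ['c', 'd', 'e']}
```

Because the slide (2 batches) is shorter than the window (3 batches), consecutive windows overlap: batch 1 ("c") appears in both outputs.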