A Discretized Stream (DStream), the fundamental abstraction in Spark Streaming, is a continuous sequence of RDDs of the same type, representing a continuous stream of data. DStreams can be created from live data (such as data from TCP sockets, Kafka, or Flume) using a StreamingContext, or they can be generated by transforming existing DStreams with operations such as map, window, and reduceByKeyAndWindow. While a streaming application is running, each DStream periodically generates an RDD, either from live data or by transforming the RDD generated by a parent DStream.
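As a minimal sketch of the paragraph above, the following Scala program creates a DStream from a TCP socket through a StreamingContext and derives new DStreams from it with flatMap, map, and reduceByKeyAndWindow. The host `localhost`, port `9999`, the application name, and the 1-second batch interval are assumptions chosen for illustration, and the code assumes a Spark version (1.3 or later) with the Spark Streaming library on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SocketWordCount {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the socket receiver, one for processing (assumed local setup)
    val conf = new SparkConf().setMaster("local[2]").setAppName("SocketWordCount")

    // The batch interval (assumed: 1 second) controls how often each DStream produces an RDD
    val ssc = new StreamingContext(conf, Seconds(1))

    // Input DStream created from live data: lines of text read from a TCP socket
    val lines = ssc.socketTextStream("localhost", 9999)

    // Derived DStreams: each transformation yields a new DStream whose parent is the previous one
    val pairs = lines.flatMap(_.split(" ")).map(word => (word, 1))

    // Count words over the last 30 seconds, recomputed every 10 seconds
    val windowedCounts =
      pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

    // Output operation: print the first few elements of every generated RDD
    windowedCounts.print()

    ssc.start()            // start receiving and processing data
    ssc.awaitTermination() // block until the job is stopped
  }
}
```

To try it, something like `nc -lk 9999` can act as the text source on the assumed port. Note that the window and slide durations must be multiples of the batch interval.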
The DStream class contains the basic operations available on all DStreams, such as map, filter, and window. In addition, PairDStreamFunctions contains operations available only on DStreams of key-value pairs, such as groupByKeyAndWindow and join. Through implicit conversions, these operations are available on any DStream of pairs (e.g., DStream[(Int, Int)]). Internally, a DStream is characterized by a few basic properties:
- A list of other DStreams that the DStream depends on
- A time interval at which the DStream generates an RDD
- A function that is used to generate an RDD after each time interval
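The three internal properties can be illustrated with a toy model in plain Scala. This is not Spark's actual class, only a sketch: `ToyDStream`, `ToyInputStream`, and `ToyMappedStream` are hypothetical names, a `Seq[T]` stands in for an RDD, and plain milliseconds stand in for Spark's time types. In Spark itself the corresponding members of DStream are `dependencies`, `slideDuration`, and `compute`.

```scala
// Toy model only: Seq[T] stands in for an RDD, Long milliseconds for Spark's Time/Duration.
abstract class ToyDStream[T] {
  def dependencies: List[ToyDStream[_]]           // the streams this stream depends on
  def slideDurationMs: Long                       // interval at which a batch is generated
  def compute(validTimeMs: Long): Option[Seq[T]]  // function that generates the batch for a time
}

// A hypothetical input stream: it has no parents and emits a small batch on every interval boundary.
class ToyInputStream(val slideDurationMs: Long) extends ToyDStream[Int] {
  def dependencies: List[ToyDStream[_]] = Nil
  def compute(validTimeMs: Long): Option[Seq[Int]] =
    if (validTimeMs % slideDurationMs == 0)
      Some(Seq.fill(3)((validTimeMs / slideDurationMs).toInt))
    else
      None // no batch is generated between interval boundaries
}

// A derived stream: it depends on one parent and builds its batch from the parent's batch.
class ToyMappedStream[T, U](parent: ToyDStream[T], f: T => U) extends ToyDStream[U] {
  def dependencies: List[ToyDStream[_]] = List(parent)
  def slideDurationMs: Long = parent.slideDurationMs
  def compute(validTimeMs: Long): Option[Seq[U]] =
    parent.compute(validTimeMs).map(_.map(f))
}

object ToyDStreamDemo {
  def main(args: Array[String]): Unit = {
    val input = new ToyInputStream(1000L)
    val doubled = new ToyMappedStream[Int, Int](input, _ * 2)
    // Ask the derived stream for batches at three points in time
    for (t <- Seq(1000L, 2000L, 2500L))
      println(s"t=$t -> ${doubled.compute(t)}")
  }
}
```

The point of the design is that a derived stream never stores data itself: it only records its parent and a function, and produces a batch on demand by asking the parent for the batch of the same time.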
A Discretized Stream is a sequence of Resilient Distributed Datasets (RDDs) that represents a stream of data. DStreams can be created from various sources such as Apache Kafka, HDFS, and Apache Flume.
> Just as Spark Core is built on the concept of RDDs, Spark Streaming provides an abstraction called DStreams, or discretized streams.
> A DStream is a sequence of data arriving over time.
> Each DStream is represented as a sequence of RDDs, one arriving at each configured time step (the batch interval).
> DStreams can be created from various input sources such as TCP sockets, Kafka, Flume, and HDFS.
> DStreams offer two types of operations: transformations, which generate another DStream, and output operations, which write data to an external system.
> Because a DStream is built from RDDs, the basic RDD operations can be applied to it, along with new time-related operations such as sliding windows.
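The difference between transformations and output operations, and the use of RDD-level and window operations on a DStream, might look like the following sketch. It feeds the stream from an in-memory queue through `queueStream` so that no external source is needed. The object name, the batch and window durations, and the test data are assumptions for illustration, and printing to the console stands in for writing to an external system.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

object TransformVsOutput {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("TransformVsOutput")
    val ssc = new StreamingContext(conf, Seconds(1)) // assumed batch interval: 1 second

    // Input DStream backed by a queue of RDDs; one queued RDD is consumed per batch
    val queue = mutable.Queue[RDD[Int]]()
    val numbers = ssc.queueStream(queue)

    // Transformations: each one returns another DStream and nothing runs yet
    val evens = numbers.filter(_ % 2 == 0)                // basic RDD-style operation
    val windowed = evens.window(Seconds(3), Seconds(1))   // time-based sliding window
    val sorted = windowed.transform((rdd: RDD[Int]) => rdd.sortBy(x => x)) // arbitrary RDD operation

    // Output operation: runs for every generated RDD and pushes the data out of the stream
    sorted.foreachRDD { (rdd: RDD[Int], time: Time) =>
      println(s"$time: ${rdd.collect().mkString(", ")}")
    }

    ssc.start()
    // Push one small RDD per second into the queue (illustrative test data)
    for (i <- 1 to 5) {
      queue.synchronized { queue += ssc.sparkContext.makeRDD(i to i + 4) }
      Thread.sleep(1000)
    }
    ssc.stop(stopSparkContext = true, stopGracefully = true)
  }
}
```

Without at least one output operation such as `foreachRDD` or `print`, the transformations would only define a lineage of DStreams, and Spark Streaming would have nothing to execute.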