What is a DStream?

  • Author
    Posts
    • #5348
      DataFlair Team
      Spectator

      What is a DStream?

    • #5351
      DataFlair Team
      Spectator

      A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence of RDDs of the same type, representing a continuous stream of data. A DStream can be created from live data (for example from TCP sockets, Kafka, or Flume) using a StreamingContext, or it can be generated by transforming existing DStreams with operations such as map, window, and reduceByKeyAndWindow. While a streaming program runs, each DStream periodically generates an RDD, either from live data or by transforming the RDD generated by a parent DStream.
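To make the "sequence of RDDs" idea concrete, here is a minimal pure-Python sketch. It is a toy model, not the Spark API: timestamped records are grouped into fixed intervals, and each resulting list stands in for one RDD. The names `discretize`, `map_stream`, and `batch_interval` are made up for this illustration, and timestamps are assumed to be non-negative.

```python
from collections import defaultdict


def discretize(records, batch_interval):
    """Group (timestamp, value) records into consecutive batches.

    Toy stand-in for a DStream: each returned list plays the role of one RDD.
    `discretize` and `batch_interval` are illustrative names, not Spark APIs.
    """
    batches = defaultdict(list)
    for timestamp, value in records:
        # Records whose timestamps fall in the same interval share a batch.
        batches[int(timestamp // batch_interval)].append(value)
    if not batches:
        return []
    last = max(batches)
    # Emit one batch per interval, including empty ones, in time order.
    return [batches.get(i, []) for i in range(last + 1)]


def map_stream(stream, fn):
    """Apply fn to every element of every batch, giving a new toy stream."""
    return [[fn(x) for x in batch] for batch in stream]


records = [(0.2, "a"), (0.9, "b"), (1.5, "c"), (3.1, "d")]
stream = discretize(records, batch_interval=1.0)
upper = map_stream(stream, str.upper)
print(stream)
print(upper)
```

Note that the interval with no records still yields an (empty) batch, which mirrors the point that a DStream produces one RDD per interval rather than one per record.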

      The DStream class contains the basic operations available on all DStreams, such as map, filter, and window. In addition, PairDStreamFunctions contains operations available only on DStreams of key-value pairs, such as groupByKeyAndWindow and join. Through implicit conversions, these operations are available on any DStream of pairs (e.g., DStream[(Int, Int)]). Internally, a DStream is characterized by a few basic properties:
      - A list of other DStreams that the DStream depends on
      - A time interval at which the DStream generates an RDD
      - A function that is used to generate an RDD after each time interval
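The three internal properties can be sketched as a small Python class. This is only a toy model under stated assumptions, not Spark's implementation: `ToyDStream`, `input_stream`, `mapped`, and `get_or_compute` are hypothetical names, time is a plain integer, and a batch is a plain list.

```python
class ToyDStream:
    """Toy model of the three properties; not Spark's DStream class."""

    def __init__(self, dependencies, slide_duration, compute):
        self.dependencies = dependencies      # parent streams this one depends on
        self.slide_duration = slide_duration  # interval at which a batch is generated
        self._compute = compute               # function producing the batch for a time

    def get_or_compute(self, time):
        # A batch exists only at multiples of the slide duration.
        if time % self.slide_duration != 0:
            return None
        return self._compute(time)


def input_stream(data_by_time, slide_duration):
    """Source stream with no parents: looks its batch up in a dict keyed by time."""
    return ToyDStream([], slide_duration, lambda t: data_by_time.get(t, []))


def mapped(parent, fn):
    """Derived stream: its batch at time t is built from the parent's batch at t."""
    def compute(t):
        batch = parent.get_or_compute(t)
        return None if batch is None else [fn(x) for x in batch]
    return ToyDStream([parent], parent.slide_duration, compute)


source = input_stream({0: [1, 2], 2: [3]}, slide_duration=2)
doubled = mapped(source, lambda x: x * 2)
print(doubled.get_or_compute(0))
print(doubled.get_or_compute(1))
```

The derived stream holds no data of its own. It only records its parent and a function, so a batch is computed on demand by walking back through the dependencies.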

      A Discretized Stream is a sequence of Resilient Distributed Datasets (RDDs) that represents a stream of data. DStreams can be created from various sources such as Apache Kafka, HDFS, and Apache Flume.

      For details, refer to Spark Streaming.

    • #5352
      DataFlair Team
      Spectator

      > Just as Spark Core is built on the concept of RDDs, Spark Streaming provides an abstraction called DStreams, or discretized streams.
      > A DStream is a sequence of data arriving over time.
      > Each DStream is represented as a sequence of RDDs arriving at regular, configured time steps.
      > A DStream can be created from various input sources such as TCP sockets, Kafka, Flume, HDFS, etc.
      > DStreams offer two types of operations: transformations, which generate another DStream, and output operations, which write data to an external system.
      > Because a DStream is built from RDDs, the basic RDD operations can be applied to it, along with new time-related operations such as sliding windows.
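The difference between a transformation and an output operation, and the idea of a sliding window, can be sketched in plain Python. This is a toy model, not the Spark API: a stream is a list of batches, `window` and `foreach_batch` are hypothetical helper names, and the window length and slide interval are counted in batches here rather than expressed as durations.

```python
def window(stream, window_length, slide_interval):
    """Transformation: every `slide_interval` batches, merge the last
    `window_length` batches into one. Returns a new toy stream."""
    windows = []
    for end in range(slide_interval, len(stream) + 1, slide_interval):
        start = max(0, end - window_length)
        merged = [x for batch in stream[start:end] for x in batch]
        windows.append(merged)
    return windows


def foreach_batch(stream, sink):
    """Output operation: push each batch to an external sink; returns nothing."""
    for batch in stream:
        sink(batch)


stream = [[1], [2, 3], [4], [5, 6]]
windowed = window(stream, window_length=3, slide_interval=2)

written = []  # stands in for an external system
foreach_batch(windowed, written.append)
print(written)
```

Here `window` returns another stream that can be transformed further, while `foreach_batch` only has a side effect on the sink, which is the distinction the post draws between the two kinds of operation.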

      For further study of Spark Streaming DStreams, refer to Spark Streaming-DStream.
