What is a DStream?

This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 2:55 pm #5348
  
  DataFlair Team
  Spectator
  
  What is a DStream?
- September 20, 2018 at 2:56 pm #5351
  
  DataFlair Team
  Spectator
  
  A Discretized Stream (DStream), the fundamental abstraction in Spark Streaming, is a continuous sequence of RDDs of the same type that represents a continuous stream of data. DStreams can be created from live data sources such as TCP sockets, Kafka, and Flume using a StreamingContext, or they can be generated by transforming existing DStreams with operations such as map, window, and reduceByKeyAndWindow. While a streaming program runs, each DStream periodically generates an RDD, either from live data or by transforming the RDD generated by a parent DStream.
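
To make the "sequence of RDDs" idea concrete, here is a minimal plain-Python sketch. It does not use Spark at all: each list stands in for the RDD of one batch interval, and `discretize`, `records`, and the 1-second batch interval are hypothetical names and values chosen only for illustration.

```python
from collections import defaultdict


def discretize(records, batch_interval):
    """Group (timestamp, value) records into fixed-length batches.

    Each returned list stands in for the RDD of one batch interval.
    This is a toy model, not Spark's implementation.
    """
    if not records:
        return []
    buckets = defaultdict(list)
    for timestamp, value in records:
        buckets[int(timestamp // batch_interval)].append(value)
    last = max(buckets)
    # Emit one batch per interval, including empty ones, in time order.
    return [buckets.get(i, []) for i in range(last + 1)]


# Hypothetical input: (arrival time in seconds, line of text).
records = [(0.2, "a b"), (0.9, "b"), (2.5, "c a")]
batches = discretize(records, batch_interval=1.0)

# A map-like transformation applied batch by batch yields a new
# sequence of batches, analogous to deriving one DStream from another.
words = [[w for line in batch for w in line.split()] for batch in batches]

print(batches)
print(words)
```

Note that the interval with no arrivals still produces an (empty) batch, which is how the toy model keeps the sequence continuous.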
  
  The DStream class contains the basic operations available on all DStreams, such as map, filter, and window. In addition, PairDStreamFunctions contains operations available only on DStreams of key-value pairs, such as groupByKeyAndWindow and join. Through implicit conversions, these operations are available on any DStream of pairs (e.g., DStream[(Int, Int)]). Internally, a DStream is characterized by a few basic properties:
  - A list of other DStreams that the DStream depends on
  - A time interval at which the DStream generates an RDD
  - A function that is used to generate an RDD after each time interval
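
The three internal properties can be sketched roughly as follows. This is a plain-Python toy, not the Spark class: `ToyDStream`, its attribute names, and the sample source are all hypothetical, and a list again stands in for an RDD.

```python
class ToyDStream:
    """Conceptual stand-in for a DStream; not the Spark class."""

    def __init__(self, dependencies, slide_duration, compute_fn):
        self.dependencies = dependencies      # parent streams this one depends on
        self.slide_duration = slide_duration  # interval at which a batch is generated
        self._compute_fn = compute_fn         # builds the batch for a given time

    def compute(self, time):
        # Only produce a batch on multiples of the slide duration.
        if time % self.slide_duration != 0:
            return None
        return self._compute_fn(time)

    def map(self, fn):
        # A derived stream depends on its parent and reuses its interval.
        return ToyDStream(
            dependencies=[self],
            slide_duration=self.slide_duration,
            compute_fn=lambda time: [fn(x) for x in self.compute(time)],
        )


# Hypothetical source: at time t the batch holds the numbers t, t+1, t+2.
source = ToyDStream([], slide_duration=2, compute_fn=lambda t: [t, t + 1, t + 2])
doubled = source.map(lambda x: x * 2)

print(doubled.compute(4))  # batch derived from the parent's batch at time 4
print(doubled.compute(5))  # None: 5 is not a multiple of the slide duration
```

The point of the sketch is that a derived stream holds no data of its own; it only records its parent, its interval, and the function needed to build each batch on demand.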
  
  In short, a Discretized Stream is a sequence of Resilient Distributed Datasets (RDDs) that represents a stream of data. DStreams can be created from various sources such as Apache Kafka, HDFS, and Apache Flume.
  
  For more detail on Spark Streaming, refer to Spark Streaming.
- September 20, 2018 at 2:56 pm #5352
  
  DataFlair Team
  Spectator
  
  > As Spark Core is built on the concept of RDDs, Spark Streaming provides an abstraction called DStreams, or discretized streams.
  > A DStream is a sequence of data arriving over time.
  > Each DStream is represented as a sequence of RDDs arriving at regular, configured time steps.
  > DStreams can be created from various input sources such as TCP sockets, Kafka, Flume, and HDFS.
  > DStreams offer two types of operations: transformations, which generate another DStream, and output operations, which write data to an external system.
  > Because a DStream is made up of RDDs, one can apply the basic RDD operations to it, in addition to new time-related operations such as sliding windows.
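
As a rough illustration of the last two points, the sketch below models a sliding-window transformation followed by an output operation in plain Python. Nothing here is Spark API: `window`, `foreach_batch`, the sample batches, and the window sizes are hypothetical, and window length and slide are counted in batches rather than in seconds.

```python
from collections import Counter


def window(batches, window_length, slide_interval):
    """Yield the concatenation of the last `window_length` batches,
    once every `slide_interval` batches (both counted in batches)."""
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        yield [x for batch in batches[start:end] for x in batch]


def foreach_batch(batches, sink):
    """Output operation: push every batch to an external sink."""
    for batch in batches:
        sink(batch)


# Hypothetical stream of four word batches.
batches = [["a", "b"], ["b"], ["c", "a"], ["a"]]

# Transformation: word counts over a window of 3 batches, sliding by 2.
windowed_counts = [dict(Counter(w)) for w in window(batches, 3, 2)]

# Output operation: here the "external system" is just a list.
written = []
foreach_batch(windowed_counts, written.append)
print(written)
```

The transformation step only produces new batches; nothing leaves the pipeline until the output operation hands each batch to the sink.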
  
  For further study of DStreams in Spark Streaming, refer to Spark Streaming-DStream.