Explain Spark streaming


    • #6427
      DataFlair Team
      Moderator

      Spark Streaming
      A data stream is data that arrives continuously as an unbounded sequence. For further processing, Spark Streaming divides this continuously flowing input into discrete units, enabling low-latency processing and analysis of streaming data.

      Spark Streaming was added to Apache Spark in 2013. It enables scalable, fault-tolerant stream processing of live data streams. Data can be ingested from many sources such as Kafka, Apache Flume, Amazon Kinesis, or TCP sockets, and processed with complex algorithms expressed through high-level functions such as map, reduce, join, and window. Finally, the processed data can be pushed out to filesystems, databases, and live dashboards.

      Internally, Spark Streaming receives live input data streams and divides them into batches. The Spark engine then processes these batches to generate the final stream of results, also in batches.
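      This micro-batch model can be sketched in plain Python. Note this is a conceptual simulation only, not the actual pyspark.streaming API, and all names here (micro_batches, process_batch) are illustrative inventions:

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Divide a continuous stream of records into discrete batches."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def process_batch(batch):
    """Engine step: count words in one batch, map/reduce style."""
    counts = {}
    for line in batch:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

# A "live" input stream, here just a list of arriving lines.
incoming = ["spark streaming", "spark engine", "live data", "data streams"]

# The results are themselves produced batch by batch.
results = [process_batch(b) for b in micro_batches(incoming, 2)]
```

      In real Spark Streaming, the batching interval is set when creating the StreamingContext, and each batch is processed as an RDD by the Spark engine.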

      Its basic abstraction is the Discretized Stream, or DStream for short, which represents a stream of data divided into small batches. DStreams are built on RDDs, Spark’s core data abstraction. Spark Streaming also integrates with other Apache Spark components such as Spark MLlib and Spark SQL.
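      Conceptually, a DStream is a sequence of per-interval batches (RDDs), with each transformation applied uniformly across those batches. A minimal plain-Python sketch of that idea, including the window operation mentioned above (TinyDStream is a hypothetical toy class, not the real DStream API):

```python
class TinyDStream:
    """Toy stand-in for a DStream: a list of per-interval batches."""
    def __init__(self, batches):
        self.batches = batches  # each inner list plays the role of one RDD

    def map(self, fn):
        # A transformation applies to every record in every batch.
        return TinyDStream([[fn(x) for x in b] for b in self.batches])

    def window(self, length):
        # Each output batch combines the last `length` input batches.
        out = []
        for i in range(len(self.batches)):
            merged = []
            for b in self.batches[max(0, i - length + 1): i + 1]:
                merged.extend(b)
            out.append(merged)
        return TinyDStream(out)

events = TinyDStream([[1, 2], [3], [4, 5]])
doubled = events.map(lambda x: x * 2)   # batches: [[2, 4], [6], [8, 10]]
windowed = doubled.window(2)            # batches: [[2, 4], [2, 4, 6], [6, 8, 10]]
```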

      For more information on Spark Streaming, follow the link: Spark Streaming Tutorial for Beginners
