Explain different transformations in DStream in Apache Spark Streaming.

    • #5941
      DataFlair Team
      Spectator

      Define the various types of transformations in Apache Spark Streaming.
      Explain the kinds of transformations available on a Spark Streaming DStream.

    • #5944
      DataFlair Team
      Spectator

      The transformations available on a DStream in Apache Spark Streaming are:

      1-map(func): Return a new DStream by passing each element of the source DStream through a function func.

      2-flatMap(func): Similar to map, but each input item can be mapped to 0 or more output items.

      3-filter(func): Return a new DStream by selecting only the records of the source DStream on which func returns true.

      4-repartition(numPartitions): Change the level of parallelism in this DStream by creating more or fewer partitions.

      5-union(otherStream): Return a new DStream that contains the union of the elements in the source DStream and otherStream.

      6-count(): Return a new DStream of single-element RDDs by counting the number of elements in each RDD of the source DStream.

      7-reduce(func): Return a new DStream of single-element RDDs by aggregating the elements in each RDD of the source DStream using a function func (which takes two arguments and returns one). The function should be associative and commutative so that it can be computed in parallel.

      8-countByValue(): When called on a DStream of elements of type K, return a new DStream of (K, Long) pairs where the value of each key is its frequency in each RDD of the source DStream.

      9-reduceByKey(func, [numTasks]): When called on a DStream of (K, V) pairs, return a new DStream of (K, V) pairs where the values for each key are aggregated using the given reduce function. The optional numTasks argument sets the number of parallel tasks used for the grouping.

      10-join(otherStream, [numTasks]): When called on two DStreams of (K, V) and (K, W) pairs, return a new DStream of (K, (V, W)) pairs with all pairs of elements for each key.

      11-cogroup(otherStream, [numTasks]): When called on DStreams of (K, V) and (K, W) pairs, return a new DStream of (K, Seq[V], Seq[W]) tuples, where each key is paired with its values from the source stream and its values from the other stream.

      12-transform(func): Return a new DStream by applying an RDD-to-RDD function to every RDD of the source DStream.

      13-updateStateByKey(func): Return a new “state” DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values for the key.
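
To make the per-batch behaviour of these transformations concrete, here is a minimal plain-Python sketch. It is not the Spark API and does not need Spark installed: a stream is modelled as a list of batches, and each batch (a list) stands in for one RDD. All of the `ds_*` function names are hypothetical helpers made up for this illustration. Results are sorted only so the output is deterministic; Spark itself does not guarantee any ordering.

```python
from collections import Counter
from functools import reduce as fold
from operator import add

# Model (an assumption for illustration): stream = list of batches, batch = list of records.

def ds_map(stream, func):
    return [[func(x) for x in batch] for batch in stream]

def ds_flat_map(stream, func):
    # Each input record may produce zero or more output records.
    return [[y for x in batch for y in func(x)] for batch in stream]

def ds_filter(stream, func):
    return [[x for x in batch if func(x)] for batch in stream]

def ds_count(stream):
    # One single-element batch per input batch.
    return [[len(batch)] for batch in stream]

def ds_count_by_value(stream):
    return [sorted(Counter(batch).items()) for batch in stream]

def _group(batch):
    # Collect the values of a batch of (key, value) pairs per key.
    groups = {}
    for key, value in batch:
        groups.setdefault(key, []).append(value)
    return groups

def ds_reduce_by_key(stream, func):
    return [sorted((k, fold(func, vs)) for k, vs in _group(batch).items())
            for batch in stream]

def ds_cogroup(left, right):
    # Batches of the two streams are paired up by position (same batch interval).
    out = []
    for lb, rb in zip(left, right):
        lg, rg = _group(lb), _group(rb)
        out.append([(k, (lg.get(k, []), rg.get(k, [])))
                    for k in sorted(set(lg) | set(rg))])
    return out

def ds_join(left, right):
    # Inner join: every (v, w) combination for keys present on both sides.
    return [[(k, (v, w)) for k, (vs, ws) in batch for v in vs for w in ws]
            for batch in ds_cogroup(left, right)]

def ds_update_state_by_key(stream, update):
    # State is carried from batch to batch; update(new_values, old_state) is
    # called for every known key, and returning None drops the key.
    state, out = {}, []
    for batch in stream:
        new_values = _group(batch)
        next_state = {}
        for key in sorted(set(state) | set(new_values)):
            value = update(new_values.get(key, []), state.get(key))
            if value is not None:
                next_state[key] = value
        state = next_state
        out.append(sorted(state.items()))
    return out

# Word count over three batches of text lines (the last batch is empty).
lines = [["a b", "b c"], ["c c"], []]
words = ds_flat_map(lines, str.split)
pairs = ds_map(words, lambda w: (w, 1))
batch_counts = ds_reduce_by_key(pairs, add)
running_counts = ds_update_state_by_key(
    pairs, lambda new, old: (old or 0) + sum(new))
```

In this model `batch_counts` restarts with every batch, while `running_counts` keeps accumulating across batches. That is the difference between reduceByKey, which is stateless, and updateStateByKey, which is stateful. In real Spark Streaming, updateStateByKey additionally requires a checkpoint directory to be configured.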
