Explain different transformations in DStream in Apache Spark Streaming.

This topic contains 1 reply, has 1 voice, and was last updated by  dfbdteam5 9 months, 4 weeks ago.

  • #5941

    dfbdteam5
    Moderator

    Define the various types of transformations in Apache Spark Streaming.
    Explain the kinds of transformations available on a Spark Streaming DStream.

    #5944

    dfbdteam5
    Moderator

    The transformations available on a DStream in Apache Spark Streaming are:

    1-map(func) — Return a new DStream by passing each element of the source DStream through a function func.

    2-flatMap(func) — Similar to map, but each input item can be mapped to 0 or more output items.

    3-filter(func) — Return a new DStream by selecting only the records of the source DStream on which func returns true.

    4-repartition(numPartitions) — Changes the level of parallelism in this DStream by creating more or fewer partitions.

    5-union(otherStream) — Return a new DStream that contains the union of the elements in the source DStream and otherStream.

    6-count() — Return a new DStream of single-element RDDs by counting the number of elements in each RDD of the source DStream.

    7-reduce(func) — Return a new DStream of single-element RDDs by aggregating the elements in each RDD of the source DStream using a function func (which takes two arguments and returns one). The function should be associative and commutative so that it can be computed in parallel.

    8-countByValue() — When called on a DStream of elements of type K, return a new DStream of (K, Long) pairs where the value of each key is its frequency in each RDD of the source DStream.

    9-reduceByKey(func, [numTasks]) — When called on a DStream of (K, V) pairs, return a new DStream of (K, V) pairs where the values for each key are aggregated using the given reduce function. The optional numTasks argument sets the number of parallel tasks used for the grouping.

    10-join(otherStream, [numTasks]) — When called on two DStreams of (K, V) and (K, W) pairs, return a new DStream of (K, (V, W)) pairs with all pairs of elements for each key.

    11-cogroup(otherStream, [numTasks]) — When called on a DStream of (K, V) and (K, W) pairs, return a new DStream of (K, Seq[V], Seq[W]) tuples.

    12-transform(func) — Return a new DStream by applying an RDD-to-RDD function to every RDD of the source DStream. This can be used to do arbitrary RDD operations on the DStream.

    13-updateStateByKey(func) — Return a new “state” DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values for the key.
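
To make the per-batch semantics of the list above concrete, here is a minimal plain-Python sketch. It is not the Spark or PySpark API: it assumes a DStream can be modeled as a list of micro-batches, with each batch (one RDD) being an ordinary Python list. The helper names (d_map, d_flat_map, and so on) are hypothetical and exist only for this illustration. Results are sorted by key purely to make the output deterministic.

```python
from collections import Counter
from functools import reduce

# Assumption for illustration only: a "DStream" is a list of micro-batches,
# and each micro-batch (one RDD) is a plain Python list.

def d_map(stream, func):
    # map: apply func to every element of every batch
    return [[func(x) for x in batch] for batch in stream]

def d_flat_map(stream, func):
    # flatMap: func returns 0 or more items per input element
    return [[y for x in batch for y in func(x)] for batch in stream]

def d_filter(stream, func):
    # filter: keep only elements for which func returns True
    return [[x for x in batch if func(x)] for batch in stream]

def d_count(stream):
    # count: one single-element batch per input batch
    return [[len(batch)] for batch in stream]

def d_reduce(stream, func):
    # reduce: aggregate each batch down to a single element
    return [[reduce(func, batch)] if batch else [] for batch in stream]

def d_count_by_value(stream):
    # countByValue: (value, frequency) pairs within each batch
    return [sorted(Counter(batch).items()) for batch in stream]

def d_reduce_by_key(stream, func):
    # reduceByKey: aggregate values per key, independently in each batch
    out = []
    for batch in stream:
        acc = {}
        for k, v in batch:
            acc[k] = func(acc[k], v) if k in acc else v
        out.append(sorted(acc.items()))
    return out

def d_join(stream, other):
    # join: pair up (K, V) and (K, W) elements of the matching batches
    return [[(k, (v, w)) for k, v in a for k2, w in b if k == k2]
            for a, b in zip(stream, other)]

def d_update_state_by_key(stream, update):
    # updateStateByKey: state survives across batches; update(new_values, old_state)
    # is called for every known key, and returning None drops the key.
    state, out = {}, []
    for batch in stream:
        new_values = {}
        for k, v in batch:
            new_values.setdefault(k, []).append(v)
        for k in set(state) | set(new_values):
            new_state = update(new_values.get(k, []), state.get(k))
            if new_state is None:
                state.pop(k, None)
            else:
                state[k] = new_state
        out.append(sorted(state.items()))
    return out

# Word count over two micro-batches of text lines
lines = [["a b", "b c"], ["c c"]]
words = d_flat_map(lines, str.split)
pairs = d_map(words, lambda w: (w, 1))

# Stateless: counts restart in every batch
per_batch = d_reduce_by_key(pairs, lambda x, y: x + y)
# Stateful: counts accumulate across batches
running = d_update_state_by_key(pairs, lambda new, old: sum(new) + (old or 0))

print(per_batch)  # [[('a', 1), ('b', 2), ('c', 1)], [('c', 2)]]
print(running)    # [[('a', 1), ('b', 2), ('c', 1)], [('a', 1), ('b', 2), ('c', 3)]]
```

The contrast between per_batch and running is the main point: reduceByKey only sees the current batch, while updateStateByKey carries each key's state forward, so "c" reaches 3 in the second batch. In real Spark Streaming the stateful variant also requires a checkpoint directory to be configured, which this model leaves out.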
