What is a transformation operation, and how is it processed in Apache Spark?

    • #4973
      DataFlair Team
      Spectator

      How do you process data using transformation operations in Spark? Why are transformations needed in Spark? Please list all the transformations available in Spark.

    • #4974
      DataFlair Team
      Spectator

      Transformations are lazily evaluated operations on an RDD that create one or more new RDDs, e.g. map, filter, reduceByKey, join, cogroup, randomSplit. A transformation is a function that takes an RDD as input and produces one or more RDDs as output. It does not change the input RDD, because RDDs are immutable; it always produces a new RDD by applying a computation to the input. By applying transformations you incrementally build an RDD lineage that contains all the ancestor RDDs of the final RDD(s).

      Transformations are lazy, i.e. they are not executed immediately; they are executed only when an action is called. The resulting RDD(s) are always distinct from their ancestor RDDs and can be smaller (e.g. filter, distinct, sample), bigger (e.g. flatMap, union, cartesian), or the same size (e.g. map). The size can also vary with the data and the function applied.
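The lazy behaviour described above can be illustrated with a small, plain-Python model. This is only a sketch and not Spark's API: `LazyDataset`, `from_list`, `flat_map` and `traced_double` are hypothetical names invented for the illustration. Each transformation returns a new object holding a recipe, and nothing runs until an action (`collect` or `count`) is called.

```python
from typing import Callable, Iterator, List


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark): stores a recipe, not results."""

    def __init__(self, compute: Callable[[], Iterator]):
        self._compute = compute  # called only when an action runs

    @classmethod
    def from_list(cls, items: List) -> "LazyDataset":
        data = list(items)  # private copy, so the source never changes
        return cls(lambda: iter(data))

    # Transformations: each returns a NEW dataset and computes nothing yet.
    def map(self, f):
        return LazyDataset(lambda: (f(x) for x in self._compute()))

    def filter(self, pred):
        return LazyDataset(lambda: (x for x in self._compute() if pred(x)))

    def flat_map(self, f):
        return LazyDataset(lambda: (y for x in self._compute() for y in f(x)))

    # Actions: these trigger the actual computation.
    def collect(self) -> List:
        return list(self._compute())

    def count(self) -> int:
        return sum(1 for _ in self._compute())


calls = []  # records every element the mapped function actually sees


def traced_double(x):
    calls.append(x)
    return x * 2


numbers = LazyDataset.from_list([1, 2, 3, 4])
doubled = numbers.map(traced_double)                   # same size as the input
evens = numbers.filter(lambda x: x % 2 == 0)           # smaller
repeated = numbers.flat_map(lambda x: [x, x])          # bigger

calls_before_action = len(calls)    # still 0: no action has been called
doubled_result = doubled.collect()  # the action runs the recorded map
```

In this model `numbers` is never modified; `doubled`, `evens` and `repeated` are separate objects derived from it, which mirrors the immutability described in the post.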

      RDDs record dependencies between one another. These dependencies are the steps needed to produce a result in a program. Each RDD in a lineage chain (a string of dependencies) has a function for computing its data and a pointer (dependency) to its ancestor RDD. Spark divides the RDD dependencies into stages and tasks and then sends those to the workers for execution.
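As a rough illustration of the lineage chain just described, the plain-Python sketch below gives each node a function and a pointer to its parent, then groups the chain into stages. It is a simplified model, not Spark's scheduler: `LineageNode`, `needs_shuffle`, `split_into_stages` and `sum_by_key` are hypothetical names, and the rule that a new stage starts at each operation needing a shuffle (such as reduceByKey) is applied here in its simplest form, for a single linear chain.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class LineageNode:
    """Hypothetical model of one RDD in a lineage chain (not Spark's classes)."""
    name: str
    func: Callable[[List], List]            # how to compute this node's data
    parent: Optional["LineageNode"] = None  # pointer to the ancestor node
    needs_shuffle: bool = False             # assumption: marks a stage boundary

    def chain(self) -> List["LineageNode"]:
        """Return the nodes from the root ancestor down to this node."""
        nodes, node = [], self
        while node is not None:
            nodes.append(node)
            node = node.parent
        return nodes[::-1]

    def compute(self) -> List:
        """Recompute this node's data by replaying its lineage."""
        parent_data = self.parent.compute() if self.parent else []
        return self.func(parent_data)


def split_into_stages(node: LineageNode) -> List[List[str]]:
    """Group the chain into stages, starting a new one at each shuffle."""
    stages: List[List[str]] = [[]]
    for n in node.chain():
        if n.needs_shuffle and stages[-1]:
            stages.append([])
        stages[-1].append(n.name)
    return stages


def sum_by_key(pairs):
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())


source = LineageNode("parallelize", lambda _: ["a", "b", "a"])
pairs = LineageNode("map", lambda xs: [(x, 1) for x in xs], parent=source)
counts = LineageNode("reduceByKey", sum_by_key, parent=pairs, needs_shuffle=True)
frequent = LineageNode("filter", lambda kvs: [kv for kv in kvs if kv[1] > 1], parent=counts)

lineage_names = [n.name for n in frequent.chain()]
stages = split_into_stages(frequent)
result = frequent.compute()
```

Here `lineage_names` lists the four steps in order, `stages` places map with its source and filter with reduceByKey, and `result` is rebuilt purely by replaying each node's function from the root.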

      Follow the guide below to learn more about transformations in Apache Spark:
      http://data-flair.training/blogs/rdd-transformations-actions-apis-apache-spark/
