What is the RDD lineage graph or lineage operation in Apache Spark?

    • #5317
      DataFlair Team
      Spectator

      Explain the lineage graph in Apache Spark. How does it enable fault tolerance?

    • #5319
      DataFlair Team
      Spectator

      The RDD lineage graph (also called the RDD operator graph) is a graph of all the parent RDDs of an RDD. It is built as a result of applying transformations to an RDD, and it forms a logical execution plan.

      An RDD in Apache Spark depends on one or more other RDDs. The representation of these dependencies between RDDs is known as the lineage graph. Lineage graph information is used to compute each RDD on demand, so whenever a partition of a persisted RDD is lost, the lost data can be recovered by recomputing it from the lineage graph information.
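
The recovery idea can be sketched with a small toy model in plain Python. This is not Spark's API: `ToyRDD`, `compute_partition`, and the `cache` dictionary are hypothetical names chosen for illustration, and the sketch assumes each partition depends only on the same-numbered partition of its parent (as with a `map`).

```python
# Toy model of lineage-based recovery. NOT Spark's API:
# ToyRDD and its methods are hypothetical names used for illustration.

class ToyRDD:
    def __init__(self, partitions=None, parent=None, fn=None):
        self.source = partitions  # set only for the root dataset
        self.parent = parent      # lineage: the RDD this one was derived from
        self.fn = fn              # per-element function applied to the parent
        self.cache = {}           # persisted partitions: index -> list

    def map(self, fn):
        # A transformation only records the dependency; nothing is computed.
        return ToyRDD(parent=self, fn=fn)

    def compute_partition(self, i):
        # Use the persisted copy if it exists, otherwise replay the lineage.
        if i in self.cache:
            return self.cache[i]
        if self.parent is None:
            return list(self.source[i])
        return [self.fn(x) for x in self.parent.compute_partition(i)]

    def persist(self, num_partitions):
        for i in range(num_partitions):
            self.cache[i] = self.compute_partition(i)
        return self


base = ToyRDD(partitions=[[1, 2], [3, 4]])
squared = base.map(lambda x: x * x)
plus_one = squared.map(lambda x: x + 1).persist(2)

del plus_one.cache[1]                      # simulate losing one persisted partition
recovered = plus_one.compute_partition(1)  # rebuilt by walking back through the parents
print(recovered)                           # [10, 17]
```

Only the lost partition is recomputed; partition 0 is still served from the persisted copy.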

      For details on the RDD DAG, refer to Directed Acyclic Graph.

    • #5321
      DataFlair Team
      Spectator

      • The lineage graph is the graph of all the parent RDDs of an RDD.
      • Applying transformations to an RDD produces a lineage graph.
      • When a new RDD is derived from an existing RDD through a transformation, Spark keeps track of all the dependencies between the RDDs. This record of dependencies is called the lineage graph.
      • The lineage graph is useful in the following scenarios:

      (1) When a new RDD needs to be computed.
      (2) When part of a persisted RDD is lost and the lost data must be recovered.

      • In other words, the lineage graph is a graph of all the transformations that need to be executed when an action is called.
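
To make the last point concrete, here is a minimal sketch in plain Python of transformations being recorded first and executed only when an action runs. It is a toy model, not Spark's API: `LazyRDD`, `lineage`, and the `calls` list are hypothetical names, and only `map` and `filter` are modeled.

```python
# Toy model of lazy transformations. NOT Spark's API: LazyRDD and its
# methods are hypothetical names that only mimic the shape of the idea.

class LazyRDD:
    def __init__(self, data=None, parent=None, op=None, fn=None):
        self.data = data      # set only for the root dataset
        self.parent = parent  # the RDD this one was derived from
        self.op = op          # name of the transformation, e.g. "map"
        self.fn = fn          # function that the transformation applies

    def map(self, fn):
        return LazyRDD(parent=self, op="map", fn=fn)

    def filter(self, fn):
        return LazyRDD(parent=self, op="filter", fn=fn)

    def lineage(self):
        # Walk the parent links back to the root, newest step first.
        steps, node = [], self
        while node is not None:
            steps.append(node.op or "source")
            node = node.parent
        return steps

    def collect(self):
        # The action: only now are the recorded transformations executed.
        if self.parent is None:
            return list(self.data)
        rows = self.parent.collect()
        if self.op == "map":
            return [self.fn(x) for x in rows]
        return [x for x in rows if self.fn(x)]


calls = []  # records every element the map function actually touches

def double(x):
    calls.append(x)
    return x * 2

numbers = LazyRDD(data=[1, 2, 3, 4])
result = numbers.map(double).filter(lambda x: x > 4)

steps = result.lineage()           # ["filter", "map", "source"]
calls_before_action = list(calls)  # [] because nothing has run yet
output = result.collect()          # [6, 8]
```

In Spark itself, the lineage of an RDD can be inspected with its `toDebugString` method, which prints the chain of parent RDDs.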

      For complete information on the DAG, see Directed Acyclic Graph in Apache Spark.
