

{"id":6633,"date":"2018-01-27T07:29:31","date_gmt":"2018-01-27T07:29:31","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=6633"},"modified":"2018-09-17T15:46:05","modified_gmt":"2018-09-17T10:16:05","slug":"rdd-lineage","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/rdd-lineage\/","title":{"rendered":"RDD lineage in Spark: ToDebugString Method"},"content":{"rendered":"<h2><span style=\"font-family: Verdana, Geneva, sans-serif\">1.\u00a0<\/span>Objective<\/h2>\n<p><span style=\"font-weight: 400\">In <strong><a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-for-beginners\/\">Spark<\/a><\/strong>, the dependencies between RDDs are recorded in a graph, rather than the actual data. This graph is called the lineage graph. This article explains RDD lineage and its role in Spark&#8217;s logical execution plan, and shows how to obtain the RDD lineage graph with the toDebugString method. First, let&#8217;s briefly review Spark RDDs.<\/span><\/p>\n<div id=\"attachment_6695\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Introduction-to-RDD-Lineage.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-6695\" class=\"wp-image-6695 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Introduction-to-RDD-Lineage.jpg\" alt=\"Spark RDD Lineage - Introduction\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Introduction-to-RDD-Lineage.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Introduction-to-RDD-Lineage-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Introduction-to-RDD-Lineage-300x157.jpg 300w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Introduction-to-RDD-Lineage-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Introduction-to-RDD-Lineage-1024x536.jpg 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-6695\" class=\"wp-caption-text\">Introduction to Spark RDD Lineage<\/p><\/div>\n<h2>2.\u00a0Introduction to Spark RDD<\/h2>\n<p><span style=\"font-weight: 400\"><a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-rdd-tutorial\/\"><strong>Spark RDD<\/strong><\/a> stands for \u201cResilient Distributed Dataset\u201d. The RDD is the fundamental data structure of Apache Spark. More specifically, an RDD is an immutable collection of objects whose partitions can be computed on different nodes of the cluster.<\/span><br \/>\nBreaking down the name of Spark RDD:<\/p>\n<ul>\n<li style=\"font-weight: 400\"><strong>Resilient<\/strong><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">This means <strong><a href=\"https:\/\/data-flair.training\/blogs\/fault-tolerance-in-apache-spark\/\">fault-tolerant<\/a><\/strong>. Using the RDD lineage graph (DAG), Spark can recompute partitions that are missing or damaged because of node failures.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><strong>Distributed<\/strong><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">This means the data resides on multiple nodes.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><strong>Dataset <\/strong><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">This is the collection of records you work with. The dataset can be loaded from an external source. 
For example, a JSON file, CSV file, text file, or a database via JDBC, with no specific data structure required.<\/span><\/p>\n<p><strong><a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-dataset-tutorial\/\">You must read the Spark Dataset Tutorial<\/a><\/strong><\/p>\n<h2>3.\u00a0Introduction to RDD Lineage<\/h2>\n<p>RDDs are evaluated lazily: a series of transformations applied to an RDD is only recorded, and nothing is computed until an action is called.<br \/>\nWhen <a href=\"https:\/\/data-flair.training\/blogs\/create-rdds-in-apache-spark\/\"><strong>we create a new RDD<\/strong><\/a> from an existing Spark RDD, the new RDD carries a pointer to its parent RDD. In this way, all the dependencies between the RDDs are recorded in a graph, rather than the actual data. This graph is called the lineage graph.<br \/>\nRDD lineage is the graph of all the parent RDDs of an RDD. It is also called the RDD operator graph or RDD dependency graph. More specifically, it is the result of applying transformations to RDDs, and it forms the logical execution plan.<br \/>\nThe physical execution plan, or execution <strong><a href=\"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/\">DAG<\/a><\/strong>, is known as the DAG of stages.<br \/>\nLet&#8217;s look at an example of Spark RDD lineage built with the cartesian and zip operators. 
Other operators can be used to build an RDD graph in Spark as well.<br \/>\n<b>For example<\/b><\/p>\n<div id=\"attachment_6637\" style=\"width: 841px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/rdd-lineage.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-6637\" class=\"wp-image-6637 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/rdd-lineage.jpg\" alt=\"Introduction to RDD lineage in Apache Spark\" width=\"831\" height=\"475\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/rdd-lineage.jpg 831w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/rdd-lineage-150x86.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/rdd-lineage-300x171.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/rdd-lineage-768x439.jpg 768w\" sizes=\"auto, (max-width: 831px) 100vw, 831px\" \/><\/a><p id=\"caption-attachment-6637\" class=\"wp-caption-text\">Introduction to RDD lineage in Apache Spark<\/p><\/div>\n<p>The figure above depicts an RDD graph, which is the result of the following series of transformations:<\/p>\n<p><strong><a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-lazy-evaluation\/\">Let us revise Lazy evaluation in Spark<\/a><\/strong><br \/>\nval r00 = sc.parallelize(0 to 9)<br \/>\nval r01 = sc.parallelize(0 to 90 by 10)<br \/>\nval r10 = r00 cartesian r01<br \/>\nval r11 = r00.map(n =&gt; (n, n))<br \/>\nval r12 = r00 zip r01<br \/>\nval r13 = r01.keyBy(_ \/ 20)<br \/>\nval r20 = Seq(r11, r12, r13).foldLeft(r10)(_ union _)<br \/>\nThe lineage graph describes which transformations need to be executed once an action has been called.<br \/>\nIn other words, whenever we create new RDDs from existing RDDs, Spark uses the lineage graph to manage these 
dependencies. Each RDD maintains a pointer to one or more parent RDDs, along with metadata about the type of relationship it has with each parent.<br \/>\n<strong>For example,<\/strong><br \/>\nsuppose that, on an RDD a, we call<br \/>\nval b = a.map(...).<br \/>\nRDD b then keeps a reference to its parent RDD a. This reference is a simple form of RDD lineage.<\/p>\n<h2>4. Logical Execution Plan for\u00a0RDD Lineage<\/h2>\n<p>The logical execution plan starts with the earliest RDDs. These are the RDDs that do not depend on other RDDs, or that reference cached data. It ends with the RDD that produces the result of the action that has been called.<br \/>\nIn other words, it is the DAG that is executed when <a href=\"https:\/\/data-flair.training\/blogs\/learn-apache-spark-sparkcontext\/\"><strong>SparkContext<\/strong> <\/a>is requested to run a Spark job.<\/p>\n<h2>5. ToDebugString Method to get\u00a0RDD Lineage Graph in Spark<\/h2>\n<p><span style=\"font-weight: 400\">There are several ways to obtain the RDD lineage graph in Spark. One of them is the toDebugString method. 
Its signature is:<\/span><br \/>\n<b>toDebugString: String<\/b><\/p>\n<p><strong><a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-dstream-discretized-streams\/\">Have a look at Spark DStream<\/a><\/strong><br \/>\nThis method returns a description of an RDD and its recursive dependencies, which lets us inspect the RDD lineage graph.<br \/>\nscala&gt; val wordCount1 = sc.textFile(&quot;README.md&quot;).flatMap(_.split(&quot;\\\\s+&quot;)).map((_, 1)).reduceByKey(_ + _)<br \/>\nwordCount1: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[21] at reduceByKey at &lt;console&gt;:24<br \/>\nscala&gt; wordCount1.toDebugString<br \/>\nres13: String =<br \/>\n(2) ShuffledRDD[21] at reduceByKey at &lt;console&gt;:24 []<br \/>\n+-(2) MapPartitionsRDD[20] at map at &lt;console&gt;:24 []<br \/>\n| \u00a0MapPartitionsRDD[19] at flatMap at &lt;console&gt;:24 []<br \/>\n| \u00a0README.md MapPartitionsRDD[18] at textFile at &lt;console&gt;:24 []<br \/>\n| \u00a0README.md HadoopRDD[17] at textFile at &lt;console&gt;:24 []<br \/>\nThe toDebugString method uses indentation to indicate a shuffle boundary.<br \/>\nThe numbers in round brackets show the level of parallelism at each stage.<br \/>\n<span style=\"font-weight: 400\">For example, (2) in the above output.<\/span><br \/>\nscala&gt; wordCount1.getNumPartitions<br \/>\nres14: Int = 2<br \/>\nWith the spark.logLineage property enabled, the toDebugString output is logged when an action is executed.<br \/>\n$ .\/bin\/spark-shell --conf spark.logLineage=true<br \/>\nscala&gt; sc.textFile(&quot;README.md&quot;, 4).count<br \/>\n&#8230;<br \/>\n15\/10\/17 14:46:42 INFO SparkContext: Starting job: count at &lt;console&gt;:25<br \/>\n15\/10\/17 14:46:42 INFO SparkContext: RDD&#39;s recursive dependencies:<br \/>\n(4) MapPartitionsRDD[1] at textFile at &lt;console&gt;:25 []<br \/>\n| \u00a0README.md HadoopRDD[0] at textFile at &lt;console&gt;:25 []<\/p>\n<p><strong><a 
href=\"https:\/\/data-flair.training\/blogs\/apache-spark-sql-performance-tuning\/\">You must read about Spark Performance Tuning<\/a><\/strong><\/p>\n<p>So, that was all for this Spark RDD Lineage tutorial. We hope you liked our explanation.<\/p>\n<h2>6. Conclusion<\/h2>\n<p><span style=\"font-weight: 400\">In this blog, we learned what the Apache Spark RDD lineage graph is and how it relates to the logical execution plan in Apache Spark. We also looked at the toDebugString method in detail. This covers the concept of the lineage graph in Apache <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-rdd-features\/\"><strong>Spark RDD<\/strong><\/a>. <\/span><br \/>\n<span style=\"font-weight: 400\">If you have any questions, please ask in the comment section.\u00a0<\/span><br \/>\nRefer to the\u00a0<strong><a href=\"https:\/\/data-flair.training\/blogs\/best-apache-spark-scala-books\/\">top books to learn Spark<\/a><\/strong>.<br \/>\n<strong><a href=\"https:\/\/spark.apache.org\/\">For reference<\/a><\/strong>
<\/p>\n","protected":false},"excerpt":{"rendered":"<p>1.\u00a0Objective Basically, in Spark\u00a0all the dependencies between the RDDs will be logged in a graph, despite the actual data. This is what we call as a lineage graph in Spark. This document holds the&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":6695,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[8380,11345,13104,13108,14758],"class_list":["post-6633","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-spark","tag-logical-execution-plan","tag-rdd-lineage-in-spark","tag-spark-rdd","tag-spark-rdd-lineage","tag-todebugstring-method"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>RDD lineage in Spark: ToDebugString Method - DataFlair<\/title>\n<meta name=\"description\" content=\"What is Spark RDD &amp; RDD lineage in Spark,Logical Execution Plan for Spark RDD Lineage,toDebugString Method with syntax and examples,ways to create spark RDD\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/rdd-lineage\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"RDD lineage in Spark: ToDebugString Method - DataFlair\" \/>\n<meta property=\"og:description\" content=\"What is Spark RDD &amp; RDD lineage in Spark,Logical Execution Plan for Spark RDD Lineage,toDebugString Method with syntax and examples,ways to create 
spark RDD\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/rdd-lineage\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-27T07:29:31+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-09-17T10:16:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Introduction-to-RDD-Lineage.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"RDD lineage in Spark: ToDebugString Method - DataFlair","description":"What is Spark RDD & RDD lineage in Spark,Logical Execution Plan for Spark RDD Lineage,toDebugString Method with syntax and examples,ways to create spark RDD","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/rdd-lineage\/","og_locale":"en_US","og_type":"article","og_title":"RDD lineage in Spark: ToDebugString Method - DataFlair","og_description":"What is Spark RDD & RDD lineage in Spark,Logical Execution Plan for Spark RDD Lineage,toDebugString Method with syntax and examples,ways to create spark RDD","og_url":"https:\/\/data-flair.training\/blogs\/rdd-lineage\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2018-01-27T07:29:31+00:00","article_modified_time":"2018-09-17T10:16:05+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Introduction-to-RDD-Lineage.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/rdd-lineage\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/rdd-lineage\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"RDD lineage in Spark: ToDebugString Method","datePublished":"2018-01-27T07:29:31+00:00","dateModified":"2018-09-17T10:16:05+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/rdd-lineage\/"},"wordCount":976,"commentCount":4,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/rdd-lineage\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Introduction-to-RDD-Lineage.jpg","keywords":["Logical Execution Plan","RDD lineage in Spark","spark rdd","Spark RDD lineage","toDebugString Method"],"articleSection":["Apache Spark Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/rdd-lineage\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/rdd-lineage\/","url":"https:\/\/data-flair.training\/blogs\/rdd-lineage\/","name":"RDD lineage in Spark: ToDebugString Method - DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/rdd-lineage\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/rdd-lineage\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Introduction-to-RDD-Lineage.jpg","datePublished":"2018-01-27T07:29:31+00:00","dateModified":"2018-09-17T10:16:05+00:00","description":"What is Spark RDD & RDD lineage in Spark,Logical Execution Plan for Spark RDD 
Lineage,toDebugString Method with syntax and examples,ways to create spark RDD","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/rdd-lineage\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/rdd-lineage\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/rdd-lineage\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Introduction-to-RDD-Lineage.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Introduction-to-RDD-Lineage.jpg","width":1200,"height":628,"caption":"Introduction to Spark RDD Lineage"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/rdd-lineage\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Apache Spark Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/spark\/"},{"@type":"ListItem","position":3,"name":"RDD lineage in Spark: ToDebugString Method"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. 
Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/6633","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=6633"}],"version-history":[{"count":5,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/6633\/revisions"}],"predecessor-version":[{"id":34370,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/6633\/revisions\/34370"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/6695"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=6633"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=6633"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=6633"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}