{"id":1924,"date":"2017-04-08T17:15:06","date_gmt":"2017-04-08T17:15:06","guid":{"rendered":"http:\/\/data-flair.training\/blogs\/?p=1924"},"modified":"2018-11-21T11:41:31","modified_gmt":"2018-11-21T06:11:31","slug":"dag-in-apache-spark","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/","title":{"rendered":"Directed Acyclic Graph DAG in Apache Spark"},"content":{"rendered":"<h2>1. Objective<\/h2>\n<p>In this Apache Spark tutorial, we will understand what a DAG is in Apache Spark, what the DAG Scheduler is, why Spark needs a directed acyclic graph, how Spark creates a DAG, and how the DAG helps achieve fault tolerance. We will also learn how the DAG works with RDDs and the advantages of the DAG in Spark that set <a href=\"http:\/\/data-flair.training\/blogs\/comparison-between-apache-spark-vs-hadoop-mapreduce\/\">Apache Spark apart from Hadoop MapReduce<\/a>.<\/p>\n<p>A <strong>DAG (Directed Acyclic Graph)<\/strong> in <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-tutorial\/\"><strong>Apache Spark<\/strong><\/a> is a set of <strong>vertices<\/strong> and <strong>edges<\/strong>, where <em>vertices<\/em> represent <strong>RDDs<\/strong> and <em>edges<\/em> represent the <strong>operations applied to the RDDs<\/strong>. In a Spark DAG, every edge points from an earlier RDD to a later one in the sequence. 
When an <em>Action<\/em> is called, the DAG is submitted to the <strong>DAG Scheduler<\/strong>, which splits the graph into <strong>stages<\/strong> of <strong>tasks<\/strong>.<\/p>\n<div id=\"attachment_43073\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/dag-visualization.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-43073\" class=\"size-full wp-image-43073\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/dag-visualization.jpg\" alt=\"Directed Acyclic Graph DAG in Apache Spark\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/dag-visualization.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/dag-visualization-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/dag-visualization-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/dag-visualization-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/dag-visualization-1024x536.jpg 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/dag-visualization-520x272.jpg 520w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-43073\" class=\"wp-caption-text\">Directed Acyclic Graph DAG in Apache Spark<\/p><\/div>\n<h2>2. What is a DAG in Apache Spark?<\/h2>\n<p>A <strong>DAG<\/strong> is a finite directed graph with no directed cycles. It has finitely many <em>vertices<\/em> and <em>edges<\/em>, where each edge is directed from one vertex to another. Its vertices can be arranged in a sequence such that every edge is directed from earlier to later in the sequence. 
This model is a strict generalization of the <a href=\"http:\/\/data-flair.training\/blogs\/hadoop-mapreduce-introduction-tutorial-comprehensive-guide\/\"><strong>MapReduce<\/strong><\/a> model. Because the scheduler sees the whole graph of operations, DAG-based execution can perform better global optimization than systems like MapReduce. The benefit of the DAG becomes clearer in more complex jobs.<\/p>\n<p>The Apache Spark DAG visualization lets the user drill into any stage and expand its details. In the <em>stage view<\/em>, the details of all <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-rdd-tutorial\/\"><strong>RDDs<\/strong><\/a> belonging to that stage are expanded. The scheduler splits the RDD graph into <strong>stages<\/strong> based on the transformations applied. (You can refer to this link to <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-rdd-transformations-actions\/\">learn RDD Transformations and Actions in detail<\/a>.) Each stage consists of <strong>tasks<\/strong>, one per partition of the RDD, which perform the same computation in parallel. <em>Here, the graph describes how data flows between RDDs; it is directed because each edge points from a parent RDD to a child RDD, and acyclic because no RDD can depend, directly or indirectly, on itself.<\/em><\/p>\n<h2>3. Need for a Directed Acyclic Graph in Spark<\/h2>\n<p>The <a href=\"http:\/\/data-flair.training\/blogs\/limitations-of-hadoop\/\">limitations of Hadoop<\/a> MapReduce were a key motivation for introducing the DAG in Spark. 
A MapReduce computation proceeds in three steps:<\/p>\n<ul>\n<li>The data is <a href=\"http:\/\/data-flair.training\/blogs\/hdfs-data-read-operation\/\">read from HDFS<\/a>.<\/li>\n<li>Map and Reduce operations are applied.<\/li>\n<li>The computed result is <a href=\"http:\/\/data-flair.training\/blogs\/hdfs-data-write-operation\/\">written back to HDFS<\/a>.<\/li>\n<\/ul>\n<p>Each MapReduce job is independent of the others, and <a href=\"http:\/\/data-flair.training\/blogs\/hadoop-tutorial-for-beginners\/\"><strong>Hadoop<\/strong><\/a> has no idea which MapReduce job will come next. In iterative workloads, writing the intermediate result to stable storage and reading it back between two MapReduce jobs is often unnecessary. In such cases, the I\/O and space spent on stable storage (<a href=\"http:\/\/data-flair.training\/blogs\/comprehensive-hdfs-guide-introduction-architecture-data-read-write-tutorial\/\"><strong>HDFS<\/strong><\/a>) or disk are wasted.<\/p>\n<p>In a multi-step pipeline, each job is blocked until the previous job has completed. As a result, a complex computation can take a long time even on a small volume of data.<\/p>\n<p>In Spark, by contrast, a DAG (Directed Acyclic Graph) of consecutive computation stages is formed. This lets Spark optimize the execution plan as a whole, e.g. to minimize shuffling data around. In MapReduce, such optimization has to be done manually by tuning each MapReduce step.<\/p>\n<h2>4. How Does the DAG Work in Spark?<\/h2>\n<ul>\n<li>The interpreter is the first layer: using a Scala interpreter, Spark interprets the code with some modifications.<\/li>\n<li>As you enter code in the Spark console, Spark builds an operator graph.<\/li>\n<li>When an <strong>Action<\/strong> is called on a Spark RDD, Spark submits the operator graph to the <strong>DAG Scheduler<\/strong>.<\/li>\n<li>The DAG Scheduler divides the operators into <strong>stages<\/strong> of tasks. A stage contains tasks based on the partitions of the input data. 
The DAG Scheduler pipelines operators together. For example, consecutive map operators are scheduled in a single stage.<\/li>\n<li>The stages are passed on to the <strong>Task Scheduler<\/strong>, which launches tasks through the <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-cluster-managers-tutorial\/\"><strong>cluster manager<\/strong><\/a>. The task scheduler does not know about the dependencies between stages.<\/li>\n<li>The <strong>Workers<\/strong> execute the tasks on the worker nodes.<\/li>\n<\/ul>\n<p>The image below summarizes how the DAG fits into Spark job execution.<\/p>\n<div id=\"attachment_2980\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/06\/internals-of-job-execution-in-apache-spark.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2980\" class=\"wp-image-2980 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/06\/internals-of-job-execution-in-apache-spark.jpg\" alt=\"An Introduction to Job execution flow in Apache Spark\" width=\"1280\" height=\"976\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/06\/internals-of-job-execution-in-apache-spark.jpg 1280w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/06\/internals-of-job-execution-in-apache-spark-150x114.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/06\/internals-of-job-execution-in-apache-spark-300x229.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/06\/internals-of-job-execution-in-apache-spark-768x586.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/06\/internals-of-job-execution-in-apache-spark-1024x781.jpg 1024w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/a><p id=\"caption-attachment-2980\" class=\"wp-caption-text\">An Introduction to Job 
execution flow in Apache Spark<\/p><\/div>\n<p>At a high level, there are two types of RDD transformations: <strong>narrow transformations<\/strong> (e.g. map(), filter()) and <strong>wide transformations<\/strong> (e.g. reduceByKey()). A <em>narrow transformation<\/em> does not require shuffling data across partitions, so consecutive narrow transformations are grouped into a single stage. A <em>wide transformation<\/em> shuffles data between partitions, so it results in a stage boundary.<\/p>\n<p>Each RDD maintains a pointer to one or more parents, along with metadata about the type of relationship it has with each parent. For example, if we call <em>val b=a.map()<\/em> on an RDD, the RDD <em>b<\/em> keeps a reference to its parent RDD <em>a<\/em>. This chain of parent references is the <strong>RDD lineage<\/strong>.<\/p>\n<h2>5. How to Achieve Fault Tolerance through DAG?<\/h2>\n<p>An RDD is split into partitions, and each node operates on a partition at any point in time. A series of <a href=\"http:\/\/data-flair.training\/blogs\/partial-functions-scala-guide\/\"><strong>Scala functions<\/strong><\/a> executes on each partition of the RDD. These operations compose together, and the Spark execution engine views them as a DAG (Directed Acyclic Graph).<\/p>\n<p>Suppose a node crashes in the middle of an operation, say O3, which depends on operation O2, which in turn depends on O1. The <em>cluster manager<\/em> detects that the node is dead, and another node is assigned to continue processing. This node recomputes the affected partition of the RDD by re-executing the series of operations it depends on (O1-&gt;O2-&gt;O3). As a result, there is no data loss.<\/p>\n<p>You can refer to this link to <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-streaming-fault-tolerance\/\">learn about fault tolerance in Apache Spark<\/a>.<\/p>\n<h2>6. Working of the DAG Optimizer in Spark<\/h2>\n<p>Spark optimizes the DAG by rearranging and combining operators wherever possible. 
For example, suppose we submit a Spark job that has a <strong><a href=\"http:\/\/data-flair.training\/blogs\/comparison-between-map-vs-flatmap-operation-spark\/\">map() operation<\/a><\/strong> followed by a <strong>filter operation<\/strong>. The idea is that a <strong>DAG Optimizer<\/strong> can move the filter ahead of the map when the filter does not depend on the map&#8217;s output, since filtering first reduces the number of records that undergo the map operation. In practice, Spark performs this kind of reordering (predicate pushdown) in the Spark SQL Catalyst optimizer for DataFrame and SQL queries; functions passed to RDD operations are opaque to Spark, so RDD transformations run in the order they are written.<\/p>\n<h2>7. Advantages of DAG in Spark<\/h2>\n<p>There are multiple advantages of the Spark DAG; let&#8217;s discuss them one by one:<\/p>\n<ul>\n<li>A lost RDD partition can be recovered using the Directed Acyclic Graph.<\/li>\n<li>MapReduce has just two phases, map and reduce, whereas a DAG can have multiple levels. This makes a DAG more flexible for executing SQL queries.<\/li>\n<li>The DAG helps achieve fault tolerance, so lost data can be recovered.<\/li>\n<li>It enables better global optimization than a system like Hadoop MapReduce.<\/li>\n<\/ul>\n<h2>8. Conclusion<\/h2>\n<p>The DAG in Apache Spark is an alternative to MapReduce. It is an execution model used in distributed systems. MapReduce has just two functions (map and reduce), whereas a DAG has multiple levels that form a graph of stages. DAG execution is often faster than MapReduce because intermediate results do not have to be written back to HDFS between steps.<\/p>\n<p>If you have any questions about the DAG in Apache Spark, feel free to share them with us. We will be glad to answer your queries.<br \/>\n<strong>See Also:<\/strong><\/p>\n<ul>\n<li><a href=\"http:\/\/data-flair.training\/blogs\/how-apache-spark-works-run-time-spark-architecture\/\">How does Apache Spark work?<\/a><\/li>\n<li><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-vs-hadoop-mapreduce\/\">Apache Spark vs. 
Hadoop MapReduce<\/a><\/li>\n<\/ul>\n<p>Reference:<br \/>\n<a href=\"http:\/\/spark.apache.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">http:\/\/spark.apache.org\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. 
Objective In this Apache Spark tutorial, we will understand what is DAG in Apache Spark, what is DAG Scheduler, what is the need of directed acyclic graph in Spark, how to create DAG&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":43073,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[907,3244,3929],"class_list":["post-1924","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-spark","tag-apache-spark-dag","tag-dag-in-spark","tag-directed-acyclic-graph-in-spark"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Directed Acyclic Graph DAG in Apache Spark - DataFlair<\/title>\n<meta name=\"description\" content=\"DAG in Apache Spark-what is Spark DAG,How DAG works in Spark,DAG scheduler,Spark operator graph,need of Spark DAG,DAG benifit in Spark,DAG optimizer working\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Directed Acyclic Graph DAG in Apache Spark - DataFlair\" \/>\n<meta property=\"og:description\" content=\"DAG in Apache Spark-what is Spark DAG,How DAG works in Spark,DAG scheduler,Spark operator graph,need of Spark DAG,DAG benifit in Spark,DAG optimizer working\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-04-08T17:15:06+00:00\" 
\/>\n<meta property=\"article:modified_time\" content=\"2018-11-21T06:11:31+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/dag-visualization.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Directed Acyclic Graph DAG in Apache Spark - DataFlair","description":"DAG in Apache Spark-what is Spark DAG,How DAG works in Spark,DAG scheduler,Spark operator graph,need of Spark DAG,DAG benifit in Spark,DAG optimizer working","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/","og_locale":"en_US","og_type":"article","og_title":"Directed Acyclic Graph DAG in Apache Spark - DataFlair","og_description":"DAG in Apache Spark-what is Spark DAG,How DAG works in Spark,DAG scheduler,Spark operator graph,need of Spark DAG,DAG benifit in Spark,DAG optimizer 
working","og_url":"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2017-04-08T17:15:06+00:00","article_modified_time":"2018-11-21T06:11:31+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/dag-visualization.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"Directed Acyclic Graph DAG in Apache Spark","datePublished":"2017-04-08T17:15:06+00:00","dateModified":"2018-11-21T06:11:31+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/"},"wordCount":1147,"commentCount":12,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/dag-visualization.jpg","keywords":["apache spark DAG","DAG in Spark","directed acyclic graph in spark"],"articleSection":["Apache Spark 
Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/","url":"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/","name":"Directed Acyclic Graph DAG in Apache Spark - DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/dag-visualization.jpg","datePublished":"2017-04-08T17:15:06+00:00","dateModified":"2018-11-21T06:11:31+00:00","description":"DAG in Apache Spark-what is Spark DAG,How DAG works in Spark,DAG scheduler,Spark operator graph,need of Spark DAG,DAG benifit in Spark,DAG optimizer working","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/dag-visualization.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/dag-visualization.jpg","width":1200,"height":628,"caption":"Directed Acyclic Graph DAG in Apache Spark"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/dag-in-apache-spark\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Apache Spark 
Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/spark\/"},{"@type":"ListItem","position":3,"name":"Directed Acyclic Graph DAG in Apache Spark"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair 
Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/1924","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=1924"}],"version-history":[{"count":6,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/1924\/revisions"}],"predecessor-version":[{"id":43074,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/1924\/revisions\/43074"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/43073"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=1924"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=1924"},{"ta
xonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=1924"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}