In mathematical terms, a Directed Acyclic Graph (DAG) is a directed graph with no cycles. In Spark, the DAG holds the set of all operations applied to an RDD. When an action is called on an RDD, Spark creates the DAG and submits it to the DAG scheduler. Only after the DAG is built does Spark create the query optimization plan. The DAG scheduler divides the operators into stages of tasks; a stage is comprised of tasks based on partitions of the input data, and the DAG scheduler pipelines operators together within a stage.
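To make the laziness concrete, here is a minimal toy sketch (not the real Spark API) of an RDD-like class: transformations such as `map` and `filter` only record a new node in the lineage DAG, and nothing executes until an action like `collect` walks the graph. The class and method names are illustrative assumptions, not Spark internals.

```python
# Toy sketch of lazy DAG building: transformations record lineage,
# the action walks the lineage and runs everything. NOT the Spark API.

class ToyRDD:
    def __init__(self, data=None, parent=None, op=None):
        self._data = data      # only the source RDD holds data
        self.parent = parent   # edge back through the lineage DAG
        self.op = op           # deferred transformation for this node

    def map(self, f):
        # Lazy: return a new DAG node, compute nothing yet.
        return ToyRDD(parent=self, op=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return ToyRDD(parent=self, op=lambda part: [x for x in part if pred(x)])

    def collect(self):
        # Action: walk back to the source, then apply ops in order.
        chain, node = [], self
        while node.parent is not None:
            chain.append(node.op)
            node = node.parent
        data = node._data
        for op in reversed(chain):
            data = op(data)
        return data

rdd = ToyRDD(data=[1, 2, 3, 4, 5])
pipeline = rdd.map(lambda x: x * 2).filter(lambda x: x > 4)  # DAG built, nothing run
print(pipeline.collect())  # action triggers execution -> [6, 8, 10]
```

Because the whole chain is visible before anything runs, a scheduler can pipeline `map` and `filter` into a single stage, which is exactly what the DAG scheduler does for narrow transformations.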
Fault tolerance in Spark is achieved using the DAG: because the lineage of every RDD is recorded, a lost partition can be recomputed by replaying the transformations from the source data. The DAG also makes query optimization possible, so we get better performance by using it.
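The recovery idea can be sketched in a few lines, assuming deterministic transformations over a re-readable source (the `build_lineage` and `compute` helpers below are hypothetical names, not Spark functions):

```python
# Toy sketch of lineage-based fault tolerance: if a computed partition
# is lost, replay the recorded transformations from the source.

def build_lineage(source, *ops):
    """Record the source and the transformations without running them."""
    return {"source": source, "ops": list(ops)}

def compute(lineage):
    """Replay the lineage; deterministic ops give the same result every time."""
    data = list(lineage["source"])
    for op in lineage["ops"]:
        data = op(data)
    return data

lineage = build_lineage(
    [1, 2, 3, 4],
    lambda xs: [x + 1 for x in xs],    # first transformation
    lambda xs: [x * 10 for x in xs],   # second transformation
)

result = compute(lineage)     # [20, 30, 40, 50]
result = None                 # simulate losing the computed partition
recovered = compute(lineage)  # recompute from lineage instead of replicating data
print(recovered)              # [20, 30, 40, 50]
```

This is why Spark can avoid replicating intermediate data: the lineage in the DAG is enough to rebuild any lost piece.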
To learn more about how to create a DAG, how fault tolerance is achieved through the DAG, and the working of the DAG optimizer, read DAG in Apache Spark.