Lazy Evaluation in Apache Spark – A Quick Guide


1. Objective

In this Apache Spark lazy evaluation tutorial, we will understand what lazy evaluation in Apache Spark is, how Spark manages the lazy evaluation of RDD transformations, the reason behind keeping Spark evaluation lazy, and the advantages lazy evaluation brings to Spark transformations.


2. What is Lazy Evaluation in Apache Spark?

Before starting with lazy evaluation in Spark, let us briefly revise the core Apache Spark concepts.
As the name itself indicates, lazy evaluation in Spark means that execution will not start until an action is triggered. Lazy evaluation comes into the picture the moment Spark transformations are applied.

Transformations are lazy in nature, meaning that when we call an operation on an RDD, it does not execute immediately. Instead, Spark keeps a record of which operations have been called, through the DAG (directed acyclic graph) of lineage. We can think of a Spark RDD as a description of the data we have built up through transformations. Since transformations are lazy, we can run the pipeline at any time by calling an action on the data. Hence, with lazy evaluation the data is not loaded and processed until it is necessary.
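To make this concrete, here is a minimal, self-contained sketch in Scala (the application name and data are invented for illustration, and it runs with a local master). The map call merely records a step in the DAG; only the count action triggers an actual job:

```scala
import org.apache.spark.sql.SparkSession

object LazyEvaluationDemo {
  def main(args: Array[String]): Unit = {
    // A local session purely for this sketch; a real cluster behaves the same way.
    val spark = SparkSession.builder()
      .appName("lazy-evaluation-demo")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val numbers = sc.parallelize(1 to 1000000)

    // Transformation: nothing runs yet; Spark only records this step in the DAG.
    val doubled = numbers.map(_ * 2)

    // Action: only now does Spark build a job from the recorded DAG and execute it.
    println(doubled.count()) // prints 1000000

    spark.stop()
  }
}
```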


In Hadoop MapReduce, developers spend much of their time minimizing the number of MapReduce passes over the data, which they do by manually clubbing operations together. In Spark, by contrast, we can write many simple operations and let Spark combine them into a single execution graph that it schedules efficiently. This is one of the key differences between Hadoop MapReduce and Apache Spark.
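As an illustration, consider this hedged sketch (the log path and tab-separated record layout are hypothetical, and sc is the SparkContext from the earlier example). Three simple operations are written separately, yet Spark pipelines the filter and map into a single stage rather than separate passes:

```scala
// Hypothetical log file and record layout, purely for illustration.
val errorHosts = sc.textFile("hdfs:///logs/app.log")
  .filter(_.contains("ERROR"))   // simple operation 1
  .map(_.split("\t")(0))         // simple operation 2: keep the first field
  .distinct()                    // simple operation 3

// Only this action makes Spark plan and run the job; the filter and map
// above are fused into one stage instead of separate scans of the data.
println(errorHosts.count())
```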

In Spark, the driver program ships the code to the cluster. If that code executed eagerly after every operation, each task would consume extra time and memory, since the data would travel to the cluster for evaluation at every step. Lazy evaluation lets Spark submit one consolidated job instead.

3. Advantages of Lazy Evaluation in Spark Transformation

There are several benefits of lazy evaluation in Apache Spark:

a. Increases Manageability

By lazy evaluation, users can organize their Apache Spark program into smaller operations. Spark then reduces the number of passes over the data by grouping those operations together, as the sketch below illustrates.
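A sketch of that style (the helper names and input path are illustrative): each step is a tiny, self-documenting transformation, and Spark fuses them all when the action finally runs:

```scala
import org.apache.spark.rdd.RDD

// Small, named transformations; none of them runs on its own.
def nonEmpty(lines: RDD[String]): RDD[String] = lines.filter(_.trim.nonEmpty)
def words(lines: RDD[String]): RDD[String]    = lines.flatMap(_.split("\\s+"))
def counts(ws: RDD[String]): RDD[(String, Int)] =
  ws.map(w => (w, 1)).reduceByKey(_ + _)

val lines      = sc.textFile("input.txt")       // illustrative path
val wordCounts = counts(words(nonEmpty(lines))) // still no job submitted

// One action, one job: Spark groups the steps above into as few passes as it can.
wordCounts.take(10).foreach(println)
```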

b. Saves Computation and Increases Speed

Spark's lazy evaluation plays a key role in saving computation overhead, since only the necessary values get computed. It also saves round trips between the driver and the cluster, which speeds up the process; see the sketch below.
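For example, in this minimal sketch (the slow function is invented to make the cost visible), the take(5) action lets Spark evaluate only as many partitions as it needs, so almost none of the million expensive calls ever happen:

```scala
// An artificially slow step, standing in for any expensive computation.
def slowSquare(x: Int): Int = { Thread.sleep(1); x * x }

val expensive = sc.parallelize(1 to 1000000).map(slowSquare)

// take(5) starts with the first partition and scans more only if required,
// instead of applying slowSquare to all one million elements.
expensive.take(5).foreach(println)
```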

c. Reduces Complexities

The two main complexities of any operation are time complexity and space complexity. Using lazy evaluation, Apache Spark can reduce both. Since not every operation is executed the moment it is written, time is saved, and we can even describe pipelines over data structures far too large to materialize. Because the action is triggered only when the data is actually required, overhead is reduced.

d. Optimization

Lazy evaluation also enables optimization: because Spark sees the whole chain of recorded operations before running anything, it can reduce the number of queries and passes it actually executes. Learn more about Apache Spark Optimization.
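One way to see what Spark has recorded before anything runs is RDD.toDebugString, which prints the lineage accumulated so far. A small sketch (values are illustrative):

```scala
val plan = sc.parallelize(1 to 100)
  .map(_ + 1)
  .filter(_ % 2 == 0)

// Nothing has executed yet; this prints the recorded lineage (the DAG)
// that Spark turns into an optimized job once an action fires.
println(plan.toDebugString)
```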

4. Conclusion

Hence, lazy evaluation enhances the power of Apache Spark by reducing the execution time of RDD operations. Spark maintains the lineage graph to remember the operations performed on each RDD; as a result, it optimizes performance and achieves fault tolerance.
If you like this blog or have any query, please leave a comment.

