What are the benefits of lazy evaluation in RDD in Apache Spark?

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 4:45 pm #5924
  
  DataFlair Team
  
  Why was lazy evaluation introduced in Apache Spark?
  Discuss the benefits of lazy evaluation in Apache Spark.
- September 20, 2018 at 4:45 pm #5925
  DataFlair Team
  Lazy evaluation means that Spark does not evaluate each transformation as it arrives. Instead, it records the transformations and evaluates them all together when an action is called.
  
  The benefit of this approach is that Spark can make optimization decisions after it has had a chance to look at the DAG in its entirety. This would not be possible if it executed every operation as soon as it received it. As a result, unnecessary work can be avoided, including a large volume of network I/O that could otherwise become a serious bottleneck.
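
  The difference between recording a transformation and running it can be seen directly in spark-shell. The snippet below is a minimal sketch, not part of the original answer: the data and the names numbers, doubled, kept and matching are made up for illustration, and it assumes the predefined SparkContext sc that spark-shell provides.

```scala
// Paste into spark-shell, where `sc` (the SparkContext) is already defined.
// The data and value names are illustrative assumptions, not from the post.
val numbers = sc.parallelize(1 to 10)
val doubled = numbers.map(_ * 2)           // transformation: recorded, not run
val kept    = doubled.filter(_ % 4 == 0)   // transformation: recorded, not run

// Printing the lineage starts no job; it only shows the recorded plan.
println(kept.toDebugString)

// count() is an action, so this is the line that triggers the computation.
val matching = kept.count()
println(matching)   // 5 (the values 4, 8, 12, 16 and 20)
```

  Until count() is called, no job is submitted; you can check this in the Spark web UI, which should list a job only after the action runs.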
  
  Example:
  Suppose we have a file words.txt containing the following lines:
```
line1 word1
line2 word2 word1
line3 word3 word4
line4 word1
```
  Next, we apply the following operations.
```
scala> val lines = sc.textFile("words.txt")
scala> val filtered = lines.filter(line => line.contains("word1"))
scala> filtered.first()
res0: String = line1 word1
```
  If Spark evaluated each statement immediately, it would read the whole file, then apply the filter transformation to every line, and only then return the first line of the filtered result. This would mean a lot of extra work and unnecessary memory use.
  
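  With lazy evaluation, on the other hand, Spark only records the DAG until the action first() is called. Because first() needs a single element, Spark starts with the first partition of the file and stops pulling lines through the filter as soon as one matches. Here the very first line of the file contains word1, so the same result is obtained without processing the rest of the file.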
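
  Within a partition, Spark evaluates an RDD pipeline as a chain of Scala iterators, so the effect in the first() example can be imitated in plain Scala without Spark. The following is a rough sketch under that simplification: LazyFirstSketch, fileLines and firstMatch are hypothetical names, and an in-memory Seq stands in for words.txt.

```scala
// Plain Scala, no Spark required: a lazy iterator pipeline that mimics
// lines.filter(...).first() on a single partition.
object LazyFirstSketch {
  // Hypothetical in-memory stand-in for the contents of words.txt.
  val fileLines: Seq[String] = Seq(
    "line1 word1",
    "line2 word2 word1",
    "line3 word3 word4",
    "line4 word1"
  )

  // Returns the first line containing `word` and the number of lines
  // that had to be pulled from the source to find it.
  // Throws NoSuchElementException if no line matches.
  def firstMatch(lines: Seq[String], word: String): (String, Int) = {
    var linesRead = 0
    // Count every line as it is pulled from the source.
    val source = lines.iterator.map { line => linesRead += 1; line }
    // Still lazy: no line has been read at this point.
    val filtered = source.filter(_.contains(word))
    // Pulls lines only until the first match is found.
    val first = filtered.next()
    (first, linesRead)
  }

  def main(args: Array[String]): Unit = {
    val (first, linesRead) = firstMatch(fileLines, "word1")
    println(s"first match: $first, lines read: $linesRead")
    // prints: first match: line1 word1, lines read: 1
  }
}
```

  Searching for word3 instead would pull three lines before stopping, which shows that the amount of work depends on where the first match sits, not on the size of the file.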
  
  For more detail, please read Lazy evaluation in Spark.