Free Online Certification Courses – Learn Today. Lead Tomorrow. › Forums › Apache Spark › What are the benefits of lazy evaluation in RDD in Apache Spark?
- This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 4:45 pm #5924 | DataFlair Team, Spectator
Why did lazy evaluation come into the picture in Apache Spark?
Discuss the benefits of lazy evaluation in Apache Spark.
September 20, 2018 at 4:45 pm #5925 | DataFlair Team, Spectator
Lazy evaluation means that Spark does not evaluate each transformation as it arrives; instead, it queues the transformations up and evaluates them all at once, when an Action is called.
The benefit of this approach is that Spark can make optimization decisions after it has had a chance to look at the DAG in its entirety. This would not be possible if it executed each operation as soon as it received it. As a result, a large volume of network I/O can be avoided that could otherwise cause a serious bottleneck.
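Spark itself needs a cluster runtime, but the same "record now, execute on demand" idea can be sketched in plain Scala using lazy views (an analogy, not Spark's RDD implementation; all names here are illustrative):

```scala
object LazyDemo {
  def main(args: Array[String]): Unit = {
    var evaluated = 0
    // Like RDD transformations, map and filter on a view are only
    // recorded, not executed, when this expression is built.
    val view = (1 to 1000).view
      .map { n => evaluated += 1; n * 2 }
      .filter(_ % 4 == 0)
    assert(evaluated == 0)   // nothing has run yet
    // head plays the role of an Action: it forces evaluation,
    // and only as many elements as needed are computed.
    val first = view.head
    assert(first == 4)
    println(s"evaluated $evaluated of 1000 elements")
  }
}
```

Because the pipeline is forced element by element, only the first two source elements are ever touched, mirroring how Spark can avoid work it never needs to do.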
Example:
Suppose we have a file words.txt containing the following lines:

line1 word1
line2 word2 word1
line3 word3 word4
line4 word1
Next, we apply the following operations.
scala> val lines = sc.textFile("words.txt")
scala> val filtered = lines.filter(line => line.contains("word1"))
scala> filtered.first()
res0: String = line1 word1
If Spark evaluated each line immediately, it would read the whole file, apply the filter transformation to every line, and only then return the first line of the filtered result. That would mean a lot of unnecessary work and memory use.
With lazy evaluation, on the other hand, Spark first builds the entire DAG and then, using its optimizations, recognizes that reading the whole file is unnecessary: the same result can be obtained by reading only as far as the first matching line.
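The same short-circuiting can be demonstrated locally with a lazy Iterator over the example file (again an analogy for Spark's behavior, not Spark itself; the temp-file setup is illustrative):

```scala
import java.nio.file.Files

object FirstMatch {
  def main(args: Array[String]): Unit = {
    // Recreate the words.txt content from the example in a temp file.
    val f = Files.createTempFile("words", ".txt").toFile
    val pw = new java.io.PrintWriter(f)
    Seq("line1 word1", "line2 word2 word1",
        "line3 word3 word4", "line4 word1").foreach(pw.println)
    pw.close()

    var linesRead = 0
    val src = scala.io.Source.fromFile(f)
    try {
      // getLines() is a lazy Iterator: filter only records the
      // predicate, and next() pulls lines one at a time until
      // the first match, like first() on a filtered RDD.
      val it = src.getLines()
        .map { l => linesRead += 1; l }
        .filter(_.contains("word1"))
      println(it.next())       // the first matching line
      println(linesRead)       // only one line was actually read
    } finally src.close()
  }
}
```

Here the first line already matches, so only one of the four lines is ever read from disk.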
Please read Lazy evaluation in Spark for more detail.