What are the benefits of lazy evaluation in RDD in Apache Spark?


    • #5924
      DataFlair Team
      Spectator

      Why did lazy evaluation come into the picture in Apache Spark?
      Discuss the benefits of lazy evaluation in Apache Spark.

    • #5925
      DataFlair Team
      Spectator

      Lazy evaluation means that Spark does not evaluate each transformation as it arrives, but instead queues the transformations together and evaluates them all at once, when an action is called.

      The benefit of this approach is that Spark can make optimization decisions after it has had a chance to look at the DAG in its entirety. This would not be possible if it executed everything as soon as it arrived. As a result, a large volume of network I/O can be avoided that could otherwise have caused a serious bottleneck.
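      To make this concrete, here is a minimal sketch (assuming a spark-shell session, so the SparkContext sc is already available; the RDD contents are arbitrary). The filter and map calls return immediately, toDebugString prints the lineage (the DAG) Spark has recorded so far, and only the final action triggers execution, which lets Spark pipeline filter and map into a single pass over each partition.

      // A minimal sketch of transformations being queued rather than executed.
      // Assumes a spark-shell session where `sc` is the SparkContext.
      val numbers = sc.parallelize(1 to 1000000)   // no computation happens here
      val evens   = numbers.filter(_ % 2 == 0)     // transformation is only recorded
      val doubled = evens.map(_ * 2)               // still nothing has run

      // Inspect the lineage (DAG) Spark has built so far
      println(doubled.toDebugString)

      // The action finally triggers execution; filter and map are
      // pipelined into a single pass over each partition.
      println(doubled.count())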

      Example:
      Suppose we have a file words.txt containing the following lines:

      line1 word1
      line2 word2 word1
      line3 word3 word4
      line4 word1

      Next, we apply the following operations.

      scala> val lines = sc.textFile("words.txt")
      scala> val filtered = lines.filter(line => line.contains("word1"))
      scala> filtered.first()
      res0: String = line1 word1

      If Spark evaluated each statement as soon as it was entered, it would end up reading the whole file, then applying the filter transformation to every line, and then returning the first line from the filtered result. That would mean a lot of extra work and unnecessary memory utilization.

      With lazy evaluation, on the other hand, Spark first builds the entire DAG and only then decides how to execute it. Because first() needs just one matching record, Spark understands that reading the entire file is not necessary: the same result can be achieved by reading only as far as the first line that contains "word1".
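      Laziness is also easy to observe directly. As a rough illustration (assuming a spark-shell session and that no file named missing.txt actually exists), reading a non-existent file does not fail straight away; the error surfaces only when an action forces evaluation.

      // A sketch of laziness observed through failure timing.
      // Assumes spark-shell and that "missing.txt" does not exist.
      val ghost = sc.textFile("missing.txt")   // returns an RDD immediately; the path is not checked yet
      val upper = ghost.map(_.toUpperCase)     // still no error: the transformation is only recorded

      // The missing file is only noticed when an action forces evaluation
      try {
        upper.count()
      } catch {
        case e: Exception => println(s"Failed only at the action: ${e.getMessage}")
      }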

      Please read Lazy Evaluation in Spark for more detail.
