How does Apache Spark handle accumulated metadata?


Viewing 1 reply thread
  • Author
    Posts
    • #5920
DataFlair Team
      Spectator

Is there any way to handle accumulated metadata in Apache Spark?

    • #5921
DataFlair Team
      Spectator

Metadata accumulates on the driver as a consequence of shuffle operations, and it becomes particularly problematic during long-running jobs.
      To deal with accumulating metadata, there are two options:

      • First, set the spark.cleaner.ttl parameter to trigger automatic periodic cleanups. Be aware, however, that this also unpersists any RDDs older than the TTL. (This parameter applies to Spark 1.x; it was removed in Spark 2.0, where cleanup is handled automatically by the ContextCleaner.)
      • The other solution is to split a long-running job into batches and write the intermediate results to disk. Each batch then starts in a fresh environment, so there is no metadata build-up to worry about.
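      As a sketch of the first option (Spark 1.x only, since the parameter was removed in Spark 2.0), the cleanup interval can be set in spark-defaults.conf; the value is in seconds:

      ```
      # spark-defaults.conf (Spark 1.x): clean shuffle metadata older than 1 hour.
      # Caution: this also unpersists RDDs older than the TTL.
      spark.cleaner.ttl   3600
      ```

      Equivalently, pass `--conf spark.cleaner.ttl=3600` to spark-submit when launching the job.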