How does Apache Spark handle accumulated metadata?

This topic contains 1 reply, has 1 voice, and was last updated by dfbdteam5 2 months, 3 weeks ago.

  • #5920

    dfbdteam5
    Moderator

    Is there any way to handle accumulated metadata in Apache Spark?

    #5921

    dfbdteam5
    Moderator

    Metadata accumulates on the driver as a consequence of shuffle operations, and it becomes particularly problematic during long-running jobs.
    There are two ways to deal with this metadata build-up:

    • First, set the spark.cleaner.ttl parameter to trigger automatic periodic cleanups. Note, however, that this cleanup also removes any persisted RDDs.
    • The other solution is to split long-running jobs into batches and write the intermediate results to disk. This gives every batch a fresh environment, so metadata never has a chance to build up (see the sketch after this list).
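
    A rough sketch of both approaches in Scala follows. The paths, batch count, and TTL value are illustrative assumptions, and note that spark.cleaner.ttl is only honored by older Spark releases (it was removed in Spark 2.0):

    import org.apache.spark.sql.SparkSession

    object MetadataCleanupSketch {

      // Option 1: TTL-based cleanup. The value is illustrative. This setting
      // only applies to Spark releases before 2.0, and the cleanup also
      // removes persisted RDDs older than the TTL.
      def withTtlCleanup(): Unit = {
        val spark = SparkSession.builder()
          .appName("ttl-cleanup-sketch")
          .master("local[*]")
          .config("spark.cleaner.ttl", "3600") // clean metadata older than 1 hour
          .getOrCreate()
        // ... long-running work with shuffles ...
        spark.stop()
      }

      // Option 2: split the work into batches, write intermediate results to
      // disk, and tear the session down after each batch so every batch starts
      // with a fresh driver. Paths and batch count are assumptions.
      def inBatches(): Unit = {
        val batches = 4
        for (i <- 0 until batches) {
          val spark = SparkSession.builder()
            .appName(s"batch-$i")
            .master("local[*]")
            .getOrCreate()
          val input  = spark.read.parquet(s"/data/input/batch_$i")       // assumed input layout
          val result = input.groupBy("key").count()                      // shuffle-heavy step
          result.write.mode("overwrite").parquet(s"/data/out/batch_$i")  // intermediate results to disk
          spark.stop() // discard the accumulated shuffle metadata with the context
        }
      }

      def main(args: Array[String]): Unit = {
        withTtlCleanup()
        inBatches()
      }
    }

    Stopping and recreating the session per batch is what makes the batching approach work: the shuffle metadata lives on the driver, so it is discarded along with the context at the end of each batch.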
