How does Apache Spark handle accumulated metadata?
-
Is there any way to handle accumulated metadata in Apache Spark?
-
Metadata accumulates on the driver as a consequence of shuffle operations, and it becomes particularly problematic during long-running jobs.
To deal with accumulating metadata, there are two options:
- First, set the spark.cleaner.ttl parameter to trigger automatic periodic cleanups. Be aware, however, that this also removes any persisted RDDs older than the TTL (see the first sketch after this list).
- The other solution is to split long-running jobs into batches and write intermediate results to disk. This provides a fresh environment for every batch, so metadata build-up is no longer a concern (see the second sketch after this list).
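A minimal sketch of the first option. Note that spark.cleaner.ttl applies to older Spark releases (it was removed in the 2.x line); the application name and the one-hour TTL here are arbitrary choices for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Configure periodic metadata cleanup; spark.cleaner.ttl is a duration in seconds.
val conf = new SparkConf()
  .setAppName("MetadataCleanupExample") // hypothetical app name
  .set("spark.cleaner.ttl", "3600")     // purge metadata older than one hour

val sc = new SparkContext(conf)

// Caveat from the answer above: any RDD persisted longer than the TTL
// is also cleaned up, so long-lived data must be re-persisted or recomputed.
```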
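And a sketch of the second option, batching with intermediate results written to disk. The input/output paths, the date keys, and the groupBy transformation are all assumptions standing in for a real pipeline:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("BatchedJob").getOrCreate()

// Hypothetical partition keys: process one day of input per batch.
val days = Seq("2024-01-01", "2024-01-02", "2024-01-03")

for (day <- days) {
  val input  = spark.read.parquet(s"/data/input/dt=$day")  // assumed input path
  val result = input.groupBy("userId").count()             // stand-in transformation
  // Persist the batch result to disk so the next batch can start fresh.
  result.write.mode("overwrite").parquet(s"/data/output/dt=$day")
}

spark.stop() // restarting the application between batches gives a truly fresh driver
```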