How does Apache Spark handle accumulated metadata?
-
Is there any way to handle accumulated metadata in Apache Spark?
-
Metadata accumulates on the driver as a consequence of shuffle operations, and it becomes particularly problematic during long-running jobs.
To deal with accumulating metadata, there are two options:
- First, set the spark.cleaner.ttl parameter to trigger automatic periodic cleanups. Be aware, however, that this also removes any persisted RDDs older than the TTL (see the first sketch after this list).
- The other solution is to split long-running jobs into batches and write intermediate results to disk. This provides a fresh environment for every batch, so metadata build-up is no longer a concern (see the second sketch after this list).
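A minimal sketch of the first option. Note that spark.cleaner.ttl applies to older Spark releases (it was removed in the 2.x line); the application name and the one-hour TTL here are arbitrary choices for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Configure periodic metadata cleanup; spark.cleaner.ttl is a duration in seconds.
val conf = new SparkConf()
  .setAppName("MetadataCleanupExample") // hypothetical app name
  .set("spark.cleaner.ttl", "3600")     // purge metadata older than one hour

val sc = new SparkContext(conf)

// Caveat from the answer above: any RDD persisted longer than the TTL
// is also cleaned up, so long-lived data must be re-persisted or recomputed.
```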
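And a sketch of the second option, batching with intermediate results written to disk. The input/output paths, the date keys, and the groupBy transformation are all assumptions standing in for a real pipeline:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("BatchedJob").getOrCreate()

// Hypothetical partition keys: process one day of input per batch.
val days = Seq("2024-01-01", "2024-01-02", "2024-01-03")

for (day <- days) {
  val input  = spark.read.parquet(s"/data/input/dt=$day")  // assumed input path
  val result = input.groupBy("userId").count()             // stand-in transformation
  // Persist the batch result to disk so the next batch can start fresh.
  result.write.mode("overwrite").parquet(s"/data/output/dt=$day")
}

spark.stop() // restarting the application between batches gives a truly fresh driver
```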