Define journaling in Apache Spark.

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

Viewing 1 reply thread

Author

Posts
- September 20, 2018 at 10:03 pm #6435
  
  DataFlair Team
  Spectator
  
  What is a write-ahead log (journaling)?
- September 20, 2018 at 10:03 pm #6436
  
  DataFlair Team
  Spectator
  
  Write-ahead log (journaling)
  
  Suppose the driver node fails. Its executors are shut down with it, so all the data that had been received and replicated in their memory is lost. This directly affects the results of stateful transformations. To avoid this data loss, write-ahead logs were introduced in Apache Spark 1.2. They save received data to fault-tolerant storage: before Spark Streaming processes any received data, that data is first written to the write-ahead log.
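The receiver-side ordering described above can be sketched in plain Python. This is not Spark's API: `WalReceiver`, `receive` and `recover` are hypothetical names, and a local file stands in for the fault-tolerant storage (such as HDFS) that Spark would use. In Spark Streaming itself, the feature is enabled by setting `spark.streaming.receiver.writeAheadLog.enable` to `true` and configuring a checkpoint directory with `StreamingContext.checkpoint`.

```python
import json
import os
import tempfile


class WalReceiver:
    """Hypothetical receiver: persists every record to a log before buffering it in memory."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.buffer = []  # in-memory data, lost if the process dies

    def receive(self, record):
        # 1. Append the record to the durable log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then keep it in memory for processing.
        self.buffer.append(record)

    @classmethod
    def recover(cls, log_path):
        # Rebuild the in-memory buffer from the log after a failure.
        receiver = cls(log_path)
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                receiver.buffer = [json.loads(line) for line in log if line.strip()]
        return receiver


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "receiver.wal")
    receiver = WalReceiver(log_path)
    for event in ({"id": 1, "clicks": 3}, {"id": 2, "clicks": 5}):
        receiver.receive(event)

    del receiver  # simulate a driver failure: the in-memory buffer is gone

    recovered = WalReceiver.recover(log_path)
    print(recovered.buffer)  # [{'id': 1, 'clicks': 3}, {'id': 2, 'clicks': 5}]
```

Writing to the log before buffering in memory is the essential ordering: any record the receiver has accepted can be rebuilt from the log, even though the in-memory copy disappears with the failed driver.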
  
  Write-ahead logs are also used in databases and file systems, where they guarantee the durability of data operations. The mechanism works in two steps: first, the intention of the operation is written to a durable log; afterwards, the operation is applied to the data. Because of this ordering, if the system fails in the middle of applying an operation, the lost changes can be recovered by reading the log and reapplying the operations the system had intended to perform.
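The intention-then-apply sequence above can be illustrated with a small key-value store. This is a minimal sketch, not how any particular database implements its log: `WalStore`, `put`, `crash_before_apply` and the `wal.log`/`data.json` file names are all hypothetical. Each log entry carries a sequence number, so recovery only reapplies operations that the data file has not yet seen.

```python
import json
import os
import tempfile


class WalStore:
    """Hypothetical key-value store that logs each intended write before applying it."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "wal.log")
        self.data_path = os.path.join(directory, "data.json")
        self.data = {"applied_seq": 0, "values": {}}
        if os.path.exists(self.data_path):
            with open(self.data_path, encoding="utf-8") as f:
                self.data = json.load(f)
        self._replay()

    def _append_log(self, entry):
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(entry) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def _apply(self, entry):
        self.data["values"][entry["key"]] = entry["value"]
        self.data["applied_seq"] = entry["seq"]
        # Replace the data file atomically: write a temp file, then rename it.
        tmp_path = self.data_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.data_path)

    def _replay(self):
        # Reapply every logged intention the data file has not applied yet.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                if not line.strip():
                    continue
                entry = json.loads(line)
                if entry["seq"] > self.data["applied_seq"]:
                    self._apply(entry)

    def put(self, key, value, crash_before_apply=False):
        entry = {"seq": self.data["applied_seq"] + 1, "key": key, "value": value}
        self._append_log(entry)  # 1. record the intention durably
        if crash_before_apply:
            # Simulated failure between logging and applying. After this,
            # discard the object and reopen the store, as a restarted process would.
            raise RuntimeError("simulated crash")
        self._apply(entry)  # 2. apply the operation to the data


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    store = WalStore(directory)
    store.put("balance", 100)
    try:
        store.put("balance", 70, crash_before_apply=True)
    except RuntimeError:
        pass  # the process "dies" here
    recovered = WalStore(directory)  # replays the logged but unapplied write
    print(recovered.data["values"])  # {'balance': 70}
```

Skipping entries whose sequence number is not greater than `applied_seq` also makes a crash during recovery harmless: reopening the store simply resumes the replay from where the data file left off.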
  
  To learn more about journaling, follow the link: Spark Streaming write-ahead logs