Define journaling in Apache Spark.

Viewing 1 reply thread
  • Author
    Posts
    • #6435
      DataFlair Team
      Spectator

      What is a write-ahead log (journaling)?

    • #6436
      DataFlair Team
      Spectator

      Write-ahead log (journaling)

      If the driver node fails, all the data that was received and replicated in memory is lost, which directly affects the results of stateful transformations. To avoid this loss, write-ahead logs were introduced in Apache Spark 1.2. They save received data to fault-tolerant storage: before Spark Streaming processes the data, it is written to the write-ahead log.
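As a rough illustration of how this is switched on, here is a minimal PySpark sketch. It assumes the receiver-based DStream API and the `spark.streaming.receiver.writeAheadLog.enable` property from the Spark Streaming documentation; the application name, batch interval, and checkpoint path are hypothetical placeholders, and the helper function is only an example, not part of Spark.

```python
# Setting assumed from the Spark Streaming documentation: it asks receivers
# to write incoming data to a write-ahead log before it is processed.
WAL_SETTINGS = {
    "spark.streaming.receiver.writeAheadLog.enable": "true",
}


def build_streaming_context(checkpoint_dir, batch_seconds=5):
    """Hypothetical helper: create a StreamingContext with the WAL enabled."""
    # Imported lazily so the settings above can be inspected without Spark.
    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    conf = SparkConf().setAppName("wal-example")  # placeholder app name
    for key, value in WAL_SETTINGS.items():
        conf.set(key, value)

    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, batch_seconds)
    # The log is kept under the checkpoint directory, so this should point
    # at fault-tolerant storage such as HDFS.
    ssc.checkpoint(checkpoint_dir)
    return ssc
```

A call might look like `build_streaming_context("hdfs:///checkpoints/wal-example")`, where the path is again only an example.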

      Write-ahead logs are also used in databases and file systems, where they guarantee the durability of data operations. Internally, the intention of an operation is first written to a durable log, and only afterwards is the operation applied to the data. If the system fails in the middle of applying the operation, it can recover by reading the log and reapplying the operations it intended to perform.
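The log-then-apply idea can be sketched in a few lines of plain Python. This is a toy illustration, not how Spark or any particular database implements it: the `TinyWAL` class, its JSON-lines log format, and the file name are all hypothetical, and a truncated final line is assumed to be a write that never completed.

```python
import json
import os
import tempfile


class TinyWAL:
    """Hypothetical key-value store that logs each write before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self.recover()

    def put(self, key, value):
        # Step 1: write the intention to the durable log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # Step 2: only now apply the operation to the in-memory data.
        self.data[key] = value

    def recover(self):
        # Rebuild the state by replaying every logged operation in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # assumed: an incomplete last record from a crash
                self.data[entry["key"]] = entry["value"]


log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = TinyWAL(log_path)
store.put("offset", 42)

# A new instance over the same log simulates a restart after a failure.
restarted = TinyWAL(log_path)
```

Because the write is logged before it is applied, `restarted` rebuilds its state purely from the log file, even though the in-memory data of `store` is never consulted.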

      To learn more about Journaling, follow the link: Spark Streaming write-ahead logs
