How to attain fault tolerance in Spark?

  • #4797
    DataFlair Team

      How is fault tolerance achieved in Apache Spark?
      Is Apache Spark fault tolerant? If yes, how?

  • #4798
    DataFlair Team

      Yes, Apache Spark is fault tolerant because of its core abstraction, the RDD (Resilient Distributed Dataset). Spark is designed to handle the failure of any worker node in the cluster, and in this way it makes sure that the loss of data is reduced to zero.

      Spark usually operates on data stored in fault-tolerant file systems such as HDFS or S3, so the input data itself is safe against failures. But for streaming/live data received over the network, this guarantee does not hold, which is why Spark needs its own fault-tolerance mechanisms. The basic fault-tolerance semantics of Spark are:

      Every Spark RDD remembers the lineage of deterministic operations that were applied to a fault-tolerant input dataset to create it. This is possible because RDDs are immutable (a short sketch follows this list).

      If any partition of an RDD is lost due to a worker node failure, that partition can be recomputed from the original fault-tolerant dataset using the lineage of operations.

      The data in the final transformed RDD will always be the same irrespective of failures in the Spark cluster, assuming that all of the RDD transformations are deterministic.
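
      Below is a minimal Scala sketch of the lineage idea (the HDFS path and the word-count pipeline are placeholders, not part of the original answer). It builds an RDD through a chain of transformations and prints the recorded lineage with toDebugString; this lineage is what Spark replays to recompute a lost partition:

      import org.apache.spark.{SparkConf, SparkContext}

      object LineageDemo {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf().setAppName("LineageDemo").setMaster("local[*]")
          val sc = new SparkContext(conf)

          // Placeholder input path; any fault-tolerant store (HDFS, S3) works.
          val lines  = sc.textFile("hdfs:///data/events.txt")
          val words  = lines.flatMap(_.split(" "))   // transformation 1
          val pairs  = words.map(w => (w, 1))        // transformation 2
          val counts = pairs.reduceByKey(_ + _)      // transformation 3

          // Prints the lineage graph. If a partition of `counts` is lost,
          // Spark re-runs exactly these steps for that partition only.
          println(counts.toDebugString)

          sc.stop()
        }
      }

      Because every step here is deterministic, replaying the lineage yields exactly the same data as before the failure, which is the third semantic above.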

      For Spark Streaming, the received data is replicated among multiple Spark executors on worker nodes in the cluster, to achieve fault tolerance for all the generated RDDs. This leaves two kinds of data that need to be recovered in the event of failure:

      – Data received and replicated
      – Data received but buffered for replication
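
      As an illustration of the replication side (host, port, and checkpoint directory below are placeholder values, not part of the original answer), the following Spark Streaming sketch stores each received block on two executors via a replicated storage level, so data that has already been received and replicated survives the loss of a single worker node:

      import org.apache.spark.SparkConf
      import org.apache.spark.storage.StorageLevel
      import org.apache.spark.streaming.{Seconds, StreamingContext}

      object ReplicatedReceiverDemo {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf()
            .setAppName("ReplicatedReceiverDemo")
            .setMaster("local[2]")   // one core for the receiver, one for processing
          val ssc = new StreamingContext(conf, Seconds(10))

          // Placeholder checkpoint directory; in production this should sit
          // on a fault-tolerant store such as HDFS or S3.
          ssc.checkpoint("hdfs:///checkpoints/demo")

          // MEMORY_AND_DISK_SER_2 keeps two copies of every received block
          // on different executors, covering the "received and replicated" case.
          val lines = ssc.socketTextStream("localhost", 9999,
            StorageLevel.MEMORY_AND_DISK_SER_2)

          lines.count().print()

          ssc.start()
          ssc.awaitTermination()
        }
      }

      Data that was received but only buffered for replication can additionally be protected by enabling the receiver write-ahead log (spark.streaming.receiver.writeAheadLog.enable), which persists incoming data to the checkpoint directory before it is acknowledged.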

      To learn about fault tolerance in Apache Spark in detail, follow the link: Fault Tolerance in Apache Spark – Reliable Spark Streaming
