How to plan disaster recovery in a Hadoop cluster


    • #6314
      DataFlair Team
      Spectator

      Suppose a cyclone, flood, or other natural calamity takes the complete data center down. How should disaster recovery be planned for such cases?

    • #6318
      DataFlair Team
      Spectator

      Disaster recovery in a Hadoop cluster refers to recovering all or most of the important data stored on the cluster in case of disasters like hardware failures, data loss, or application errors. There should be minimal or no downtime in the cluster.

      Disasters can be handled through various techniques:

      1) Data loss can be prevented by writing the metadata stored on the NameNode to a separate NFS mount as well. In addition, NameNode High Availability, introduced in later versions of Hadoop, is itself a disaster-management technique.

      2) HDFS snapshots can also be used for recovery (see the sketch after this list).

      3) You can enable the Trash feature to guard against accidental deletion, because a deleted file first goes to the trash folder in HDFS.

      4) The Hadoop distcp tool can also be used to copy cluster data and build a mirror cluster that can take over in case of hardware failure.
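
      The snapshot technique in point 2 can be driven either from the command line (hdfs dfsadmin -allowSnapshot and hdfs dfs -createSnapshot) or through the HDFS Java API. Below is a minimal sketch of the API route; the directory path and snapshot name are illustrative assumptions.

      ```java
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hdfs.DistributedFileSystem;

      public class SnapshotExample {
          public static void main(String[] args) throws Exception {
              // Assumes fs.defaultFS in core-site.xml points at the cluster's NameNode.
              Configuration conf = new Configuration();
              FileSystem fs = FileSystem.get(conf);

              Path dir = new Path("/user/data/critical");   // hypothetical directory

              // Snapshots must first be allowed on the directory (an admin operation).
              if (fs instanceof DistributedFileSystem) {
                  ((DistributedFileSystem) fs).allowSnapshot(dir);
              }

              // Create a read-only, point-in-time snapshot; it appears under
              // /user/data/critical/.snapshot/<name> and can be copied back after a mistake.
              Path snapshot = fs.createSnapshot(dir, "backup-before-upgrade");
              System.out.println("Created snapshot at " + snapshot);

              fs.close();
          }
      }
      ```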

    • #6319
      DataFlair Team
      Spectator

      1) To prevent data loss from data corruption or network failure, one can write all the metadata stored on the NameNode to a remote NFS mount as well, because the metadata is a critical file that stores the information about all blocks.

      2) Replication of blocks across DataNodes.

      3) HDFS snapshots, used for protection from user errors.

      4) Enable the Trash feature, so that if we accidentally delete a file, we can still recover it (a sketch follows this list).
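
      As an illustration of points 2 and 4, the sketch below moves a file to trash through the HDFS Java API and raises the replication factor of another file. The file paths and the replication value are assumptions, and trash only takes effect when fs.trash.interval is set to a non-zero number of minutes in core-site.xml.

      ```java
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.fs.Trash;

      public class TrashAndReplicationExample {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              FileSystem fs = FileSystem.get(conf);

              // Move the file into the user's .Trash directory instead of deleting it outright,
              // which is what "hdfs dfs -rm" does when trash is enabled.
              Path file = new Path("/user/data/report.csv");          // hypothetical file
              boolean movedToTrash = Trash.moveToAppropriateTrash(fs, file, conf);
              System.out.println("Moved to trash: " + movedToTrash);

              // Raise the replication factor of an especially important file above the
              // cluster-wide default set by dfs.replication (normally 3).
              fs.setReplication(new Path("/user/data/important.parquet"), (short) 5);

              fs.close();
          }
      }
      ```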

    • #6320
      DataFlair Team
      Spectator

      Disaster recovery in a Hadoop cluster refers to recovering all or most of the important data in the cluster in the case of disasters like hardware failure, data center loss due to fire, natural disasters, etc., so that there is minimal or no downtime for the business.

      There are various techniques to handle a disaster:

      1. Writing NameNode metadata to external storage such as an NFS mount.
      2. Configuring HDFS snapshots.
      3. Relying on block replication: the default replication factor in HDFS is 3, which protects the cluster against server or drive failures.
      4. Configuring/enabling the trash feature in HDFS, which helps protect data against accidental deletes.
      5. Setting up a DR cluster at a location other than the primary cluster's. Using the distcp tool, the data from the primary cluster can be replicated to the DR cluster periodically or dynamically (as soon as data enters the primary), creating a mirror of the primary. This allows the DR cluster to be used if the data center of the primary cluster is down (a sketch follows this list).
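
      The mirroring in point 5 is usually scheduled as a plain hadoop distcp command from cron or an orchestrator such as Oozie; the sketch below drives the same DistCp tool from Java. The NameNode addresses and paths are assumptions, and the DistCp constructor details can vary slightly between Hadoop releases.

      ```java
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.tools.DistCp;
      import org.apache.hadoop.util.ToolRunner;

      public class MirrorToDrCluster {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();

              // Hypothetical NameNode addresses for the primary and the DR cluster.
              String source = "hdfs://primary-nn:8020/data";
              String target = "hdfs://dr-nn:8020/data";

              // Equivalent CLI: hadoop distcp -update -delete <source> <target>
              //   -update copies only files that changed since the last run,
              //   -delete removes files on the DR side that were removed on the primary,
              // so repeated runs keep the DR cluster a mirror of the primary.
              DistCp distcp = new DistCp(conf, null);  // options are parsed from the argument array below
              int exitCode = ToolRunner.run(distcp, new String[] {"-update", "-delete", source, target});
              System.exit(exitCode);
          }
      }
      ```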
