Forums › Apache Hadoop › How to plan disaster recovery in Hadoop cluster
This topic has 3 replies, 1 voice, and was last updated 5 years, 6 months ago by DataFlair Team.
September 20, 2018 at 5:42 pm · #6314 · DataFlair Team (Spectator)
Suppose a cyclone, flood, or other natural calamity takes the complete data center down. How should disaster recovery be planned for such cases?
September 20, 2018 at 5:43 pm · #6318 · DataFlair Team (Spectator)
Disaster recovery in a Hadoop cluster means recovering all or most of the important data stored on the cluster after disasters such as hardware failures, data loss, or application errors, with minimal or no cluster downtime.
Disasters can be handled through several techniques:
1) Prevent loss of metadata by writing the metadata stored on the NameNode to a separate NFS mount as well. The High Availability (HA) feature introduced in Hadoop 2, which keeps a standby NameNode, also serves as a disaster-management technique.
2) HDFS snapshots can also be used for recovery.
3) Enable the Trash feature to guard against accidental deletion, because a deleted file first goes to the trash folder in HDFS.
4) The Hadoop distcp tool can be used to copy cluster data to a mirror cluster, which can take over in case of hardware failure.
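As a rough sketch, the distcp mirroring step in point 4 could look like the following. The NameNode URIs `nn-primary` and `nn-dr` are hypothetical placeholders, and the command is only assembled and echoed here rather than run, since it needs two live clusters:

```shell
# Hypothetical NameNode URIs -- replace with your own clusters.
SRC="hdfs://nn-primary:8020/data"
DST="hdfs://nn-dr:8020/data"

# -update copies only files that changed since the last run;
# -p preserves permissions, ownership, and timestamps.
CMD="hadoop distcp -update -p $SRC $DST"
echo "$CMD"
```

Scheduling this command periodically (for example from cron or Oozie) keeps the DR cluster close to the state of the primary.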
September 20, 2018 at 5:43 pm · #6319 · DataFlair Team (Spectator)
1) To prevent data loss from corruption or network failure, write all the metadata stored on the NameNode to a remote NFS mount as well, because the metadata is a critical file that holds the information about all blocks.
2) Replicate blocks across nodes (HDFS block replication).
3) Use HDFS snapshots for protection against user errors.
4) Enable the Trash feature, so that an accidentally deleted file can still be recovered.
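To illustrate point 4: with Trash enabled (by setting `fs.trash.interval` in core-site.xml to the number of minutes deleted files are kept), a deleted file moves under the deleting user's trash directory, mirroring its original path. The user and file names below are hypothetical, and the restore command is shown as a comment since it needs a live cluster:

```shell
# Hypothetical user and file -- replace with your own.
HDFS_USER="alice"
FILE="/user/alice/sales/part-00000"

# After `hdfs dfs -rm $FILE`, the file sits at this trash path,
# which is the original path prefixed with .Trash/Current:
TRASH="/user/$HDFS_USER/.Trash/Current$FILE"
echo "$TRASH"

# Restore with: hdfs dfs -mv "$TRASH" "$FILE"
```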
September 20, 2018 at 5:43 pm · #6320 · DataFlair Team (Spectator)
Disaster recovery in a Hadoop cluster refers to recovering all or most of the important data in the cluster after disasters such as hardware failure, data center loss due to fire, or natural disasters, so that there is minimal or no downtime for the business. There are several techniques to handle a disaster:
1. Write the NameNode metadata to external storage such as an NFS mount.
2. Configure HDFS snapshots.
3. The default replication factor in HDFS is 3, which protects the cluster against server or drive failures.
4. Configure/enable the trash feature in HDFS, which protects data against accidental deletes.
5. Set up a DR cluster in a location other than the primary cluster's. Using the distcp tool, data from the primary cluster can be replicated to the DR cluster periodically or continuously (as soon as data lands on the primary), creating a mirror of the primary. The DR cluster can then take over if the primary cluster's data center goes down.
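For step 1, the NameNode can write its metadata to more than one directory by listing them, comma-separated, in `dfs.namenode.name.dir` in hdfs-site.xml. The local and NFS paths below are hypothetical examples:

```xml
<!-- hdfs-site.xml: the NameNode writes its fsimage and edit log to every
     directory listed, so a copy survives the loss of the local disk. -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/dfs/nn,file:///mnt/nfs/dfs/nn</value>
</property>
```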