Forums › Hadoop › what happens if the block on HDFS is corrupted?
September 20, 2018 at 3:17 pm #5473
If one of the blocks is corrupted due to a hardware/disk failure or some other unknown reason, and a MapReduce job has been triggered, how will this particular block be handled? And what happens to the MapReduce job?
September 20, 2018 at 3:17 pm #5474
HDFS provides two tools for dealing with corruption:
1. hdfs fsck (for block data on the DataNodes)
2. namenode -recover (for metadata on the NameNode)
Fsck is an offline process that examines on-disk structures and usually offers to fix them if they are damaged.
HDFS has its own fsck command, which you can access by running "hdfs fsck". HDFS fsck determines which files contain corrupt blocks and gives you options for fixing them.
The HDFS fsck command operates only on data, not on metadata. This distinction is irrelevant on a local filesystem, where data and metadata are stored in the same place. In HDFS, however, metadata is stored on the NameNode, whereas data is stored on the DataNodes.
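For illustration, a few common hdfs fsck invocations on a running cluster might look like this (the path /user/data is a placeholder; substitute your own):

```shell
# List the files that currently have corrupt blocks
hdfs fsck / -list-corruptfileblocks

# Show block IDs and DataNode locations for files under a path
hdfs fsck /user/data -files -blocks -locations

# Move files with corrupt blocks to /lost+found, or delete them outright
hdfs fsck /user/data -move
hdfs fsck /user/data -delete
```

Note that -move and -delete act on the affected files, so they should only be used once you have confirmed the data cannot be re-replicated from a healthy copy.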
When properly configured, HDFS is robust, because it stores multiple copies of everything. The administrator has the capability to recover a partial or corrupted edit log. This new functionality is called manual NameNode recovery.
Similar to fsck, NameNode recovery is an offline process. An administrator can run NameNode recovery to recover a corrupted edit log.
Start the NameNode with the -recover flag to activate recovery mode, like:
./bin/hadoop namenode -recover
Manual recovery is not always the best choice, though. If there is another valid copy of the edit log somewhere else, it is preferable to use that copy rather than trying to recover the corrupted one. This is a case where high availability helps a lot: if there is a standby NameNode ready to take over, there should be no need to recover the edit log on the primary. Manual recovery is a good choice only when no other copy of the edit log is available.
The best recovery process is the one that you never need to do. High availability, combined with edit log failover, should mean that manual recovery is almost never necessary. However, it’s good to know that HDFS has tools to deal with whatever comes up.
For more details, please follow: HDFS Fault Tolerance
September 20, 2018 at 3:17 pm #5475
A block that is no longer available due to corruption or machine failure can be replicated from its alternate locations to other live machines to bring the replication factor back to the normal level.
September 20, 2018 at 3:17 pm #5476
HDFS stores replicas of blocks, so it can "heal" corrupted blocks by copying one of the good replicas to produce a new, uncorrupted replica. The way this works is that if a client detects an error when reading a block, it reports the bad block and the DataNode it was trying to read from to the NameNode before throwing an exception. The NameNode marks the block replica as corrupt so that it neither directs any more clients to it nor tries to copy this replica to another DataNode. It then schedules a copy of the block to be replicated on another DataNode, so the replication factor is back at the expected level. Once this has happened, the corrupt replica is deleted.
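The detection step relies on checksums: a DataNode stores a checksum alongside each block, and the client recomputes it on every read. A minimal local sketch of that comparison, using plain files and cksum to stand in for an HDFS block file and its .meta file (file names are illustrative):

```shell
printf 'block-data' > blk_1073741825                       # stand-in for an HDFS block file
cksum blk_1073741825 | cut -d' ' -f1 > blk_1073741825.meta # stored checksum, like the .meta file

printf 'flipped!!!' > blk_1073741825                       # simulate on-disk corruption

stored=$(cat blk_1073741825.meta)
actual=$(cksum blk_1073741825 | cut -d' ' -f1)
if [ "$stored" != "$actual" ]; then
  echo "corrupt replica detected: report block to NameNode"
fi
```

In real HDFS the mismatch triggers the report to the NameNode described above, rather than a simple message.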