This topic contains 3 replies, has 1 voice, and was last updated by  dfbdteam3 1 year, 6 months ago.

Viewing 4 posts - 1 through 4 (of 4 total)


    If one of the blocks is corrupted due to a hardware or disk failure, or for some unknown reason, and a MapReduce job has been triggered, how will that block be handled? And what happens to the MapReduce job?



    HDFS is very robust, and to recover from corruption it provides the user with the following tools:

    1. HDFS fsck (for data on the DataNodes)
    2. namenode -recover (for NameNode metadata)


    On a local filesystem, fsck is an offline process that examines on-disk structures and usually offers to fix them if they are damaged.

    HDFS has its own fsck command, which you can run with “hdfs fsck”. HDFS fsck determines which files contain corrupt blocks and gives you options for fixing them.

    The HDFS fsck command operates only on data, not on metadata. This distinction is irrelevant on a local filesystem, because data and metadata are stored in the same place. In HDFS, however, metadata is stored on the NameNode, whereas data is stored on the DataNodes.
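    The kind of report described above can be illustrated with a toy model of block health. This is only a sketch under stated assumptions, not how HDFS fsck is implemented: the `BlockInfo` class, the `classify_files` function and the status strings are hypothetical. It assumes a file counts as corrupt when at least one of its blocks has no healthy replica left, and as under-replicated when every block is still readable but some block has fewer healthy replicas than the target replication factor (3 here, the usual HDFS default).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BlockInfo:
    """Toy view of one block: which DataNodes hold a replica and whether it is healthy."""
    block_id: str
    replicas: dict[str, bool] = field(default_factory=dict)  # datanode -> replica is healthy

    def healthy_count(self) -> int:
        return sum(1 for ok in self.replicas.values() if ok)


def classify_files(files: dict[str, list[BlockInfo]], target_replication: int = 3) -> dict[str, str]:
    """Return a per-file status, loosely modelled on what an fsck-style check reports."""
    report = {}
    for path, blocks in files.items():
        if any(b.healthy_count() == 0 for b in blocks):
            # No good copy of some block survives, so the file cannot be read in full.
            report[path] = "CORRUPT"
        elif any(b.healthy_count() < target_replication for b in blocks):
            # Still readable, but extra replicas are needed to restore the target.
            report[path] = "UNDER_REPLICATED"
        else:
            report[path] = "HEALTHY"
    return report


if __name__ == "__main__":
    # Hypothetical paths and block IDs, for illustration only.
    files = {
        "/data/a.txt": [BlockInfo("blk_1", {"dn1": True, "dn2": True, "dn3": True})],
        "/data/b.txt": [BlockInfo("blk_2", {"dn1": False, "dn2": True, "dn3": True})],
        "/data/c.txt": [BlockInfo("blk_3", {"dn1": False, "dn2": False})],
    }
    for path, status in classify_files(files).items():
        print(path, status)
```

    On a real cluster the corresponding report comes from running `hdfs fsck` on a path (for example `hdfs fsck /`); options such as `-list-corruptfileblocks`, `-move` (move corrupted files to /lost+found) and `-delete` (delete corrupted files) control how corrupt files are listed or handled.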

    NameNode:

    When properly configured, HDFS is robust, because it stores multiple copies of everything. An administrator can also recover a partial or corrupted edit log; this feature is called manual NameNode recovery.

    Similar to fsck, NameNode recovery is an offline process. An administrator can run NameNode recovery to recover a corrupted edit log.

    Start the NameNode with the -recover flag to activate recovery mode:

    ./bin/hadoop namenode -recover

    However, manual recovery is not always the best choice. If there is another valid copy of the edit log somewhere else, it is preferable to use that copy rather than trying to recover the corrupted one. This is where high availability helps a lot: if a standby NameNode is ready to take over, there should be no need to recover the edit log on the primary. When no other copy of the edit log is available, manual recovery is a good choice.
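    The preference order above (use another valid copy of the edit log if one exists, and fall back to manual recovery only when none does) can be sketched with a toy log. This is an illustration under stated assumptions, not the NameNode's edit-log format or its recovery-mode behaviour: the `Edit` type, its CRC32 field and the functions `pick_edit_log` and `recover_prefix` are hypothetical. Stopping at the first corrupt entry is just one simple policy, chosen to show why recovering a damaged copy can lose recent edits while switching to an intact copy does not.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Toy edit-log entry: an operation plus the checksum recorded when it was written."""
    op: str
    crc: int

    @staticmethod
    def write(op: str) -> Edit:
        return Edit(op, zlib.crc32(op.encode()))

    def is_valid(self) -> bool:
        return zlib.crc32(self.op.encode()) == self.crc


def pick_edit_log(copies: list[list[Edit]]) -> list[Edit] | None:
    """Preferred path: return the first copy in which every entry verifies."""
    for copy in copies:
        if all(e.is_valid() for e in copy):
            return copy
    return None


def recover_prefix(log: list[Edit]) -> list[Edit]:
    """Last resort: keep the entries before the first corrupt one; later edits are lost."""
    recovered = []
    for e in log:
        if not e.is_valid():
            break
        recovered.append(e)
    return recovered


if __name__ == "__main__":
    good = [Edit.write("mkdir /a"), Edit.write("create /a/f"), Edit.write("delete /a/f")]
    # Simulate damage: the payload of the second entry changed after its CRC was recorded.
    damaged = [good[0], Edit("create /a/X", good[1].crc), good[2]]

    log = pick_edit_log([damaged, good])
    print("valid copy found" if log is not None else "no valid copy")
    if pick_edit_log([damaged]) is None:
        print("manual recovery keeps", len(recover_prefix(damaged)), "edit(s)")
```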


    The best recovery process is the one that you never need to do. High availability, combined with edit log failover, should mean that manual recovery is almost never necessary. However, it’s good to know that HDFS has tools to deal with whatever comes up.

    For more details, see: HDFS Fault Tolerance



    A block that is no longer available due to corruption or machine failure can be replicated from its alternate locations to other live machines to bring the replication factor back to the normal level.
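    Choosing where the new replicas go can be sketched as a small planning step. This is a simplified illustration, not the NameNode's actual block placement policy (which also takes rack topology, among other factors, into account): `plan_rereplication` and its arguments are hypothetical. It copies from one surviving replica and picks live DataNodes that do not already hold the block until the target replication factor is reached.

```python
from __future__ import annotations


def plan_rereplication(
    replica_nodes: list[str], live_nodes: list[str], target_replication: int = 3
) -> tuple[str, list[str]]:
    """Return (source, new_targets) needed to bring a block back to its replication factor.

    replica_nodes: DataNodes believed to hold a good replica of the block.
    live_nodes: DataNodes currently alive; their order is used as a naive preference.
    """
    survivors = [n for n in replica_nodes if n in live_nodes]
    if not survivors:
        # Nothing left to copy from: the block is missing, not merely under-replicated.
        raise RuntimeError("no live replica to copy from")
    needed = max(0, target_replication - len(survivors))
    # Never place a second replica on a node that already has one.
    candidates = [n for n in live_nodes if n not in survivors]
    # On a small cluster this may return fewer targets than needed.
    return survivors[0], candidates[:needed]


if __name__ == "__main__":
    # dn2 has failed; dn1 and dn3 still hold the block.
    source, targets = plan_rereplication(["dn1", "dn2", "dn3"], ["dn1", "dn3", "dn4", "dn5"])
    print(f"copy from {source} to {targets}")
```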



    Because HDFS stores replicas of blocks, it can “heal” corrupted blocks by copying one of the good replicas to produce a new, uncorrupted replica. It works like this: if a client detects an error when reading a block, it reports the bad block and the DataNode it was trying to read from to the NameNode before throwing an exception. The NameNode marks the block replica as corrupt, so it doesn't direct any more clients to it or try to copy this replica to another DataNode. It then schedules a copy of the block to be replicated on another DataNode, so its replication factor is back at the expected level. Once this has happened, the corrupt replica is deleted.
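    The detect, report and heal sequence described above can be sketched end to end with an in-memory toy cluster. Everything here is hypothetical: `ToyNameNode`, `read_block`, the 4-byte chunks and the use of CRC32 are stand-ins chosen for illustration (HDFS stores its own checksums for each block, and its classes, RPCs and checksum settings differ). One deliberate simplification: instead of throwing as soon as it reports the bad replica, the toy client goes on to try the next replica and raises only if none verifies. The point is the order of events: the bad replica is reported, the NameNode stops offering it, a good replica is copied to another DataNode, and only then is the corrupt replica deleted.

```python
from __future__ import annotations

import zlib

CHUNK = 4  # toy chunk size (assumption); real HDFS uses larger, configurable chunks


def checksums(data: bytes) -> list[int]:
    """CRC32 of each fixed-size chunk, as recorded when the block was written."""
    return [zlib.crc32(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]


class ChecksumError(Exception):
    """Raised by the toy client when no replica of a block passes verification."""


class ToyNameNode:
    def __init__(self, datanodes: list[str], replication: int = 3):
        self.replication = replication
        self.storage: dict[str, dict[str, bytes]] = {dn: {} for dn in datanodes}
        self.expected: dict[str, list[int]] = {}  # block_id -> checksums at write time
        self.corrupt: dict[str, set[str]] = {}    # block_id -> DataNodes with bad replicas

    def write_block(self, block_id: str, data: bytes, targets: list[str]) -> None:
        self.expected[block_id] = checksums(data)
        for dn in targets:
            self.storage[dn][block_id] = data

    def locations(self, block_id: str) -> list[str]:
        """Replicas handed to clients; replicas marked corrupt are never offered."""
        bad = self.corrupt.get(block_id, set())
        return [dn for dn, blocks in self.storage.items() if block_id in blocks and dn not in bad]

    def report_bad_block(self, block_id: str, datanode: str) -> None:
        self.corrupt.setdefault(block_id, set()).add(datanode)
        self.heal(block_id)

    def heal(self, block_id: str) -> None:
        good = self.locations(block_id)
        # Toy assumption: the copy source is verified so a bad replica is never propagated.
        source = next(
            (dn for dn in good if checksums(self.storage[dn][block_id]) == self.expected[block_id]),
            None,
        )
        if source is None:
            return  # no intact replica left: the block is lost
        for dn, blocks in self.storage.items():
            if len(good) >= self.replication:
                break
            if block_id not in blocks:  # skip nodes that already hold a (good or bad) replica
                blocks[block_id] = self.storage[source][block_id]
                good.append(dn)
        if len(good) >= self.replication:
            # Replication factor restored: now the corrupt replicas can be deleted.
            for dn in self.corrupt.pop(block_id, set()):
                del self.storage[dn][block_id]


def read_block(nn: ToyNameNode, block_id: str) -> bytes:
    """Client read: verify each offered replica, reporting a bad one before moving on."""
    for dn in nn.locations(block_id):
        data = nn.storage[dn][block_id]
        if checksums(data) == nn.expected[block_id]:
            return data
        nn.report_bad_block(block_id, dn)
    raise ChecksumError(f"no healthy replica of {block_id}")


if __name__ == "__main__":
    nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
    nn.write_block("blk_1", b"hello hdfs!", ["dn1", "dn2", "dn3"])
    nn.storage["dn1"]["blk_1"] = b"hellX hdfs!"  # simulate on-disk corruption on dn1
    print(read_block(nn, "blk_1"))
    print("replicas now on", nn.locations("blk_1"))
```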

