The NameNode is the single point of failure in Hadoop 1.0.
Each cluster has a single NameNode, and if that machine becomes unavailable, the whole cluster becomes unavailable.
This affects the overall availability of HDFS in two ways:
In the case of an unplanned event, such as a machine crash, the whole cluster is unavailable until the NameNode is manually restarted.
Planned maintenance, such as hardware or software upgrades on the NameNode machine, results in windows of cluster downtime.
In Hadoop 2.0, the HDFS High Availability (HA) feature addresses these problems by providing the option of running two NameNodes in the same cluster in an Active/Passive configuration with a hot standby.
This allows a fast failover to the other NameNode in the case of a machine crash, or a graceful administrator-initiated failover for planned maintenance.
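1) What is the single point of failure in Hadoop v1?
The single point of failure in Hadoop v1 is the NameNode. If the NameNode fails, the whole Hadoop cluster stops working. A NameNode crash does not by itself cause data loss, because the file blocks stay on the DataNodes. However, the NameNode is the only component that holds the file system metadata (the namespace, which blocks make up each file, and where those blocks are stored), and clients must contact it before reading or writing any file. When it fails, no file can be located, read, or written until it is brought back.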
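To make this failure mode concrete, here is a minimal toy model in Python of why losing the NameNode stops the cluster without losing data: the client must ask the NameNode where each block lives before it can fetch the bytes from a DataNode. The classes ToyNameNode and ToyDataNode, the read_file function, and block IDs such as blk_1 are hypothetical simplifications, not Hadoop's API.

```python
# Toy model only: these classes are hypothetical, not Hadoop's API.

class NameNodeUnavailable(Exception):
    """Raised when a client cannot reach the single NameNode."""


class ToyNameNode:
    """Holds metadata only: file -> block IDs, and block ID -> DataNode names."""

    def __init__(self):
        self.up = True
        self.file_to_blocks = {}
        self.block_locations = {}

    def get_block_locations(self, path):
        if not self.up:
            raise NameNodeUnavailable(path)
        return [(b, self.block_locations[b]) for b in self.file_to_blocks[path]]


class ToyDataNode:
    """Holds the actual block bytes."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


def read_file(namenode, datanodes, path):
    """Client read path: ask the NameNode where each block lives, then fetch it from a DataNode."""
    data = b""
    for block_id, hosts in namenode.get_block_locations(path):
        data += datanodes[hosts[0]].blocks[block_id]
    return data


dn1 = ToyDataNode("dn1")
dn1.blocks = {"blk_1": b"hello ", "blk_2": b"world"}
nn = ToyNameNode()
nn.file_to_blocks["/logs/a.txt"] = ["blk_1", "blk_2"]
nn.block_locations = {"blk_1": ["dn1"], "blk_2": ["dn1"]}
datanodes = {"dn1": dn1}

print(read_file(nn, datanodes, "/logs/a.txt"))  # b'hello world'

nn.up = False  # simulate a NameNode crash
try:
    read_file(nn, datanodes, "/logs/a.txt")
except NameNodeUnavailable:
    print("read failed; blocks still stored on dn1:", sorted(dn1.blocks))
```

After the simulated crash the blocks are still intact on dn1, but the client has no way to find them without the NameNode.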
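2) What solutions are available to handle the single point of failure in Hadoop 1?
Hadoop 1 has no built-in automatic failover. The usual approach is to protect the NameNode metadata by writing it to more than one location, typically a local disk plus a remote NFS mount, by listing several directories in the dfs.name.dir property. If the primary NameNode fails, an administrator can start a new NameNode from a surviving copy of the metadata, so the cluster is down only until that manual recovery is complete. Note that the Secondary NameNode is not a backup NameNode: it only periodically merges the edit log into the fsimage (checkpointing) and cannot take over client requests.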
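The metadata-redundancy idea can be sketched as below, assuming a hypothetical JSON snapshot file named fsimage.json (Hadoop's real fsimage and edit log use their own formats). save_metadata writes the same snapshot to every configured directory, much as the NameNode does for each path listed in dfs.name.dir, and load_metadata plays the role of the manual recovery step. Both function names are illustrative, not part of Hadoop.

```python
import json
import os
import shutil
import tempfile

# Hypothetical file name; the real fsimage/edits files use Hadoop's own formats.
IMAGE_FILE = "fsimage.json"


def save_metadata(namespace, dirs):
    """Write an identical metadata snapshot into every configured directory."""
    for d in dirs:
        os.makedirs(d, exist_ok=True)
        with open(os.path.join(d, IMAGE_FILE), "w") as f:
            json.dump(namespace, f)


def load_metadata(dirs):
    """Manual recovery: load from the first directory that still holds a copy."""
    for d in dirs:
        path = os.path.join(d, IMAGE_FILE)
        if os.path.exists(path):
            with open(path) as f:
                return json.load(f)
    raise FileNotFoundError("no surviving metadata copy in " + ", ".join(dirs))


root = tempfile.mkdtemp()
local_dir = os.path.join(root, "local-disk")  # on the NameNode machine
nfs_dir = os.path.join(root, "nfs-mount")     # stands in for a remote NFS mount
namespace = {"/user/alice/data.csv": ["blk_1", "blk_2"]}

save_metadata(namespace, [local_dir, nfs_dir])
shutil.rmtree(local_dir)                      # simulate losing the NameNode's disk
print(load_metadata([local_dir, nfs_dir]))   # {'/user/alice/data.csv': ['blk_1', 'blk_2']}
shutil.rmtree(root)
```

Because someone still has to start a NameNode against the surviving copy, this protects the metadata but does not remove the downtime.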
3) How has the single point of failure been addressed in Hadoop 2?
Hadoop 2 introduced HDFS High Availability for the NameNode. Two separate machines are configured as NameNodes: at any time one is Active and the other is in Standby. The Active NameNode handles all client operations in the cluster, while the Standby acts as a slave, keeping enough state to provide a fast failover if the Active fails. To stay in sync, the Standby reads the edits that the Active writes to shared storage (a group of JournalNodes or a shared NFS directory), and the DataNodes send block reports to both NameNodes.
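The Active/Standby mechanism can be illustrated with a small Python toy model, assuming a single shared edit log that the Active appends to and the Standby replays. SharedEditLog, HANameNode and failover are hypothetical names; the model leaves out block reports, fencing, and the ZooKeeper-based failover controller that Hadoop uses for automatic failover.

```python
# Toy model only: class and method names are hypothetical, not Hadoop's API.

class SharedEditLog:
    """Stand-in for shared edits storage (JournalNodes or a shared NFS directory)."""

    def __init__(self):
        self.entries = []


class HANameNode:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.state = "standby"
        self.namespace = set()  # directories that exist
        self.applied = 0        # how many log entries this node has applied

    def catch_up(self):
        """Apply edits not yet seen; a standby does this continuously."""
        for op, path in self.log.entries[self.applied:]:
            if op == "mkdir":
                self.namespace.add(path)
        self.applied = len(self.log.entries)

    def mkdir(self, path):
        """Client operation: only the active NameNode may accept it."""
        if self.state != "active":
            raise RuntimeError(f"{self.name} is {self.state}")
        self.log.entries.append(("mkdir", path))  # record the edit in shared storage first
        self.catch_up()


def failover(old_active, standby):
    """Demote the old active (real HA also fences it), then promote the standby."""
    old_active.state = "standby"
    standby.catch_up()  # apply any remaining edits before serving clients
    standby.state = "active"


log = SharedEditLog()
nn1, nn2 = HANameNode("nn1", log), HANameNode("nn2", log)
nn1.state = "active"
nn1.mkdir("/user")
nn1.mkdir("/user/alice")

failover(nn1, nn2)  # e.g. nn1 crashed or is down for maintenance
nn2.mkdir("/tmp")
print(sorted(nn2.namespace))  # ['/tmp', '/user', '/user/alice']
```

Because the Standby has been applying the same edits all along, promoting it is fast: it does not have to rebuild the namespace before serving clients. A real deployment must also make sure the old Active can no longer write (fencing), otherwise two Active NameNodes could diverge.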