Explain NameNode High Availability in HDFS?


    • #6091
      DataFlair Team
      Spectator

      What do you mean by the High Availability of a NameNode in Hadoop?

    • #6093
      DataFlair Team
      Spectator

      In Hadoop 1.x, the NameNode was a single point of failure: if it went down, the whole cluster was unavailable until it was restored.
      Hadoop 2.x overcomes this by running one active NameNode and one standby NameNode.

      The standby NameNode keeps an up-to-date copy of the metadata, so it can take over when a failover happens.

      DataNodes send block reports and heartbeats to both the active and the standby NameNode.

      The active NameNode writes every namespace change as an edit-log entry to a shared location (the JournalNodes). The standby NameNode constantly watches this shared location and applies new edits to its own namespace as it sees them, so the active and standby NameNodes stay in sync.
      Only the active NameNode can write to the shared location.
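
The shared edit log described above can be illustrated with a small toy model. This is only a sketch, not HDFS code: `ToyJournal`, `ToyStandby`, and the `mkdir`/`delete` operations are hypothetical names made up for the example, and the single in-memory list stands in for a whole set of JournalNodes. The epoch counter is a simplified assumption about how a stale writer can be rejected.

```python
class ToyJournal:
    """In-memory stand-in for the shared edit log (hypothetical, not an HDFS API)."""

    def __init__(self):
        self.edits = []        # ordered list of (txid, operation)
        self.writer_epoch = 0  # epoch of the only NameNode allowed to write

    def become_writer(self):
        """Claim the writer role; any earlier writer is fenced out."""
        self.writer_epoch += 1
        return self.writer_epoch

    def append(self, epoch, operation):
        """Append an edit, rejecting writers that hold an old epoch."""
        if epoch != self.writer_epoch:
            raise PermissionError("stale writer: only the active NameNode may write")
        txid = len(self.edits) + 1
        self.edits.append((txid, operation))
        return txid

    def read_from(self, txid):
        """Return all edits whose transaction id is greater than txid."""
        return self.edits[txid:]


class ToyStandby:
    """Standby that replays edits from the journal into its own namespace."""

    def __init__(self, journal):
        self.journal = journal
        self.last_applied = 0
        self.namespace = set()

    def tail(self):
        """Apply every edit not yet seen, in order."""
        for txid, (op, path) in self.journal.read_from(self.last_applied):
            if op == "mkdir":
                self.namespace.add(path)
            elif op == "delete":
                self.namespace.discard(path)
            self.last_applied = txid


journal = ToyJournal()
active_epoch = journal.become_writer()
journal.append(active_epoch, ("mkdir", "/data"))
journal.append(active_epoch, ("mkdir", "/tmp"))
journal.append(active_epoch, ("delete", "/tmp"))

standby = ToyStandby(journal)
standby.tail()  # standby namespace now matches the active's edits

standby_epoch = journal.become_writer()  # standby takes over as writer
try:
    journal.append(active_epoch, ("mkdir", "/late"))
    old_writer_rejected = False
except PermissionError:
    old_writer_rejected = True  # the former active can no longer write
```

In this sketch the standby ends up with only `/data` in its namespace, and the former active's late write is refused once another node has claimed the writer role.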

      Each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.

      If the current active NameNode crashes, the other NameNode can take a special exclusive lock in ZooKeeper, which marks it as the next active, and it then transitions to the active state.
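
The session-and-lock behaviour described above can be sketched as a toy simulation. This does not use the real ZooKeeper client: `ToyCoordinator`, `elect_active`, and the node names `nn1` and `nn2` are hypothetical, and session expiry is triggered by hand instead of by a missed heartbeat.

```python
class ToyCoordinator:
    """Stand-in for ZooKeeper: one exclusive lock tied to a live session."""

    def __init__(self):
        self.sessions = set()    # nodes with a live session
        self.lock_holder = None  # node currently holding the exclusive lock

    def open_session(self, node):
        self.sessions.add(node)

    def try_lock(self, node):
        """Take the lock if it is free and the node has a live session."""
        if node in self.sessions and self.lock_holder is None:
            self.lock_holder = node
        return self.lock_holder == node

    def expire_session(self, node):
        """Simulate a crash: the session ends and its lock is released."""
        self.sessions.discard(node)
        if self.lock_holder == node:
            self.lock_holder = None


def elect_active(coordinator, candidates):
    """Return the first candidate that holds (or obtains) the lock, else None."""
    for node in candidates:
        if coordinator.try_lock(node):
            return node
    return None


zk = ToyCoordinator()
for nn in ("nn1", "nn2"):
    zk.open_session(nn)

first_active = elect_active(zk, ["nn1", "nn2"])   # nn1 takes the lock first
zk.expire_session("nn1")                          # nn1 crashes
second_active = elect_active(zk, ["nn1", "nn2"])  # nn2 takes over
```

Tying the lock to the session is the key design point: a crashed node cannot keep holding the lock, so the surviving NameNode is free to acquire it.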

      For more details, see: NameNode High Availability in Hadoop
