What do you mean by the High Availability of a NameNode in Hadoop?


    • #5671
      DataFlair Team
      Spectator

      What is Hadoop HDFS NameNode High Availability, and how is it achieved?
      How do you set up a Hadoop cluster with HDFS High Availability?

    • #5673
      DataFlair Team
      Spectator

      Before Hadoop 2.0, the NameNode was a single point of failure (SPOF). A Secondary NameNode existed, but its job is only to create periodic checkpoints of the namespace metadata by merging the fsimage with the edit log. That protects against data loss, but it does not provide high availability of the filesystem, because the Secondary NameNode cannot take over as a running NameNode. On a very large cluster, recovering from a failure meant starting a replacement NameNode from cold with a copy of the metadata, which can take 30 minutes or more.

      Hadoop 2.0 introduced a new feature, NameNode High Availability. In this implementation, a pair of NameNodes runs in an active-standby configuration. If the active NameNode fails, the standby takes over its responsibilities and starts serving client requests as the new active NameNode.

      This required several architectural changes:

      1. The active and standby NameNodes must share the edit log through highly available storage, and the standby must read it continuously to keep its state in sync with the active.
      2. DataNodes must send block reports to both NameNodes, because the block mappings are stored in a NameNode’s memory and not on disk.
      3. Clients must be configured to handle NameNode failover, using a mechanism that is transparent to users.
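
      To make the third point concrete, here is a minimal toy sketch of client-side failover. It is not Hadoop code: `ToyNameNode`, `StandbyError`, and `FailoverClient` are hypothetical names invented for illustration, and the real HDFS client does this through a configurable failover proxy provider. The idea shown is only that the client knows about both NameNodes and retries against the other one when the current one is down or in standby.

```python
from dataclasses import dataclass


class StandbyError(Exception):
    """Raised when a request reaches a NameNode that is not active."""


@dataclass
class ToyNameNode:
    # Hypothetical stand-in for a NameNode; not a Hadoop class.
    name: str
    active: bool = False
    alive: bool = True

    def list_dir(self, path: str) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} is unreachable")
        if not self.active:
            raise StandbyError(f"{self.name} is in standby state")
        return f"{self.name} served {path}"


class FailoverClient:
    """Tries each configured NameNode in turn until one answers as active."""

    def __init__(self, namenodes):
        self.namenodes = list(namenodes)
        self.current = 0  # index of the NameNode we believe is active

    def list_dir(self, path: str) -> str:
        for _ in range(len(self.namenodes)):
            node = self.namenodes[self.current]
            try:
                return node.list_dir(path)
            except (ConnectionError, StandbyError):
                # Move on to the next configured NameNode and retry.
                self.current = (self.current + 1) % len(self.namenodes)
        raise RuntimeError("no active NameNode available")


nn1 = ToyNameNode("nn1", active=True)
nn2 = ToyNameNode("nn2")
client = FailoverClient([nn1, nn2])
first = client.list_dir("/data")

# Simulate a failover: nn1 dies and nn2 is promoted to active.
nn1.alive = False
nn2.active = True
second = client.list_dir("/data")

print(first)
print(second)
```

      The caller issues the same `list_dir` call before and after the failover and never names a specific NameNode, which is what "transparent to users" means here.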

      There are two choices for the shared edit log: the Quorum Journal Manager (QJM) or conventional shared storage such as NFS.
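
      As a rough illustration of the QJM option, the sketch below generates the core `hdfs-site.xml` properties for an HA nameservice. The property names are the standard HDFS HA ones; the nameservice ID `mycluster`, the host names, and the helper functions `ha_properties` and `to_hdfs_site_xml` are assumptions made up for this example, and the ports (8020 for NameNode RPC, 8485 for JournalNodes) should be checked against your own cluster.

```python
import xml.etree.ElementTree as ET

# Assumed example values; replace with your own nameservice and hosts.
NAMESERVICE = "mycluster"
NAMENODES = {"nn1": "machine1.example.com", "nn2": "machine2.example.com"}
JOURNALNODES = ["jn1.example.com", "jn2.example.com", "jn3.example.com"]


def ha_properties(nameservice, namenodes, journalnodes):
    """Return the core hdfs-site.xml properties for QJM-based HA."""
    props = {
        "dfs.nameservices": nameservice,
        # Comma-separated logical IDs of the NameNodes in this nameservice.
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # The shared edit log lives on the JournalNode quorum.
        "dfs.namenode.shared.edits.dir": "qjournal://"
        + ";".join(f"{host}:8485" for host in journalnodes)
        + f"/{nameservice}",
        # Class the HDFS client uses to find the active NameNode.
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
    }
    for nn_id, host in namenodes.items():
        # Port 8020 is an assumption; use your cluster's RPC port.
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
    return props


def to_hdfs_site_xml(props):
    """Serialize a property dict into Hadoop's <configuration> XML layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


props = ha_properties(NAMESERVICE, NAMENODES, JOURNALNODES)
xml_text = to_hdfs_site_xml(props)
print(xml_text)
```

      This covers only the properties that tie the two NameNodes and the JournalNodes together. A working deployment also needs a fencing method (`dfs.ha.fencing.methods`) and, for automatic failover, `dfs.ha.automatic-failover.enabled` plus a ZooKeeper quorum in `ha.zookeeper.quorum`.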


    • #5674
      DataFlair Team
      Spectator

      High availability of a NameNode is achieved by configuring a passive standby NameNode in the cluster alongside the running primary (active) NameNode.

      The passive node keeps a persistent copy of the file system namespace along with the in-memory metadata. So whether the primary node fails abruptly or is taken down gracefully, failover to the passive node happens without any significant interruption.

      Achieving this requires several architectural changes:
      1. The primary node must use highly available shared storage or the QJM to share the edit log with the standby node.
      2. DataNodes must send block reports to both nodes.
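
      For the QJM option in point 1, the edit log is written to a group of JournalNodes, and an edit is considered durable once a majority of them has acknowledged it. The short sketch below works through that arithmetic; the function names are hypothetical and exist only for this example.

```python
def majority(journal_nodes: int) -> int:
    """Smallest number of JournalNodes that forms a majority."""
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes: int) -> int:
    """How many JournalNodes can be down while edits still commit."""
    return (journal_nodes - 1) // 2


def edit_committed(acks: int, journal_nodes: int) -> bool:
    """An edit counts as durable once a majority has acknowledged it."""
    return acks >= majority(journal_nodes)


for n in (3, 5):
    print(n, majority(n), tolerated_failures(n))
```

      With three JournalNodes, two acknowledgements are enough and one node may be down; with five, three are needed and two may be down. This is why JournalNodes are normally deployed in odd numbers of at least three.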

