What do you mean by the High Availability of a NameNode in Hadoop?

This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 3:55 pm #5671
  
  DataFlair Team
  Spectator
  
  What is Hadoop HDFS NameNode High Availability? How is it achieved?
  How do you set up a Hadoop cluster with HDFS High Availability?
- September 20, 2018 at 3:55 pm #5673
  
  DataFlair Team
  Spectator
  
  Before Hadoop 2.0, the NameNode was a single point of failure. A Secondary NameNode exists, but it only keeps checkpoints of the NameNode metadata; this protects against data loss, but it doesn’t provide high availability of the filesystem. On a very large cluster, replacing a failed NameNode from the Secondary NameNode's checkpoint and starting it from cold can take 30 minutes or more.
  
  Hadoop 2.0 introduced a new feature, High Availability. In this implementation, a pair of NameNodes runs in an active-standby configuration. If the active NameNode fails, the standby NameNode takes over its responsibilities and starts functioning as the active NameNode.
  
  This required certain architectural changes:
  
  1. The active and standby NameNodes must share an edit log, and the standby must stay in sync with it.
  2. DataNodes must send block reports to both NameNodes, because the block mappings are stored in a NameNode’s memory, and not on disk.
  3. Clients must be configured to handle NameNode failover, using a mechanism that is transparent to users.
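
The three requirements above can be illustrated with a toy simulation. This is only a sketch in plain Python, not Hadoop code: the classes `EditLog`, `NameNode`, and `FailoverClient`, and the helpers `send_block_report` and `fail_over`, are hypothetical names invented for illustration, and real HDFS adds fencing, RPC, and ZooKeeper-based election that are left out here.

```python
class EditLog:
    """Shared, append-only edit log (stands in for QJM or NFS storage)."""
    def __init__(self):
        self.entries = []

    def append(self, op):
        self.entries.append(op)


class NameNode:
    def __init__(self, name, shared_log):
        self.name = name
        self.shared_log = shared_log
        self.state = "standby"
        self.namespace = set()   # file paths rebuilt from the edit log
        self.applied = 0         # number of log entries already applied
        self.block_map = {}      # block id -> DataNode names (memory only)

    def create_file(self, path):
        if self.state != "active":
            raise RuntimeError(f"{self.name} is not active")
        self.shared_log.append(("create", path))
        self.tail_edits()

    def tail_edits(self):
        """Apply any log entries this node has not seen yet."""
        for op, path in self.shared_log.entries[self.applied:]:
            if op == "create":
                self.namespace.add(path)
        self.applied = len(self.shared_log.entries)

    def receive_block_report(self, datanode, block_ids):
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(datanode)


def send_block_report(datanode, block_ids, namenodes):
    """Requirement 2: DataNodes report to every NameNode, active or standby."""
    for nn in namenodes:
        nn.receive_block_report(datanode, block_ids)


def fail_over(failed, standby):
    """Requirement 1: the standby catches up from the shared log, then serves."""
    failed.state = "failed"
    standby.tail_edits()
    standby.state = "active"


class FailoverClient:
    """Requirement 3: try each configured NameNode until one accepts the call."""
    def __init__(self, namenodes):
        self.namenodes = namenodes

    def create_file(self, path):
        for nn in self.namenodes:
            try:
                nn.create_file(path)
                return nn.name
            except RuntimeError:
                continue
        raise RuntimeError("no active NameNode available")


log = EditLog()
nn1, nn2 = NameNode("nn1", log), NameNode("nn2", log)
nn1.state = "active"
client = FailoverClient([nn1, nn2])

served_by_before = client.create_file("/data/a.txt")
send_block_report("dn1", ["blk_1"], [nn1, nn2])

fail_over(failed=nn1, standby=nn2)
served_by_after = client.create_file("/data/b.txt")

print(served_by_before, served_by_after)
print(sorted(nn2.namespace), nn2.block_map)
```

Because the standby already holds the block map and only has to replay the tail of the shared log, it can start serving without the long cold start described earlier.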
  
  To implement the shared edit log, there are two choices: the Quorum Journal Manager (QJM) or conventional shared storage (NFS).
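
For the QJM option, setup mostly comes down to a handful of `hdfs-site.xml` properties. The sketch below generates them with Python so the relationships between the keys are visible. The property names are the standard HDFS HA ones; the nameservice ID `mycluster`, the NameNode IDs, and every `example.com` host are placeholder assumptions, and the helper functions `ha_properties` and `to_hadoop_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder values (assumptions): replace with your own IDs and hosts.
NAMESERVICE = "mycluster"
NAMENODES = {
    "nn1": "nn-host1.example.com:8020",
    "nn2": "nn-host2.example.com:8020",
}
JOURNALNODES = [
    "jn1.example.com:8485",
    "jn2.example.com:8485",
    "jn3.example.com:8485",
]


def ha_properties(nameservice, namenodes, journalnodes):
    """Build the core hdfs-site.xml properties for HA with QJM."""
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Shared edit log: a qjournal URI listing the JournalNodes.
        "dfs.namenode.shared.edits.dir":
            "qjournal://" + ";".join(journalnodes) + "/" + nameservice,
        # Client-side class that finds the currently active NameNode.
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
    }
    for nn_id, address in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = address
    return props


def to_hadoop_xml(props):
    """Render properties in Hadoop's <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


props = ha_properties(NAMESERVICE, NAMENODES, JOURNALNODES)
hdfs_site_xml = to_hadoop_xml(props)
print(hdfs_site_xml)
```

This is not a complete configuration: a working cluster also needs at least `fs.defaultFS` in `core-site.xml` (for example `hdfs://mycluster`), a fencing method in `dfs.ha.fencing.methods`, and `dfs.journalnode.edits.dir` on each JournalNode. With the NFS option, `dfs.namenode.shared.edits.dir` would instead point at a `file://` path on the shared mount.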
  
  Follow the link to learn more about High availability in Hadoop
- September 20, 2018 at 3:55 pm #5674
  
  DataFlair Team
  Spectator
  
  High availability of a NameNode can be achieved by configuring a passive standby node in the cluster alongside the running primary node.
  
  The passive node keeps a persistent copy of the file system namespace along with the in-memory metadata. So in the case of a forceful or graceful failure of the primary node, failover to the passive node happens without any significant interruption.
  
  To achieve this, several architectural changes are required:
  1. The primary node must use highly available shared storage or QJM to share the edit logs with the standby node.
  2. DataNodes must send block reports to both nodes.
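
When QJM is used for point 1, an edit counts as durable once a majority of JournalNodes have acknowledged it, which is why JournalNodes are usually deployed in odd numbers. The arithmetic can be sketched as follows; the function names are hypothetical and the code models only the counting rule, not the actual QJM protocol.

```python
def majority(num_journalnodes):
    """Smallest number of JournalNodes that forms a quorum."""
    return num_journalnodes // 2 + 1


def tolerated_failures(num_journalnodes):
    """How many JournalNodes can be down while edits still commit."""
    return (num_journalnodes - 1) // 2


def edit_is_committed(acks, num_journalnodes):
    """An edit is committed once a majority has acknowledged the write."""
    return acks >= majority(num_journalnodes)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

Three JournalNodes tolerate one failure and five tolerate two; going from three to four adds no extra tolerance, since the quorum size rises to three as well.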
  
  Follow the link to learn more about High availability in Hadoop