What is active and passive NameNode in Hadoop?


    • #5824
      DataFlair Team
      Spectator

      Why does Hadoop need a passive NameNode?

    • #5826
      DataFlair Team
      Spectator

      The NameNode maintains the file system tree and the metadata for all files and directories. It persists this information on the local file system in two forms: the namespace image and the edit logs. If the NameNode fails, the entire file system becomes unavailable. Writing the metadata to an NFS mount or running a Secondary NameNode protects against data loss, but the NameNode remains a single point of failure. The Secondary NameNode only periodically merges the namespace image with the edit logs of the active NameNode; it is not a hot standby. After a failure, a replacement NameNode started from that checkpoint cannot serve requests until three things have happened:

      The namespace image (the checkpoint kept by the Secondary NameNode) is loaded into memory.
      The edit logs are replayed.
      Block reports are received from the DataNodes.
      Together, these steps can take more than 30 minutes.
      To provide high availability, Hadoop 2.x adds a second NameNode in a standby configuration. This requires the following changes:
      The standby node reads to the end of the shared edit log to synchronize its state with the active node, and then keeps reading new entries as the active node writes them. (ZooKeeper is typically used to coordinate which node is active.)
      DataNodes must send block reports to both the active and the standby (passive) node.
      Clients must be configured to handle NameNode failover.

      So the passive NameNode is what makes the NameNode highly available in Hadoop.
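
      The difference between a cold restart and a standby that follows the edit log can be illustrated with a toy model. This is only a sketch in plain Python, not Hadoop code: SharedEditLog, ToyNameNode, tail_edits and become_active are hypothetical names made up for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SharedEditLog:
    """Stand-in for the shared storage that holds the edit log (hypothetical)."""
    entries: list = field(default_factory=list)

    def append(self, op):
        self.entries.append(op)


@dataclass
class ToyNameNode:
    name: str
    log: SharedEditLog
    state: str = "standby"
    namespace: set = field(default_factory=set)
    applied: int = 0  # number of log entries already applied locally

    def _apply(self, op):
        action, path = op
        if action == "mkdir":
            self.namespace.add(path)
        elif action == "delete":
            self.namespace.discard(path)

    def write(self, action, path):
        """Only the active node accepts namespace changes."""
        if self.state != "active":
            raise RuntimeError(f"{self.name} is not active")
        op = (action, path)
        self.log.append(op)   # record the edit in the shared log first
        self._apply(op)       # then update the in-memory namespace
        self.applied = len(self.log.entries)

    def tail_edits(self):
        """Standby replays any edits it has not seen yet."""
        for op in self.log.entries[self.applied:]:
            self._apply(op)
        self.applied = len(self.log.entries)

    def become_active(self):
        """Catch up on the remaining edits, then start serving writes."""
        self.tail_edits()
        self.state = "active"


log = SharedEditLog()
nn1 = ToyNameNode("nn1", log, state="active")
nn2 = ToyNameNode("nn2", log)

nn1.write("mkdir", "/data")
nn1.write("mkdir", "/tmp")
nn2.tail_edits()              # standby follows along in the background
nn1.write("delete", "/tmp")   # one more edit before nn1 fails

nn1.state = "failed"
nn2.become_active()           # only one unseen edit is left to replay
print(sorted(nn2.namespace))  # ['/data']
```

      Because nn2 has already applied most of the log, promoting it only needs the last few edits, which is why a standby can take over far faster than a NameNode started from scratch.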

    • #5828
      DataFlair Team
      Spectator

      Prior to Hadoop 2.x, the NameNode was a single point of failure: if the primary NameNode went down, the complete file system became unavailable.

      To overcome this issue, Hadoop 2.x introduced a feature that removes the single point of failure by allowing two redundant NameNodes to run in the same cluster in an active/passive (standby) configuration.

      The passive node can take over as the primary node and serve client requests without significant interruption when the active NameNode fails or is taken down for planned maintenance.

      Implementing this requires a few architectural changes. The active node must use highly available shared storage or the Quorum Journal Manager (QJM) to share the edit logs with the standby node.

      Also, DataNodes must send block reports to both nodes (active and standby).
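
      With QJM, the active NameNode writes each edit to a group of JournalNodes, and the edit counts as durable once a majority of them have stored it. A minimal sketch of that majority rule follows; it is a toy model, not the Hadoop implementation, and ToyJournalNode and quorum_write are hypothetical names.

```python
class ToyJournalNode:
    """Hypothetical stand-in for a JournalNode that stores edits."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.edits = []

    def journal(self, txid, op):
        """Store the edit and acknowledge it; an unreachable node cannot ack."""
        if not self.up:
            return False
        self.edits.append((txid, op))
        return True


def quorum_write(journal_nodes, txid, op):
    """Return True when a strict majority of journal nodes stored the edit."""
    acks = sum(1 for jn in journal_nodes if jn.journal(txid, op))
    return acks > len(journal_nodes) // 2


jns = [ToyJournalNode("jn1"), ToyJournalNode("jn2"), ToyJournalNode("jn3")]
ok_all_up = quorum_write(jns, 1, "mkdir /data")      # 3 of 3 acks

jns[2].up = False
ok_one_down = quorum_write(jns, 2, "mkdir /logs")    # 2 of 3 acks

jns[1].up = False
ok_two_down = quorum_write(jns, 3, "mkdir /tmp")     # 1 of 3 acks

print(ok_all_up, ok_one_down, ok_two_down)  # True True False
```

      With three journal nodes the write survives one failure but not two, which is why an odd number of JournalNodes is the usual choice.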

    • #5830
      DataFlair Team
      Spectator

      In Hadoop 1.x, the NameNode was a single point of failure (SPOF): when the NameNode was down, the entire Hadoop cluster was down. To overcome this problem, Hadoop 2.x introduced the NameNode High Availability feature, in which there are two NameNodes:
      1) Active NameNode
      2) Passive NameNode
      The active NameNode is the one that serves the cluster, and the passive NameNode is a standby.
      DataNodes send block reports to both NameNodes, so both have the same block information. When the active NameNode goes down, the passive NameNode replaces it and manages the cluster, so the cluster stays available.
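
      As a rough illustration of why the takeover is smooth, here is a toy model in which DataNodes report to both NameNodes and a client falls back to the other node when the first one is not active. It is only a sketch: ToyNameNode, send_block_report and locate_with_failover are hypothetical names, not Hadoop APIs.

```python
class StandbyError(Exception):
    """Raised when a request reaches a NameNode that is not active."""


class ToyNameNode:
    def __init__(self, name, active):
        self.name = name
        self.active = active
        self.block_map = {}  # block id -> set of DataNode names

    def receive_block_report(self, datanode, block_ids):
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(datanode)

    def locate(self, block_id):
        if not self.active:
            raise StandbyError(self.name)
        return sorted(self.block_map.get(block_id, ()))


def send_block_report(datanode, block_ids, namenodes):
    """A DataNode reports its blocks to every NameNode, not only the active one."""
    for nn in namenodes:
        nn.receive_block_report(datanode, block_ids)


def locate_with_failover(namenodes, block_id):
    """Client side: try each configured NameNode until one answers as active."""
    for nn in namenodes:
        try:
            return nn.name, nn.locate(block_id)
        except StandbyError:
            continue
    raise RuntimeError("no active NameNode reachable")


nn1 = ToyNameNode("nn1", active=True)
nn2 = ToyNameNode("nn2", active=False)
cluster = [nn1, nn2]

send_block_report("dn1", ["blk_1", "blk_2"], cluster)
send_block_report("dn2", ["blk_1"], cluster)

before = locate_with_failover(cluster, "blk_1")

# Failover: nn1 goes down, nn2 is promoted and already knows the block locations.
nn1.active, nn2.active = False, True
after = locate_with_failover(cluster, "blk_1")

print(before)  # ('nn1', ['dn1', 'dn2'])
print(after)   # ('nn2', ['dn1', 'dn2'])
```

      The client gets the same block locations before and after the switch because nn2 had been receiving the same reports all along.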
