What is Checkpoint Node in Hadoop?

Viewing 2 reply threads
  • Author
    Posts
    • #5854
      DataFlair Team
      Spectator

      What is the Checkpoint Node used for?
      What is the Checkpoint Node and how does it work in Hadoop?
      What are the roles and responsibilities of the Checkpoint Node in Apache Hadoop?

    • #5857
      DataFlair Team
      Spectator

      Checkpointing plays a vital role in HDFS. It is the process of merging the fsimage with the latest edit log to create a new fsimage, so that the NameNode holds up-to-date metadata for the HDFS namespace.
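
      The merge described above can be pictured with a toy model. This is only a sketch: the dictionary-based "fsimage", the tuple-based edit log and the function name `apply_edits` are assumptions made for illustration, not the real HDFS on-disk formats or APIs.

```python
# Toy model of a checkpoint: replay edit-log transactions on top of the
# last fsimage to produce a new fsimage. All data shapes are illustrative
# assumptions, not the real HDFS formats.

def apply_edits(fsimage, edits):
    """Return a new image with every edit newer than the image applied."""
    namespace = dict(fsimage["namespace"])  # copy, so the old image is untouched
    last_txid = fsimage["last_txid"]
    for txid, op, path, value in sorted(edits, key=lambda e: e[0]):
        if txid <= last_txid:
            continue  # already reflected in the old fsimage
        if op == "create":
            namespace[path] = value
        elif op == "delete":
            namespace.pop(path, None)
        elif op == "rename":
            namespace[value] = namespace.pop(path)  # value holds the new path
        else:
            raise ValueError(f"unknown op: {op}")
        last_txid = txid
    return {"namespace": namespace, "last_txid": last_txid}


old_image = {"namespace": {"/a": "file"}, "last_txid": 10}
edit_log = [
    (11, "create", "/b", "file"),
    (12, "rename", "/a", "/c"),
    (13, "delete", "/b", None),
]
new_image = apply_edits(old_image, edit_log)
```

      After the merge, the new image already contains the effect of transactions 11 to 13, so on restart the NameNode would only need to replay edits written after transaction 13.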

      One could say that this task can be performed by a Secondary NameNode or a Standby NameNode as well.

      There is a small difference, however:

      The Secondary NameNode periodically merges the fsimage with the edit log transactions and, in an HA-enabled HDFS cluster, stores the result in a shared storage location.

      A Checkpoint node, on the other hand, can transfer the newly built fsimage to the active NameNode via an HTTP GET call.

      This is the prime advantage of a Checkpoint node over a Secondary NameNode.

      Whether a checkpoint is started depends on two basic things:

      1> Enough time has elapsed since the last checkpoint (or the last created fsimage)
      2> Enough edit log transactions (changes) have accumulated

      If either one of the above conditions is satisfied, the checkpoint node is activated.

      The mode of operation of a checkpoint node depends on whether one has an HA-enabled cluster or a non-HA cluster.
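
      The two conditions above can be sketched as a simple check. In HDFS the thresholds are configured with `dfs.namenode.checkpoint.period` (default 3600 seconds) and `dfs.namenode.checkpoint.txns` (default 1,000,000 transactions); the function `should_checkpoint` and its parameters are hypothetical names used only for this illustration.

```python
import time

# Defaults of dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns
CHECKPOINT_PERIOD_SECS = 3600
CHECKPOINT_TXNS = 1_000_000


def should_checkpoint(last_checkpoint_time, uncheckpointed_txns, now=None,
                      period_secs=CHECKPOINT_PERIOD_SECS,
                      max_txns=CHECKPOINT_TXNS):
    """Return True if either the time or the transaction threshold is reached."""
    now = time.time() if now is None else now
    time_elapsed = (now - last_checkpoint_time) >= period_secs
    too_many_txns = uncheckpointed_txns >= max_txns
    return time_elapsed or too_many_txns
```

      For example, `should_checkpoint(0, 10, now=100)` is false because neither threshold is reached, while `should_checkpoint(0, 1_000_000, now=100)` is true because of the transaction count alone.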

      1> HA-enabled cluster :-

      In this case both the NameNode and the checkpoint node share a common storage. The checkpoint node (or an advanced standby NameNode) periodically stores the latest fsimage file and replays it. As soon as either of the two conditions for activating the checkpoint node is satisfied, it creates the latest fsimage along with an MD5 file, so that the integrity of the fsimage file can be verified. It then transfers the fsimage file to the active NameNode over HTTP using a GET request.
      On the other end, the active NameNode accepts the latest fsimage file and creates a similar MD5 file on its side as well. While receiving the fsimage file, the NameNode saves it under an intermediate filename, and once the transfer is complete it renames the new file to the standard fsimage name.
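
      The receiving side described above (save under an intermediate name, check the MD5, then switch to the standard name) can be sketched as follows. This is a minimal illustration, not NameNode code: `accept_fsimage` and `md5_of` are hypothetical helpers, and the `fsimage.ckpt_<txid>` and `fsimage_<txid>` file names are assumed here to mirror the naming pattern HDFS uses.

```python
import hashlib
import os


def md5_of(path):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def accept_fsimage(data, expected_md5, storage_dir, txid):
    """Store a received image under a temporary name, verify it, then rename it."""
    tmp_path = os.path.join(storage_dir, f"fsimage.ckpt_{txid}")  # intermediate name (assumed)
    final_path = os.path.join(storage_dir, f"fsimage_{txid}")     # standard name (assumed)
    with open(tmp_path, "wb") as f:
        f.write(data)
    actual_md5 = md5_of(tmp_path)
    if actual_md5 != expected_md5:
        os.remove(tmp_path)  # never promote a corrupt image
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual_md5}")
    # Write the companion MD5 file next to the final image.
    with open(final_path + ".md5", "w") as f:
        f.write(f"{actual_md5} *{os.path.basename(final_path)}\n")
    os.replace(tmp_path, final_path)  # atomic rename on the same filesystem
    return final_path
```

      Writing to an intermediate name first means that a transfer interrupted halfway never leaves a partial file under the standard fsimage name.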

      2> HA-disabled cluster :-
      In this case, in addition to the above steps, the Secondary NameNode gets the old fsimage file and the latest edit log transactions (transaction IDs) from the NameNode. Because there is no shared storage, the Secondary NameNode has to fetch this data from the active NameNode.

    • #5860
      DataFlair Team
      Spectator

      The Checkpoint node is an implementation of the Secondary NameNode. It periodically fetches the fsimage and the edit log from the NameNode and merges them locally. The resulting state is called a checkpoint, and the newly merged fsimage is uploaded to the NameNode.
      The mode of operation of a checkpoint node depends on whether the cluster is HA-enabled or non-HA.

      1> HA-enabled cluster :-

      Both the NameNode and the checkpoint node share a common storage. The checkpoint node (or an advanced standby NameNode) periodically stores the latest fsimage file and replays it. Once the conditions for triggering a checkpoint are satisfied, it creates the latest fsimage along with an MD5 file, so that the integrity of the fsimage file can be verified. It then transfers the fsimage file to the active NameNode over HTTP using a GET request.
      On the other end, the active NameNode accepts the latest fsimage file and creates a similar MD5 file on its side as well. While receiving the fsimage file, the NameNode saves it under an intermediate filename, and once the transfer is complete it renames the new file to the standard fsimage name.

      2> HA-disabled cluster :-
      In this case, the Secondary NameNode gets the old fsimage file and the latest edit log transactions (transaction IDs) from the NameNode. Because there is no shared storage, the Secondary NameNode has to fetch this data from the active NameNode.
