What is Checkpoint Node and how does it work in Hadoop? What is Checkpoint Node used for?


    • #6271
      DataFlair Team
      Spectator

      What is the Checkpoint Node, and how does it work in Hadoop?
      What is the Checkpoint Node used for?

    • #6272
      DataFlair Team
      Spectator

      The NameNode, the master node of HDFS, holds all of the file system metadata. It is important to keep this metadata up to date.

      Whenever a file or directory changes, the corresponding metadata has to change too. At run time the NameNode applies these changes to its in-memory state and records them in a log (the edits file); it does not rewrite the on-disk metadata snapshot (the fsimage file) on every change. At the next startup, the NameNode reads the fsimage, replays the changes from the edits file on top of it, and writes the merged result out as a new fsimage.
      If the edits log has grown very large, replaying it takes a long time and delays the startup. The Checkpoint Node solves this problem.

      The Checkpoint Node performs this merge while the cluster is running. It periodically fetches the fsimage and the accumulated edits from the NameNode, applies the edits to the fsimage, and uploads the result to the NameNode.
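
      To make the merge concrete, here is a minimal Python sketch. It is a toy model, not Hadoop's on-disk format: the fsimage is assumed to be a dict from path to metadata, and the edit operation names (create, set_replication, delete) and the function apply_edits are hypothetical, chosen only for illustration.

```python
from copy import deepcopy

def apply_edits(fsimage, edits):
    """Return a new namespace snapshot with every logged edit applied in order."""
    image = deepcopy(fsimage)  # leave the old snapshot untouched
    for op, path, value in edits:
        if op == "create":
            image[path] = value
        elif op == "set_replication":
            image[path]["replication"] = value
        elif op == "delete":
            image.pop(path, None)
        else:
            raise ValueError(f"unknown edit op: {op}")
    return image

# Toy fsimage: one file with replication factor 3.
fsimage = {"/data/a.txt": {"replication": 3}}

# Toy edits log: changes recorded since the fsimage was written.
edits = [
    ("create", "/data/b.txt", {"replication": 3}),
    ("set_replication", "/data/a.txt", 2),
    ("delete", "/data/b.txt", None),
]

checkpoint = apply_edits(fsimage, edits)
print(checkpoint)  # {'/data/a.txt': {'replication': 2}}
```

      The longer the edits list, the longer this replay takes, which is why merging periodically keeps the next NameNode startup short.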

      The Secondary NameNode does the same kind of periodic merge and also sends the merged fsimage back to the NameNode; the Checkpoint Node is a newer implementation of the same role.

    • #6273
      DataFlair Team
      Spectator

       The Checkpoint Node in Hadoop is a newer implementation of the Secondary NameNode, intended to address the Secondary NameNode's drawbacks.
       Main function: create periodic checkpoints of the file system metadata by merging the edits file with the fsimage file. The new fsimage produced by the merge is called a checkpoint.
       The Checkpoint Node periodically downloads the fsimage and edits files from the primary NameNode, merges them locally, and stores the result in a directory structure similar to that of the primary NameNode,
       so that the primary NameNode can easily read the latest checkpoint if it is needed after a NameNode failure.
       It usually runs on a different machine than the primary NameNode, since its memory requirements are of the same order as the primary NameNode's.
       After the merge, it uploads the resulting fsimage back to the active NameNode.
       Current Hadoop releases allow multiple Checkpoint Nodes to be registered with the NameNode.

      The Checkpoint Node is started with:
      $ hdfs namenode -checkpoint

      Two important configuration parameters control the checkpoint process on the Checkpoint Node.

      dfs.namenode.checkpoint.period = 1 hour (3600 seconds) by default.

       It is the maximum delay between two consecutive checkpoints.

      dfs.namenode.checkpoint.txns = 1 million by default.

       It is the maximum number of un-checkpointed transactions in the edits file on the NameNode. Once the transaction count reaches this limit, an urgent checkpoint is forced, even if the checkpoint period has not yet elapsed.
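
      The way the two parameters combine can be sketched as a simple "either limit reached" check. This is an illustration of the rule described above, not Hadoop's actual code; the function name should_checkpoint and its arguments are hypothetical.

```python
# Defaults as stated above.
CHECKPOINT_PERIOD_SECONDS = 3600  # dfs.namenode.checkpoint.period (1 hour)
CHECKPOINT_TXNS = 1_000_000       # dfs.namenode.checkpoint.txns

def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period=CHECKPOINT_PERIOD_SECONDS, txns=CHECKPOINT_TXNS):
    """A checkpoint is due when either the time limit or the transaction limit is reached."""
    return seconds_since_last >= period or uncheckpointed_txns >= txns

print(should_checkpoint(600, 50_000))      # False: neither limit reached
print(should_checkpoint(3600, 50_000))     # True: the period has elapsed
print(should_checkpoint(600, 1_000_000))   # True: transaction limit forces a checkpoint
```

      On a busy cluster the transaction limit tends to fire first; on a quiet one, the period does.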

      If the NameNode fails, the latest checkpoint created by the Checkpoint Node can be imported into the NameNode’s metadata directory.

      Procedure for importing a checkpoint
      1. Create an empty directory on the NameNode at the path specified in the dfs.namenode.name.dir configuration variable.
      2. Set the dfs.namenode.checkpoint.dir configuration variable to the directory that holds the latest checkpoint from the Checkpoint Node.
      3. Start the NameNode with the -importCheckpoint option:
      $ hdfs namenode -importCheckpoint

       The NameNode reads the checkpoint from the directory given by dfs.namenode.checkpoint.dir and saves it to the NameNode directory given by dfs.namenode.name.dir.
      Note: Before the import, the NameNode directory must not contain a valid fsimage file; otherwise the import fails.
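
      The empty-directory precondition can be illustrated with a small Python sketch. It only models the rule in the note above using temporary directories; it is not how Hadoop implements the import. The function import_checkpoint, the flat directory layout, and the sample file name are assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path

def import_checkpoint(checkpoint_dir, name_dir):
    """Toy model: copy checkpoint files into name_dir, refusing if it already holds an image."""
    checkpoint_dir, name_dir = Path(checkpoint_dir), Path(name_dir)
    name_dir.mkdir(parents=True, exist_ok=True)
    if any(name_dir.glob("fsimage*")):
        raise RuntimeError(f"{name_dir} already contains an fsimage; import refused")
    copied = []
    for src in sorted(checkpoint_dir.iterdir()):
        if src.is_file():
            shutil.copy2(src, name_dir / src.name)
            copied.append(src.name)
    return copied

with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint"   # stands in for dfs.namenode.checkpoint.dir
    name = Path(tmp) / "name"         # stands in for dfs.namenode.name.dir
    ckpt.mkdir()
    (ckpt / "fsimage_0000000000000000042").write_text("toy image")

    first = import_checkpoint(ckpt, name)   # succeeds: name dir starts empty
    try:
        import_checkpoint(ckpt, name)       # name dir now holds an image
        second_failed = False
    except RuntimeError:
        second_failed = True

print(first, second_failed)
```

      The second call is refused because the target directory already holds an image, which mirrors why the NameNode directory has to be empty before the import.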
