Differentiate Secondary NameNode and Checkpoint Node in Hadoop

    • #5705
      DataFlair Team
      Spectator

      How do the Secondary NameNode and the Checkpoint Node compare in Hadoop?
      What is the difference between the two?

    • #5708
      DataFlair Team
      Spectator

      The NameNode stores the HDFS metadata in two files on its local filesystem: fsimage and the edits file.

      1. Fsimage contains a complete snapshot of the metadata: the namespace image of the filesystem, the mapping of files to blocks, and so on.
      2. The edits file contains all the changes made to the filesystem since that fsimage was written. If no checkpoint has run, that means since the last NameNode startup.
      When the NameNode starts, it first loads the namespace image of HDFS from fsimage and then applies all the changes recorded in the edits file.
      If the NameNode has been running for a long time, its next startup will be slow because it has to replay a large number of changes from the edits file.
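
The startup sequence described above (load fsimage, then replay edits) can be sketched as a toy model. This is only an illustration: the function name and the operation names ("mkdir", "create", "delete") are made up for the example and are not the real HDFS opcodes or file formats.

```python
# Toy model of NameNode startup: load the fsimage snapshot, then replay the edits log.
# Operation names are illustrative assumptions, not HDFS's real edit-log opcodes.

def load_namespace(fsimage, edits):
    """Return the namespace obtained by applying `edits` on top of `fsimage`."""
    namespace = set(fsimage)          # start from the last saved snapshot
    for op, path in edits:            # replay every logged change, in order
        if op in ("mkdir", "create"):
            namespace.add(path)
        elif op == "delete":
            namespace.discard(path)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


fsimage = {"/", "/user"}
edits = [
    ("mkdir", "/user/alice"),
    ("create", "/user/alice/a.txt"),
    ("delete", "/user/alice/a.txt"),
]
namespace = load_namespace(fsimage, edits)
print(sorted(namespace))  # ['/', '/user', '/user/alice']
```

The replay loop runs once per logged change, which is why a long edits file translates directly into a long startup.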

      To overcome this, we have:

      Checkpoint Node: It periodically downloads fsimage and the edits file from the NameNode, merges them into an up-to-date fsimage, and uploads it back to the NameNode.
      Hence, at its next startup the NameNode can build its metadata from the latest fsimage and only has to replay the few edits logged since that checkpoint.

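      Secondary NameNode: It performs the same periodic merge as the Checkpoint Node and also uploads the merged fsimage back to the NameNode. In addition, it keeps the latest checkpoint in its own persistent storage (its checkpoint directory). The two differ mainly in implementation: the Secondary NameNode is the older, separate daemon, while the Checkpoint Node is a NameNode process started in the checkpoint role.
      Hence, in case of NameNode failure, the latest fsimage from this storage can be used to recover the metadata.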
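
The checkpoint cycle described in this post (download, merge, upload) can be sketched roughly as follows. All class and function names here are hypothetical, chosen for the illustration; real HDFS transfers the files over HTTP and tracks transaction IDs, none of which is modeled.

```python
# Toy model of one checkpoint cycle. Names are hypothetical, not HDFS APIs.

class ToyNameNode:
    def __init__(self, fsimage, edits):
        self.fsimage = set(fsimage)
        self.edits = list(edits)

    def roll_edits(self):
        """Hand over the current edits and start a fresh, empty edits log."""
        finished, self.edits = self.edits, []
        return finished

    def install_fsimage(self, new_image):
        """Accept the merged image sent back by the checkpointing node."""
        self.fsimage = set(new_image)


def merge(fsimage, edits):
    """Apply the logged changes to a copy of the image."""
    image = set(fsimage)
    for op, path in edits:
        if op == "add":
            image.add(path)
        elif op == "delete":
            image.discard(path)
    return image


def run_checkpoint(namenode):
    """Download fsimage and edits, merge them, upload the result."""
    image = set(namenode.fsimage)       # "download" fsimage
    edits = namenode.roll_edits()       # "download" edits; NameNode starts a new log
    merged = merge(image, edits)
    namenode.install_fsimage(merged)    # "upload" the new fsimage
    return merged                       # copy kept locally by the checkpointing node


nn = ToyNameNode({"/"}, [("add", "/data"), ("add", "/tmp"), ("delete", "/tmp")])
local_copy = run_checkpoint(nn)
print(sorted(nn.fsimage), nn.edits)  # ['/', '/data'] []
```

The value returned by `run_checkpoint` stands in for the copy kept in the checkpointing node's own storage, which is what makes recovery possible if the NameNode's metadata is lost.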

    • #5710
      DataFlair Team
      Spectator

      The NameNode runs on the master and keeps the HDFS metadata in memory: file names, paths, number of blocks, information about the slaves (DataNodes), and so on.
      This metadata is persisted in two files, fsimage and edits.
      Fsimage: The base of the metadata, an image file. It holds metadata such as file names, paths, and the blocks of each file. Block locations on the slaves are not stored in it; the NameNode learns them from DataNode block reports.
      Edits: A log file that records every modification made to the filesystem.

      When the NameNode starts, it reads the state of HDFS from fsimage, applies the modifications recorded in edits, and writes the resulting state back to fsimage. It then clears the edits file.
      Because the NameNode performs this merge only at startup, on a busy cluster the edits log keeps growing at runtime and takes more and more time to process.
      This slows down the next restart of the NameNode.

      To overcome this drawback, the Checkpoint Node and the Secondary NameNode were introduced.
      Checkpoint Node: It periodically fetches fsimage and the edits log from the NameNode and merges them. It then uploads the new fsimage to the active NameNode.
      Secondary NameNode: It also periodically fetches fsimage and the edits log from the NameNode, merges them, and uploads the new fsimage back to the NameNode. It is the older of the two implementations and runs as a separate daemon, whereas the Checkpoint Node is a NameNode started in the checkpoint role.

    • #5711
      DataFlair Team
      Spectator

      The Secondary NameNode is responsible for merging the NameNode's edit logs into the fsimage file, which lives on the local filesystem rather than in HDFS. Once merged, those edit logs are no longer needed. This is done periodically and keeps the edit logs small, since the changes are folded into fsimage on the Secondary NameNode.

      The Checkpoint Node periodically fetches fsimage and edits from the NameNode and merges them.
      The resulting state is called a checkpoint. It then uploads the result to the NameNode.
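
"Periodically" can be made concrete with a small sketch of a trigger rule. It assumes the two settings documented for HDFS checkpointing, `dfs.namenode.checkpoint.period` and `dfs.namenode.checkpoint.txns`, with their commonly documented defaults of 3600 seconds and 1,000,000 transactions; check `hdfs-default.xml` for your version. The function itself is a hypothetical illustration, not HDFS code.

```python
# Sketch of a checkpoint trigger rule. Values are assumed defaults of
# dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns;
# verify them against hdfs-default.xml for your Hadoop version.

CHECKPOINT_PERIOD_SECONDS = 3600
CHECKPOINT_TXNS = 1_000_000


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period=CHECKPOINT_PERIOD_SECONDS,
                      txns=CHECKPOINT_TXNS):
    """Checkpoint when enough time has passed OR enough edits have piled up."""
    return seconds_since_last >= period or uncheckpointed_txns >= txns


print(should_checkpoint(600, 50_000))      # False: too soon and too few edits
print(should_checkpoint(4000, 50_000))     # True: the period has elapsed
print(should_checkpoint(600, 2_000_000))   # True: the edits log has grown large
```

Triggering on either condition keeps the edits log bounded on a busy cluster without forcing needless checkpoints on an idle one.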

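      Both the Secondary NameNode and the Checkpoint Node upload the merged fsimage back to the active NameNode, so the upload step is not what separates them.
      The main difference is in how they are run: the Secondary NameNode is a separate daemon (started with hdfs secondarynamenode), whereas the Checkpoint Node is a NameNode started in the checkpoint role (hdfs namenode -checkpoint).
      Both also keep the latest checkpoint in their own checkpoint directory, so the NameNode can import its state from there if its own metadata is lost.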
