Comparison between Secondary NameNode and Checkpoint Node in Hadoop?

    • #5257
      DataFlair Team
      Spectator

      What is the difference between the Secondary NameNode and the Checkpoint Node in Apache Hadoop?
      How do the Secondary NameNode and the Checkpoint Node compare?

    • #5258
      DataFlair Team
      Spectator

      The name Secondary NameNode suggests a backup node, but in reality it is not one. Let us first look at the NameNode.
      The NameNode stores metadata. Two files hold that metadata: FsImage and EditLogs.

      FsImage is an image file. It contains the entire filesystem namespace and is stored as a file in the NameNode's local file system. It holds inode details such as modification time and access time.

      EditLogs contains all the modifications made to the file system since the most recent FsImage.
      In HDFS, when the NameNode starts, it first reads the HDFS state from an image file, fsimage, and then applies the edits from the edits log file. It writes the new HDFS state back to fsimage and starts normal operation with an empty edits file. Because the NameNode merges fsimage and edits only at startup, the edits file can grow very large over time on a busy cluster, and a larger edits file makes the next restart of the NameNode take longer.
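The startup sequence described above can be sketched as a toy model. This is only an illustration, not Hadoop code: the dictionary layout, the operation names (`create`, `delete`, `rename`) and the functions `replay` and `start_namenode` are all hypothetical.

```python
# Toy model of NameNode startup: load the FsImage, replay the edit log,
# save a new FsImage, and start with an empty edit log.
# The record layout and operation names are hypothetical.

def replay(fsimage, edits):
    """Return a new namespace with every logged edit applied in order."""
    namespace = dict(fsimage)  # copy, so the original image is untouched
    for op, path, value in edits:
        if op == "create":
            namespace[path] = value
        elif op == "delete":
            namespace.pop(path, None)
        elif op == "rename":
            namespace[value] = namespace.pop(path)  # value is the new path
        else:
            raise ValueError(f"unknown edit op: {op}")
    return namespace


def start_namenode(fsimage, edits):
    """Startup merge: the work done here grows with len(edits)."""
    new_fsimage = replay(fsimage, edits)
    return new_fsimage, []  # new image plus an empty edit log


fsimage = {"/data/a.txt": {"mtime": 100}}
edits = [
    ("create", "/data/b.txt", {"mtime": 110}),
    ("rename", "/data/a.txt", "/data/c.txt"),
    ("delete", "/data/b.txt", None),
]
new_fsimage, new_edits = start_namenode(fsimage, edits)
print(new_fsimage)  # {'/data/c.txt': {'mtime': 100}}
print(new_edits)    # []
```

The loop in `replay` is the part that slows a restart: every edit logged since the last image has to be applied before the NameNode can serve requests.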

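      The Secondary NameNode solves this issue. It periodically downloads the FsImage and EditLogs from the NameNode and merges the EditLogs into the FsImage. It stores the merged FsImage in its own persistent storage, so that copy can be used if the NameNode fails, although edits made after the last checkpoint are missing from it. Contrary to a common description, the Secondary NameNode also uploads the merged FsImage back to the NameNode; that is how the NameNode's edits file stays small. The Checkpoint Node performs the same cycle and differs mainly in how it is run: it is a NameNode process started in checkpoint mode (hdfs namenode -checkpoint).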

      The Checkpoint Node in Hadoop periodically downloads the FsImage and edits from the active NameNode and merges them locally. Finally, it uploads the new image back to the active NameNode.
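The download, merge, upload cycle can be sketched as follows. This is a minimal in-memory model under stated assumptions, not the real HDFS protocol: `ToyNameNode`, its methods, and `checkpoint` are hypothetical names, and an edit is reduced to a (path, mtime) pair.

```python
# Toy model of one checkpoint cycle. All class and method names are
# hypothetical; real HDFS transfers files over HTTP between processes.

class ToyNameNode:
    def __init__(self, fsimage):
        self.fsimage = dict(fsimage)  # last saved image
        self.edits = []               # changes made since that image

    def log(self, path, mtime):
        # At runtime, changes are only appended to the edit log.
        self.edits.append((path, mtime))

    def roll_edits(self):
        """Hand over the finished edits and start a fresh edit log."""
        finished, self.edits = self.edits, []
        return finished

    def install_image(self, fsimage):
        self.fsimage = dict(fsimage)


def checkpoint(namenode):
    image = dict(namenode.fsimage)   # 1. download the FsImage...
    edits = namenode.roll_edits()    #    ...and the edits
    for path, mtime in edits:        # 2. merge them locally
        image[path] = mtime
    namenode.install_image(image)    # 3. upload the new image
    return image                     # the checkpointer keeps its own copy


nn = ToyNameNode({"/a": 1})
nn.log("/b", 2)
nn.log("/a", 3)
local_copy = checkpoint(nn)
print(nn.fsimage)  # {'/a': 3, '/b': 2}
print(nn.edits)    # []
```

After the cycle the NameNode holds an up-to-date image and an empty edit log, so a restart at this point would have nothing to replay.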

    • #5261
      DataFlair Team
      Spectator

       

      The NameNode stores metadata in two files: fsimage and editlogs. Fsimage is an image file that contains the entire filesystem namespace and the mapping of blocks to files. All modifications made to the file system are written to the editlogs.

      Once the NameNode is started, it reads the HDFS state from the fsimage file, applies the edits, writes the new state to the fsimage file, and starts operation with an empty edits file. At runtime, changes are only written to edits and are not merged into fsimage. If the NameNode runs for a long time, edits becomes huge and the next startup takes even longer, because more changes have to be applied to reach the latest state of the metadata.

      The Checkpoint Node periodically fetches fsimage and edits from the NameNode and merges them. The resulting state is called a checkpoint. It then uploads the result to the NameNode.

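      The Secondary NameNode also periodically fetches fsimage and edits from the NameNode and merges them, and it keeps the result in its own persistent storage. It is often said that it does not upload the merged fsimage to the NameNode, but in practice it does copy the checkpoint back, just as the Checkpoint Node does. The two differ mainly in how they are started and configured, not in the merge itself.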
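How often "periodically" is depends on configuration. In HDFS a checkpoint is triggered by whichever comes first: a time limit (dfs.namenode.checkpoint.period, 3600 seconds by default) or a number of uncheckpointed transactions (dfs.namenode.checkpoint.txns, 1000000 by default). A minimal sketch of that rule, where the function name `should_checkpoint` and its parameters are hypothetical:

```python
# Sketch of the checkpoint trigger rule. The two constants mirror the
# documented defaults; the function itself is a hypothetical illustration.

CHECKPOINT_PERIOD_S = 3600    # default of dfs.namenode.checkpoint.period
CHECKPOINT_TXNS = 1_000_000   # default of dfs.namenode.checkpoint.txns


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period_s=CHECKPOINT_PERIOD_S, txns=CHECKPOINT_TXNS):
    """Return True when either the time or the transaction limit is reached."""
    return seconds_since_last >= period_s or uncheckpointed_txns >= txns


print(should_checkpoint(3600, 0))        # True: an hour has passed
print(should_checkpoint(10, 1_000_000))  # True: busy cluster, many edits
print(should_checkpoint(10, 10))         # False: neither limit reached
```

The transaction limit is what keeps the edits file bounded on a busy cluster, where an hour of activity could otherwise produce a very long log.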

       
