How does HDFS help the NameNode in scaling?

    • #4657
      DataFlair Team
      Spectator

      What is HDFS Federation?
      How does it help the NameNode in scaling?

    • #4658
      DataFlair Team
      Spectator

      Before starting with HDFS Federation, let us first discuss
      Scalability
      The primary benefit of Hadoop is its scalability. One can easily scale the cluster by adding more nodes.
      There are two types of scalability in Hadoop: vertical and horizontal.
      Vertical scalability
      It is also referred to as “scale up”. In vertical scaling, you increase the hardware capacity of an individual machine. In other words, you add more RAM or CPU to your existing system to make it more robust and powerful.

      Horizontal scalability
      It is also referred to as “scale out”. It is basically the addition of more machines, i.e. setting up a cluster. In horizontal scaling, instead of increasing the hardware capacity of an individual machine, you add more nodes to the existing cluster; most importantly, you can add machines without stopping the system, so there is no downtime while scaling out. In the end, to meet your requirements, you have more machines working in parallel.

      To learn more about Scalability follow: HDFS Scalability

      HDFS has two main layers:

      1. Namespace – manages directories, files and blocks. It supports file system operations such as creation, modification, deletion and listing of files and directories.

      2. Block Storage – Block storage provides operations like creation, deletion, modification and getting the location of the blocks. It also takes care of replica placement and replication.
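
      As a rough illustration of the namespace layer from a client's point of view, the small Java sketch below uses the standard org.apache.hadoop.fs.FileSystem API to create, list, and delete files and directories. The NameNode address and paths are made up for illustration only.

      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class NamespaceOpsSketch {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Hypothetical NameNode address; replace with your cluster's fs.defaultFS.
              FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.example.com:8020"), conf);

              Path dir = new Path("/user/dataflair/demo");
              fs.mkdirs(dir);                                  // namespace operation: create a directory
              fs.create(new Path(dir, "sample.txt")).close();  // namespace operation: create a file

              for (FileStatus status : fs.listStatus(dir)) {   // namespace operation: list a directory
                  System.out.println(status.getPath());
              }

              fs.delete(dir, true);                            // namespace operation: recursive delete
              fs.close();
          }
      }

      All of these calls are metadata (namespace) operations handled by the NameNode; only reading and writing the actual file contents touches the block storage layer on the DataNodes.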

      Architecture without HDFS Federation

      DataNodes can be scaled both vertically and horizontally, but the NameNode could be scaled only vertically, not horizontally. The architecture without HDFS Federation has multiple DataNodes but only one NameNode (one namespace) for all the DataNodes. This limits the number of blocks, files, and directories supported on the file system.

      To overcome this limitation, HDFS Federation was introduced.

      Architecture with HDFS Federation

      In order to scale the NameNode horizontally, federation uses multiple independent NameNodes/namespaces. In HDFS Federation, the NameNodes do not require coordination with each other, as each NameNode is independent. All the DataNodes are used as common storage for blocks by all the NameNodes, and each DataNode registers with every NameNode in the cluster. The set of blocks that belongs to a single namespace is called a block pool, and DataNodes store blocks for all the block pools in the cluster.
      Therefore, with HDFS Federation the previous limitation is overcome: the NameNode can now be scaled horizontally. Federation also provides the scope for isolation between namespaces.
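
      To make this concrete, here is a minimal configuration sketch, assuming two nameservices named ns1 and ns2 running on hypothetical hosts. The keys dfs.nameservices and dfs.namenode.rpc-address.<nameservice-id> are the usual federation settings normally placed in hdfs-site.xml; they are set programmatically here only for brevity.

      import org.apache.hadoop.conf.Configuration;

      public class FederationConfigSketch {
          public static Configuration federatedConf() {
              Configuration conf = new Configuration();

              // Two independent namespaces; the nameservice IDs ns1 and ns2 are arbitrary labels.
              conf.set("dfs.nameservices", "ns1,ns2");

              // Each NameNode serves exactly one namespace (hostnames are hypothetical).
              conf.set("dfs.namenode.rpc-address.ns1", "nn1.example.com:8020");
              conf.set("dfs.namenode.rpc-address.ns2", "nn2.example.com:8020");

              // Every DataNode in the cluster registers with both NameNodes and stores
              // blocks for both block pools; no per-DataNode namespace mapping is needed.
              return conf;
          }
      }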

    • #4659
      DataFlair Team
      Spectator

      HDFS Federation addresses the limitation of the prior HDFS architecture, which allowed only a single namespace for the entire cluster, by adding multiple NameNodes/namespaces to the HDFS file system. “Federated” means that each Nameservice is independent and does not require coordination with the other Nameservices.

      Some details about Federated HDFS –

      1. Small changes to the NameNode; most of the changes are in the DataNodes, configuration, and tools.
      2. Namespace and block management remain in the NameNode.
      3. DataNodes provide storage services for all the NameNodes: periodic heartbeats and block reports go to every NameNode, and block received/deleted reports for a block pool go to the corresponding NameNode.
      4. The Balancer works with multiple namespaces.
      5. NameNodes can be added or deleted in a federated cluster.
      6. A single configuration serves all the nodes in the cluster.
      To learn more about HDFS Federation follow: HDFS Federation Tutorial
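
      As an illustrative follow-up (the hostnames are assumptions, and the federation settings from hdfs-site.xml are assumed to be on the classpath), a client simply opens each federated namespace through its own hdfs:// URI; the namespaces stay isolated even though the same DataNodes back both block pools.

      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class FederatedClientSketch {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration(); // picks up hdfs-site.xml from the classpath

              // Each federated namespace is addressed through its own NameNode URI.
              try (FileSystem ns1 = FileSystem.get(URI.create("hdfs://nn1.example.com:8020"), conf);
                   FileSystem ns2 = FileSystem.get(URI.create("hdfs://nn2.example.com:8020"), conf)) {

                  ns1.mkdirs(new Path("/projects/sales"));     // stored in ns1's namespace and block pool
                  ns2.mkdirs(new Path("/projects/research"));  // stored in ns2's namespace and block pool
              }
          }
      }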
