Describe HDFS federation?

This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 3:24 pm #5496
  
  DataFlair Team
  Spectator
  
  What is Hadoop Federation?
  Why did HDFS Federation come into existence?
  What is Hadoop HDFS Federation?
- September 20, 2018 at 3:24 pm #5499
  
  DataFlair Team
  Spectator
  
  In simple words, HDFS Federation is a way to enhance the current HDFS architecture. It provides a clear separation between the namespace layer and the storage layer of the existing HDFS architecture. The two layers and their primary operations are as follows:
  
  1. Namespace – This layer manages files, directories and blocks. It stores the metadata and supports the basic file system operations, e.g. listing, creating, modifying and deleting files and directories.
  2. Block Storage – This layer is further divided into two parts:
     - Block Management – This manages the datanodes in the cluster and provides block-related operations (creation, deletion, modification and lookup) as well as replication management.
     - Physical Storage – This stores the blocks and provides access for read and write operations.

  To understand this clearly, let's first analyse the existing HDFS architecture and its challenges:
  
  In the current architecture there is only one namespace, held by a single namenode, which manages a cluster of datanodes. This architecture works well for small clusters, but as the cluster grows the model runs into several challenges. The challenges/limitations are as follows:
  
  1. Tightly coupled Block Storage and Namespace – This tight coupling makes it difficult for other services to interact with the block storage and use it efficiently.

  2. Namespace Scalability – The cluster scales horizontally by adding more datanodes, but the namenode cannot be scaled horizontally. It can be scaled vertically, but the metadata of a large cluster eventually outgrows what a single namenode machine can hold.

  3. Performance – File system operations are limited to the throughput of a single namenode, which at present supports 60,000 concurrent tasks.

  4. Isolation – HDFS deployments often run in a multi-tenant environment where a single cluster is shared by multiple organizations. With a single namenode, it is not possible to give one application or one organization its own namespace.
  
  HDFS Federation:

  HDFS Federation was introduced to solve these challenges. It lets the namenode layer scale horizontally by using several namenodes, each managing its own namespace, that are independent of each other. These namenodes are federated, i.e. they do not need to coordinate with one another.
  
  Each datanode is registered with all the namenodes in the cluster.
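
To make the relationship concrete, here is a minimal toy model in Python. It is only a sketch of the idea described above, not Hadoop code: the class names, the `ns1`/`ns2` namespaces, the paths and the block-id format are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Toy namenode: one independent namespace plus the blocks it owns."""
    name: str
    namespace: dict = field(default_factory=dict)  # path -> list of block ids
    datanodes: set = field(default_factory=set)    # ids of registered datanodes
    next_block: int = 0

    def register(self, datanode_id: str) -> None:
        self.datanodes.add(datanode_id)

    def create_file(self, path: str, num_blocks: int) -> list:
        # Block ids carry the namenode name so each namenode's blocks stay distinct.
        blocks = [f"{self.name}-blk-{self.next_block + i}" for i in range(num_blocks)]
        self.next_block += num_blocks
        self.namespace[path] = blocks
        return blocks


@dataclass
class DataNode:
    """Toy datanode: shared storage that can hold blocks from any namenode."""
    node_id: str
    blocks: set = field(default_factory=set)

    def register_with_all(self, namenodes) -> None:
        # Each datanode registers with every namenode in the cluster.
        for nn in namenodes:
            nn.register(self.node_id)

    def store(self, block_id: str) -> None:
        self.blocks.add(block_id)


namenodes = [NameNode("ns1"), NameNode("ns2")]
datanodes = [DataNode("dn1"), DataNode("dn2")]
for dn in datanodes:
    dn.register_with_all(namenodes)

# Two namenodes create files independently; the same datanode stores both.
for blk in namenodes[0].create_file("/sales/2018.csv", 2):
    datanodes[0].store(blk)
for blk in namenodes[1].create_file("/logs/app.log", 1):
    datanodes[0].store(blk)
```

Note that `ns1` never learns about `/logs/app.log`: the namenodes share datanodes but not metadata, which is the "no coordination" property.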
  
- September 20, 2018 at 3:24 pm #5500
  
  DataFlair Team
  Spectator
  
  Hadoop 1.0 HDFS architecture:
  Two layers –
  1) Namespace – It manages files, directories and blocks.
  2) Block Storage – This layer has two parts:
     - Block Management – This manages the datanodes in the cluster and provides operations like creation, deletion, modification and search. It also takes care of replication management.
     - Physical Storage – This stores the blocks and provides access for read and write operations.

  Need for HDFS Federation in Hadoop
  1) Limited namespace availability: The single NameNode keeps all metadata in RAM, which puts overhead on its memory.
  2) Decreased metadata operation performance: A single NameNode performs all metadata operations.
  3) Lack of isolation: All metadata is available at one single point.
  4) NameNode scalability: For every block the NameNode (NN) stores some amount of data, so more blocks mean more overhead on NN memory.
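
The memory pressure in points 1 and 4 can be estimated with some rough arithmetic. The sketch below assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per file, directory or block; that figure, the function names and the workload numbers are assumptions for illustration, not measurements.

```python
# Assumption: rule-of-thumb heap cost per namespace object (file, directory or block).
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(num_files, blocks_per_file, num_dirs=0,
                        bytes_per_object=BYTES_PER_OBJECT):
    """Estimate the heap needed to hold all metadata on one NameNode."""
    objects = num_files + num_files * blocks_per_file + num_dirs
    return objects * bytes_per_object


def per_namenode_heap_bytes(total_bytes, num_namenodes):
    """Estimate per-NameNode heap if the namespace is split evenly (an idealisation)."""
    return total_bytes / num_namenodes


# Hypothetical workload: 100 million files with one block each.
total = namenode_heap_bytes(num_files=100_000_000, blocks_per_file=1)
print(total / 1e9)                              # 30.0 (GB on a single NameNode)
print(per_namenode_heap_bytes(total, 4) / 1e9)  # 7.5 (GB each across four NameNodes)
```

Real namespaces rarely split evenly, so treat the per-NameNode number as a lower bound on the largest one.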
  
  What is Hadoop Federation
  
  1) HDFS Federation enhances the Hadoop 1.0 HDFS architecture. It provides a clear separation between namespace and storage, which enables scalability and isolation at the cluster level.

  2) Hadoop Federation separates the namespace layer from the storage layer.
  3) It has multiple independent NameNodes, each with its own namespace and its own set of blocks (a block pool).
  4) The NameNodes in Hadoop Federation do not talk to each other.
  5) Each namespace manages only a particular slice of the data.
  6) DataNodes, on the other hand, can store blocks managed by any NameNode.
  7) Since there are multiple namespaces and NameNodes, end users can combine any of them to create their own view of HDFS.
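
Point 7 can be pictured as a client-side mount table that maps path prefixes to NameNodes. Hadoop provides this through ViewFs; the snippet below is not the ViewFs implementation, only a sketch of longest-prefix resolution, and the host names and mount points are hypothetical.

```python
def resolve(mount_table: dict, path: str) -> str:
    """Return the NameNode URI that owns `path` (longest matching mount point wins)."""
    best = None
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/")
        # Match whole path components only, so "/data" does not match "/database".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best.rstrip("/")):
                best = mount_point
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    return mount_table[best]


# Hypothetical view: three mount points served by three independent NameNodes.
mount_table = {
    "/user": "hdfs://nn1.example.com:8020",
    "/data": "hdfs://nn2.example.com:8020",
    "/data/logs": "hdfs://nn3.example.com:8020",
}

print(resolve(mount_table, "/user/alice/report.txt"))  # hdfs://nn1.example.com:8020
print(resolve(mount_table, "/data/logs/app.log"))      # hdfs://nn3.example.com:8020
```

Because the table lives on the client, two users can work with different tables and therefore see different views of the same federated cluster.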
  
  Even with Federation, the failure of a NameNode remains a single point of failure (SPOF) for its namespace, which motivated the Hadoop 2.0 High Availability feature:
  1) Hadoop 2.0 overcomes the SPOF problem by adding an extra NameNode (a passive Standby NameNode) to the Hadoop architecture, which can be configured for automatic failover.
  2) The Hadoop 2.0 High Availability project is designed to keep big data applications available 24/7 by deploying two NameNodes: one in active configuration and the other as a standby node in passive configuration.
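
The active/standby idea can be sketched as a tiny state machine. This is a deliberately simplified illustration: the class and node names are hypothetical, and real deployments rely on ZooKeeper-based failover controllers and a shared edit log, none of which is modelled here.

```python
class HAPair:
    """Toy active/standby NameNode pair with a manual health flag per node."""

    def __init__(self, active: str, standby: str):
        self.active = active
        self.standby = standby
        self.healthy = {active: True, standby: True}

    def mark_failed(self, node: str) -> None:
        self.healthy[node] = False

    def failover_if_needed(self) -> bool:
        """Promote the standby if the active node is down; return True if roles swapped."""
        if not self.healthy[self.active] and self.healthy[self.standby]:
            self.active, self.standby = self.standby, self.active
            return True
        return False


pair = HAPair(active="nn-a", standby="nn-b")
pair.mark_failed("nn-a")
swapped = pair.failover_if_needed()
print(swapped, pair.active)  # True nn-b
```

In a federated cluster, each namespace would need its own pair like this, since Federation alone does not protect any single NameNode from failure.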
  