Explain NameNode and DataNode in Hadoop?

Viewing 3 reply threads
  • Author
    Posts
    • #5235
      DataFlair Team
      Spectator

      What is the function of NameNode in HDFS?
      What is the role of DataNode in HDFS?

    • #5237
      DataFlair Team
      Spectator

      Role of NameNode:
      NameNode is a daemon (background process) that runs on the ‘Master Node’ of a Hadoop cluster.
      NameNode is part of HDFS (Hadoop Distributed File System), the storage layer of Hadoop.

      Functions of NameNode are:

      1. To store all the metadata (data about data) for the files and blocks held on the slave nodes of a Hadoop cluster.
      E.g., file name, file path, number of blocks, block IDs, block locations, slave-related configuration.
      That is, it knows which data is stored where.
      This metadata is kept in memory for faster retrieval, avoiding the latency that disk seeks would cause.
      Hence, it’s recommended that the master node on which the NameNode daemon runs be highly reliable hardware with a high-end configuration and plenty of RAM.
      A persistent copy of this metadata is also stored on disk so that it survives a machine reboot.

      2. Keep track of all the slave nodes (whether they are alive or dead). This is done using the heartbeat methodology.

      3. Replication (provides high availability, reliability and fault tolerance): NameNode directs the slave nodes to replicate the blocks they hold to other slave nodes, based on the configured replication factor.

      4. Balancing: NameNode balances data replication, i.e., blocks of data should be neither under- nor over-replicated. The target replication factor is set in the configuration.
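
      The replication check in points 3 and 4 can be illustrated with a small sketch. This is a toy model, not Hadoop code: the function name `classify_blocks`, the dictionary layout and the node names are assumptions made for illustration, and a replication factor of 3 is assumed as the default.

```python
def classify_blocks(block_locations, live_nodes, replication_factor=3):
    """Return (under, over): blocks with too few or too many live replicas.

    block_locations: dict of block id -> set of DataNode names holding a replica
    live_nodes: set of DataNode names currently considered alive
    Both structures are hypothetical stand-ins for the NameNode's metadata.
    """
    under, over = {}, {}
    for block_id, holders in block_locations.items():
        live_replicas = len(holders & live_nodes)  # ignore replicas on dead nodes
        if live_replicas < replication_factor:
            under[block_id] = replication_factor - live_replicas  # copies to add
        elif live_replicas > replication_factor:
            over[block_id] = live_replicas - replication_factor   # copies to drop
    return under, over


block_locations = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn1", "dn4"},
    "blk_3": {"dn1", "dn2", "dn3", "dn4"},
}

# All four nodes alive: blk_2 is one copy short, blk_3 has one copy too many.
print(classify_blocks(block_locations, {"dn1", "dn2", "dn3", "dn4"}))
# dn4 has stopped sending heartbeats: blk_2 is now two copies short.
print(classify_blocks(block_locations, {"dn1", "dn2", "dn3"}))
```

      In this model a block is judged only by its replicas on live nodes, which is why losing a node can turn a healthy block into an under-replicated one.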

      Role of DataNode:

      1. DataNode is a daemon (process that runs in background) that runs on the ‘SlaveNode’ in Hadoop Cluster.

      2. In HDFS, a file is broken into chunks called blocks (the default block size is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x and later).

      3. These blocks of data are stored on the slave node.

      4. It stores the actual data, so a large number of disks is required (8 disks recommended).

      5. Read and write operations on these disks are performed by the DataNode. Commodity hardware can be used to host DataNodes.
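
      To make the block arithmetic concrete, here is a minimal sketch of how a file size maps to blocks. The helper `split_into_blocks` is a hypothetical name, not an HDFS API. The block size is a parameter, so the example shows both 64 MB and 128 MB blocks; the last block only takes up as much space as the data left over.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return the size in bytes of each block a file of file_size bytes needs."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the final block holds only the leftover bytes
    return sizes


blocks_128 = split_into_blocks(300 * MB)
blocks_64 = split_into_blocks(300 * MB, block_size=64 * MB)
print(len(blocks_128), [s // MB for s in blocks_128])  # 3 [128, 128, 44]
print(len(blocks_64), [s // MB for s in blocks_64])    # 5 [64, 64, 64, 64, 44]
```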

    • #5240
      DataFlair Team
      Spectator

      NameNode is the background process that runs on the master node of a Hadoop cluster. In a basic (non-HA) setup there is only one NameNode in a cluster. It stores the metadata (data about data) for the data stored on the slave nodes, such as the addresses of the blocks, the number of blocks stored, the directory structure, etc.

      Functions of NameNode:

      1) Whenever a client has to perform an operation on a DataNode, the request first goes to the NameNode, which provides information about the relevant DataNodes; the operation is then performed on those DataNodes.

      2) NameNode makes it possible to reconstruct the original file from the blocks spread across different DataNodes, because it holds the metadata listing each file's blocks and where they are stored.

      3) Each DataNode periodically sends a heartbeat signal to the NameNode. If a DataNode on which a client is performing an operation fails, the operation is redirected to other nodes that are up and running.

      4) If a DataNode fails, it instructs DataNodes holding copies of the affected blocks to copy them to other DataNodes. In this way, it maintains the configured replication factor.
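
      The read path described in points 1 to 3 can be sketched roughly as follows. This is a toy in-memory model, not the HDFS client API: `ToyNameNode`, `ToyDataNode`, `read_file` and the block and node names are all assumptions made for illustration.

```python
class ToyDataNode:
    """Hypothetical stand-in for a DataNode: stores block bytes by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.alive = True

    def read_block(self, block_id):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.blocks[block_id]


class ToyNameNode:
    """Hypothetical stand-in for a NameNode: holds metadata only, no file data."""

    def __init__(self):
        # path -> ordered list of (block id, [names of DataNodes with a replica])
        self.files = {}

    def get_block_locations(self, path):
        return self.files[path]


def read_file(namenode, datanodes, path):
    """Ask the NameNode where the blocks are, then read each from a live replica."""
    data = b""
    for block_id, holders in namenode.get_block_locations(path):
        for name in holders:
            try:
                data += datanodes[name].read_block(block_id)
                break  # this block is done, move on to the next one
            except ConnectionError:
                continue  # replica unreachable, try the next holder
        else:
            raise IOError(f"no live replica for {block_id}")
    return data


dn1, dn2 = ToyDataNode("dn1"), ToyDataNode("dn2")
for dn in (dn1, dn2):
    dn.blocks = {"blk_1": b"hello ", "blk_2": b"world"}
nn = ToyNameNode()
nn.files["/demo.txt"] = [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn1", "dn2"])]
nodes = {"dn1": dn1, "dn2": dn2}

dn1.alive = False  # simulate a failed DataNode
print(read_file(nn, nodes, "/demo.txt"))  # b'hello world'
```

      Note that in this sketch the file bytes never pass through the NameNode object; it only answers the location lookup.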

      DataNode:

      It is the background process that runs on a slave node. It is responsible for storing and managing the actual data on that slave node.

      The client writes data to one slave node, and it is then the responsibility of the DataNodes to replicate the data to other slave nodes according to the replication factor.
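
      The write behaviour above, where the client hands data to one node and the DataNodes pass it along, can be sketched as a simple forwarding chain. `PipelineDataNode` and its `write_block` method are hypothetical names for illustration only; real HDFS streams data in packets and handles failures, which this sketch leaves out.

```python
class PipelineDataNode:
    """Hypothetical DataNode that stores a block and forwards it downstream."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write_block(self, block_id, data, downstream):
        """Store the block locally, then pass it to the next node in the chain.

        Returns the names of all nodes that stored the block, in pipeline order.
        """
        self.blocks[block_id] = data
        if downstream:
            next_node, rest = downstream[0], downstream[1:]
            return [self.name] + next_node.write_block(block_id, data, rest)
        return [self.name]


# Replication factor 3: the client talks only to the first node in the list.
pipeline = [PipelineDataNode("dn1"), PipelineDataNode("dn2"), PipelineDataNode("dn3")]
stored_on = pipeline[0].write_block("blk_1", b"payload", pipeline[1:])
print(stored_on)  # ['dn1', 'dn2', 'dn3']
```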

    • #5242
      DataFlair Team
      Spectator

      An HDFS cluster has two types of nodes operating in a master−slave pattern:

      1. NameNode (the master) and
      2. A number of DataNodes (slaves/workers).

      HDFS NameNode
      1. NameNode is the central component of the HDFS architecture.
      2. NameNode is also known as Master node.
      3. HDFS NameNode stores metadata, i.e. number of data blocks, file name, path, block IDs, block locations, number of replicas, and also slave-related configuration. This metadata is kept in memory on the master for faster retrieval.
      4. NameNode keeps the metadata related to the file system namespace in memory for quicker response times. Hence, more memory is needed, and the NameNode should be deployed on reliable hardware.
      5. NameNode maintains and manages the slave nodes, and assigns tasks to them.
      6. NameNode has knowledge of all the DataNodes containing data blocks for a given file.
      7. NameNode coordinates with hundreds or thousands of data nodes and serves the requests coming from client applications.
      Two files, ‘FsImage’ and ‘EditLog’, are used to store the metadata.

      FsImage: It is a snapshot of the file system namespace, which the NameNode loads when it starts. It is an “image file”. FsImage contains the entire file system namespace and is stored as a file in the NameNode’s local file system. It also contains a serialized form of all the directories and file inodes in the file system. Each inode is an internal representation of a file’s or directory’s metadata.

      EditLog: It contains all the recent modifications made to the file system since the most recent FsImage. When the NameNode receives a create/update/delete request from a client, the request is first recorded in the edits file.
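
      How a snapshot plus a log of changes yields the current namespace can be shown with a short sketch. It is a toy model under stated assumptions: the real FsImage and edit log are binary files with many more operation types, and the `replay` function, the tuple format and the operation names below are invented for illustration.

```python
import copy


def replay(fsimage, edit_log):
    """Rebuild the current namespace from a snapshot plus logged changes.

    fsimage: dict of path -> metadata dict (hypothetical snapshot format)
    edit_log: list of (operation, path, *arguments) tuples, oldest first
    """
    namespace = copy.deepcopy(fsimage)  # leave the snapshot itself untouched
    for op, path, *args in edit_log:
        if op == "create":
            namespace[path] = {"replication": args[0]}
        elif op == "set_replication":
            namespace[path]["replication"] = args[0]
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


fsimage = {"/logs/a.txt": {"replication": 3}}
edit_log = [
    ("create", "/logs/b.txt", 3),
    ("set_replication", "/logs/a.txt", 2),
    ("delete", "/logs/b.txt"),
]
print(replay(fsimage, edit_log))  # {'/logs/a.txt': {'replication': 2}}
```

      Appending to a log is cheap compared with rewriting the whole snapshot on every change, which is the usual motivation for this kind of split.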

      Functions of NameNode in HDFS

      1. It is the master daemon that maintains and manages the DataNodes (slave nodes).
      2. It records the metadata of all the files stored in the cluster, e.g. the location of the stored blocks, the size of the files, permissions, hierarchy, etc.
      3. It records each change that takes place to the file system metadata. For example, if a file is deleted in HDFS, the NameNode will immediately record this in the EditLog.
      4. It regularly receives a Heartbeat and a block report from all the DataNodes in the cluster to ensure that the DataNodes are live.
      5. It keeps a record of all the blocks in HDFS and in which nodes these blocks are located.
      6. The NameNode is also responsible for maintaining the replication factor of all the blocks.
      7. In case of a DataNode failure, the NameNode chooses new DataNodes for new replicas, balances disk usage and manages the communication traffic to the DataNodes.

      HDFS DataNode
      1. DataNode is also known as Slave node.
      2. In Hadoop HDFS Architecture, DataNode stores actual data in HDFS.
      3. DataNodes are responsible for serving read and write requests from clients.
      4. DataNodes can be deployed on commodity hardware.
      5. DataNodes send information to the NameNode about the files and blocks stored on that node and respond to the NameNode for all file system operations.
      6. When a DataNode starts up, it announces itself to the NameNode along with the list of blocks it is responsible for.
      7. A DataNode is usually configured with a lot of hard disk space, because the actual data is stored on the DataNode.

      Functions of DataNode in HDFS
      1. These are the slave daemons, or processes, which run on each slave machine.
      2. The actual data is stored on DataNodes.
      3. The DataNodes perform the low-level read and write requests from the file system’s clients.
      4. Every DataNode sends a heartbeat message to the NameNode every 3 seconds to convey that it is alive. When the NameNode does not receive a heartbeat from a DataNode for about 10 minutes, it considers that DataNode dead and starts replicating its blocks to other DataNodes.
      5. All DataNodes in the Hadoop cluster are synchronized in a way that they can communicate with one another to:
      i. Balance the data in the system
      ii. Move data to keep replication high
      iii. Copy data when required
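
      The heartbeat rule in point 4 can be sketched as a small monitor. The 3-second interval and 10-minute timeout are the figures quoted in the post; the `HeartbeatMonitor` class and the node names are hypothetical, and timestamps are passed in explicitly so the example stays deterministic.

```python
HEARTBEAT_INTERVAL = 3   # seconds between heartbeats, as quoted in the post
DEAD_TIMEOUT = 10 * 60   # seconds of silence before a node counts as dead


class HeartbeatMonitor:
    """Hypothetical liveness tracker: remembers when each node last reported."""

    def __init__(self, timeout=DEAD_TIMEOUT):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def dead_nodes(self, now):
        """Names of nodes silent for longer than the timeout, sorted."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor()
first_dead_at = None
for now in range(0, 901, HEARTBEAT_INTERVAL):
    monitor.heartbeat("dn1", now)
    if now <= 60:
        monitor.heartbeat("dn2", now)  # dn2 crashes after its heartbeat at t=60
    if first_dead_at is None and monitor.dead_nodes(now):
        first_dead_at = now

print(first_dead_at)            # 663: first check more than 600 s after t=60
print(monitor.dead_nodes(900))  # ['dn2']
```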
