Explain NameNode and DataNode in HDFS?

  • Author
    Posts
    • #5224
      DataFlair Team
      Spectator

      What is the difference between NameNode and DataNode in HDFS?
      What is the function of NameNode in HDFS?
      What is the role of DataNode in HDFS?

    • #5227
      DataFlair Team
      Spectator

      The NameNode works as the master in a Hadoop cluster. Its main functions are listed below:

      1. Stores the metadata of the actual data, e.g. filename, path, number of data blocks, block IDs, block locations, number of replicas, and slave-related configuration.
      2. Manages the file system namespace.
      3. Regulates client access to the actual file data.
      4. Assigns work to the slaves (DataNodes).
      5. Executes file system namespace operations such as opening and closing files and renaming files and directories.
      6. Because the NameNode keeps metadata in memory for fast retrieval, it needs a large amount of memory. It should be hosted on reliable hardware.
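
The memory requirement in item 6 can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per namespace object (file, directory, or block). That figure and the function name are assumptions for illustration, not values from this thread, and real usage varies with the Hadoop version and path lengths.

```python
# Rough NameNode heap estimate.
# BYTES_PER_OBJECT is an assumed rule of thumb, not a measured or configured value.
BYTES_PER_OBJECT = 150


def estimate_namenode_heap_bytes(num_files, blocks_per_file, num_dirs=0):
    """Estimate heap used by namespace metadata (files + directories + blocks)."""
    num_blocks = num_files * blocks_per_file
    return (num_files + num_dirs + num_blocks) * BYTES_PER_OBJECT


# 10 million files with 2 blocks each: 30 million objects in total.
heap = estimate_namenode_heap_bytes(10_000_000, 2)
print(f"{heap / 1e9:.1f} GB")  # prints "4.5 GB"
```

The estimate grows with the number of objects rather than with the amount of data, which is why many small files put more pressure on the NameNode than a few large ones.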

      The DataNode works as a slave in a Hadoop cluster. Its main functions are listed below:

      1. Stores the actual business data.
      2. Acts as the worker node where reads, writes, and data processing are handled.
      3. Creates, replicates, and deletes data blocks on instruction from the master.
      4. Because all the business data is stored on DataNodes, they need a large amount of storage. Commodity hardware can be used to host DataNodes.

      To learn more about NameNode and DataNode follow: HDFS Architecture

    • #5228
      DataFlair Team
      Spectator

      NameNode

      1. NameNode is the centerpiece of HDFS.
      2. NameNode is also known as the Master.
      3. NameNode stores only the metadata of HDFS: the directory tree of all files in the file system. It tracks the files across the cluster.
      4. NameNode does not store the actual data or the dataset. The data itself is stored in the DataNodes.
      5. NameNode knows the list of blocks and their locations for any given file in HDFS. With this information, the NameNode knows how to construct the file from its blocks.
      6. NameNode is critical to HDFS: when the NameNode is down, the HDFS/Hadoop cluster is inaccessible and considered down.
      7. NameNode is a single point of failure in a Hadoop cluster unless NameNode High Availability (available since Hadoop 2) is configured.
      8. NameNode is usually configured with a lot of memory (RAM), because the block locations are held in main memory.
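
The idea that the NameNode holds only the block list and block locations, while the DataNodes hold the bytes, can be sketched as a toy model. All names below (`NameNodeMetadata`, `read_file`, the `blk_*` and `dn*` identifiers) are hypothetical and only illustrate the lookup. This is not the real HDFS client API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NameNodeMetadata:
    """Toy in-memory metadata: file path -> ordered block IDs -> DataNode names."""
    file_blocks: Dict[str, List[str]] = field(default_factory=dict)
    block_locations: Dict[str, List[str]] = field(default_factory=dict)

    def locate(self, path: str) -> List[Tuple[str, List[str]]]:
        """Return (block_id, datanodes) pairs in file order."""
        return [(b, self.block_locations.get(b, [])) for b in self.file_blocks[path]]


def read_file(meta: NameNodeMetadata, path: str,
              datanode_storage: Dict[str, Dict[str, bytes]]) -> bytes:
    """Rebuild a file by reading each block from the first replica that has it."""
    parts = []
    for block_id, nodes in meta.locate(path):
        for node in nodes:
            data = datanode_storage.get(node, {}).get(block_id)
            if data is not None:
                parts.append(data)
                break
        else:
            raise IOError(f"no available replica for {block_id}")
    return b"".join(parts)


meta = NameNodeMetadata(
    file_blocks={"/a.txt": ["blk_1", "blk_2"]},
    block_locations={"blk_1": ["dn1", "dn2"], "blk_2": ["dn2"]},
)
storage = {"dn1": {"blk_1": b"hello "}, "dn2": {"blk_1": b"hello ", "blk_2": b"world"}}
print(read_file(meta, "/a.txt", storage))  # prints b'hello world'
```

Note that the metadata object never touches file contents. It only answers "which blocks, on which nodes", which is why losing it makes the data unreachable even though the blocks still exist.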

      DataNode

      1. DataNode is responsible for storing the actual data in HDFS.
      2. DataNode is also known as the Slave.
      3. NameNode and DataNode are in constant communication.
      4. When a DataNode starts up, it announces itself to the NameNode along with the list of blocks it is responsible for.
      5. When a DataNode is down, it does not affect the availability of the data or the cluster, as long as replicas of its blocks exist on other DataNodes. The NameNode arranges re-replication of the blocks that were managed by the unavailable DataNode.
      6. DataNode is usually configured with a lot of hard disk space, because the actual data is stored in the DataNode.
      7. DataNode periodically sends heartbeats to the NameNode.
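
The startup announcement (block report) and the periodic heartbeats described above can be sketched as follows. `ToyNameNode` and its methods are hypothetical names for illustration. The real protocol is an RPC interface between the DataNode and NameNode daemons, and the timeout value here is passed in by the caller rather than read from any configuration.

```python
from collections import defaultdict


class ToyNameNode:
    """Tracks which DataNodes hold which blocks and when each last reported in."""

    def __init__(self):
        self.block_locations = defaultdict(set)  # block_id -> {datanode_id}
        self.last_heartbeat = {}                 # datanode_id -> timestamp in seconds

    def register(self, datanode_id, block_ids, now):
        """Handle a DataNode startup: record its block report and first heartbeat."""
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode_id)
        self.last_heartbeat[datanode_id] = now

    def heartbeat(self, datanode_id, now):
        """Record that the DataNode was alive at time `now`."""
        self.last_heartbeat[datanode_id] = now

    def live_replicas(self, block_id, now, timeout):
        """DataNodes holding the block whose last heartbeat is within `timeout` seconds."""
        holders = self.block_locations.get(block_id, set())
        return {dn for dn in holders if now - self.last_heartbeat[dn] <= timeout}


nn = ToyNameNode()
nn.register("dn1", ["blk_1"], now=0)
nn.register("dn2", ["blk_1", "blk_2"], now=0)
nn.heartbeat("dn2", now=600)  # dn1 stays silent after startup
print(nn.live_replicas("blk_1", now=700, timeout=630))  # prints {'dn2'}
```

Because `blk_1` still has a live replica on `dn2`, the data stays available even though `dn1` has gone quiet.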

      To learn more about NameNode and DataNode follow: HDFS Architecture

    • #5230
      DataFlair Team
      Spectator

      An HDFS cluster has two types of nodes operating in a master-slave pattern:

      1. A NameNode (the master), and
      2. A number of DataNodes (slaves/workers).

      HDFS NameNode
      1. NameNode is the central component of the HDFS architecture.
      2. NameNode is also known as the Master node.
      3. The NameNode stores metadata, i.e. the number of data blocks, file name, path, block IDs, block locations, number of replicas, and slave-related configuration. This metadata is held in memory on the master for faster retrieval.
      4. Because the NameNode keeps the file system namespace metadata in memory for quicker response times, it needs a lot of memory. The NameNode should therefore be deployed on reliable hardware.
      5. NameNode maintains and manages the slave nodes and assigns tasks to them.
      6. NameNode knows all the DataNodes that contain data blocks for a given file.
      7. NameNode coordinates with hundreds or thousands of DataNodes and serves the requests coming from client applications.

      Two files, the FsImage and the EditLog, are used to store the metadata.

      FsImage: It is a snapshot of the file system metadata that the NameNode loads when it starts. It is an “image file”. The FsImage contains the entire file system namespace and is stored as a file in the NameNode’s local file system. It also contains a serialized form of all the directory and file inodes in the file system. Each inode is an internal representation of a file’s or directory’s metadata.

      EditLogs: It contains all the modifications made to the file system since the most recent FsImage. When the NameNode receives a create/update/delete request from a client, the request is first recorded in the edits file.
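
The relationship between the two files can be sketched as "checkpoint plus replay": the current namespace is the FsImage with every EditLog entry applied in order. The dictionary and tuple formats and the operation names below are hypothetical simplifications. The real FsImage and EditLog are binary files with many more operation types.

```python
def replay(fsimage, edit_log):
    """Apply edit-log entries in order to a copy of the checkpointed namespace.

    fsimage:  dict mapping path -> list of block IDs (the last checkpoint)
    edit_log: list of tuples such as ("create", path, blocks),
              ("delete", path) or ("rename", src, dst)
    """
    namespace = {path: list(blocks) for path, blocks in fsimage.items()}
    for op, *args in edit_log:
        if op == "create":
            path, blocks = args
            namespace[path] = list(blocks)
        elif op == "delete":
            (path,) = args
            namespace.pop(path, None)
        elif op == "rename":
            src, dst = args
            namespace[dst] = namespace.pop(src)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


fsimage = {"/a": ["blk_1"]}
edits = [
    ("create", "/b", ["blk_2"]),
    ("rename", "/a", "/c"),
    ("delete", "/b"),
]
print(replay(fsimage, edits))  # prints {'/c': ['blk_1']}
```

Recording each change in the log first means a restart can rebuild the latest state without rewriting the whole image on every operation.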

      Functions of NameNode in HDFS

      1. It is the master daemon that maintains and manages the DataNodes (slave nodes).
      2. It records the metadata of all the files stored in the cluster, e.g. the location of the blocks, the size of the files, permissions, hierarchy, etc.
      3. It records each change made to the file system metadata. For example, if a file is deleted in HDFS, the NameNode immediately records this in the EditLog.
      4. It regularly receives a heartbeat and a block report from every DataNode in the cluster to confirm that the DataNodes are alive.
      5. It keeps a record of all the blocks in HDFS and the nodes on which they are located.
      6. It is responsible for maintaining the replication factor of all the blocks.
      7. When a DataNode fails, the NameNode chooses new DataNodes for new replicas, balances disk usage, and manages the communication traffic to the DataNodes.
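
Items 6 and 7 can be illustrated with a small planning function that finds under-replicated blocks after a failure and picks new target nodes. This is a simplified sketch: `plan_rereplication` is a hypothetical name, the default of 3 mirrors the usual HDFS replication default, and targets are chosen alphabetically here, whereas real HDFS placement also considers racks and disk usage.

```python
def plan_rereplication(block_locations, live_nodes, replication_factor=3):
    """Return {block_id: [target nodes]} for blocks with too few live replicas."""
    live_nodes = set(live_nodes)
    plan = {}
    for block_id, holders in sorted(block_locations.items()):
        live_holders = set(holders) & live_nodes
        missing = replication_factor - len(live_holders)
        if missing <= 0 or not live_holders:
            # Fully replicated, or no live replica left to copy from.
            continue
        candidates = sorted(live_nodes - live_holders)
        plan[block_id] = candidates[:missing]
    return plan


locations = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn2", "dn3", "dn4"},
}
live = {"dn2", "dn3", "dn4", "dn5"}  # dn1 has failed
print(plan_rereplication(locations, live))  # prints {'blk_1': ['dn4']}
```

Only `blk_1` lost a replica when `dn1` failed, so it is the only block scheduled for a new copy.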

      HDFS DataNode
      1. DataNode is also known as the Slave node.
      2. In the HDFS architecture, DataNodes store the actual data.
      3. DataNodes are responsible for serving read and write requests from clients.
      4. DataNodes can be deployed on commodity hardware.
      5. DataNodes send information to the NameNode about the files and blocks stored on them and respond to the NameNode for all file system operations.
      6. When a DataNode starts up, it announces itself to the NameNode along with the list of blocks it is responsible for.
      7. DataNode is usually configured with a lot of hard disk space, because the actual data is stored in the DataNode.

      Functions of DataNode in HDFS
      1. DataNodes are slave daemons (processes) that run on each slave machine.
      2. The actual data is stored on DataNodes.
      3. The DataNodes perform the low-level read and write requests from the file system’s clients.
      4. Every DataNode sends a heartbeat message to the NameNode every 3 seconds (by default) to convey that it is alive. If the NameNode does not receive a heartbeat from a DataNode for about 10 minutes, it considers that DataNode dead and starts replicating its blocks to other DataNodes.
      5. All DataNodes in the Hadoop cluster are synchronized so that they can communicate with one another to:
      i. Balance the data in the system
      ii. Move data to keep replication high
      iii. Copy data when required
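
The heartbeat timing in item 4 can be worked through numerically. The sketch below assumes the stock defaults for `dfs.heartbeat.interval` (3 seconds) and `dfs.namenode.heartbeat.recheck-interval` (5 minutes) and the usual expiry rule of twice the recheck interval plus ten heartbeat intervals. Treat these as assumptions to verify against the `hdfs-default.xml` of your Hadoop version. The function names are hypothetical.

```python
# Assumed defaults; check hdfs-default.xml for the version in use.
HEARTBEAT_INTERVAL_S = 3   # dfs.heartbeat.interval
RECHECK_INTERVAL_S = 300   # dfs.namenode.heartbeat.recheck-interval (5 minutes)


def dead_node_timeout(heartbeat_s=HEARTBEAT_INTERVAL_S, recheck_s=RECHECK_INTERVAL_S):
    """Seconds of silence after which a DataNode is treated as dead."""
    return 2 * recheck_s + 10 * heartbeat_s


def is_dead(last_heartbeat, now, timeout=None):
    """True if the DataNode has been silent for longer than the timeout."""
    if timeout is None:
        timeout = dead_node_timeout()
    return now - last_heartbeat > timeout


print(dead_node_timeout())               # prints 630
print(is_dead(last_heartbeat=0, now=600))  # prints False
print(is_dead(last_heartbeat=0, now=631))  # prints True
```

Under these assumptions the timeout is 630 seconds, which is where the "about 10 minutes" figure comes from.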
