Why is replication done in HDFS (Hadoop)?

Viewing 7 reply threads
  • Author
    Posts
    • #6010
DataFlair Team
      Spectator

What is the need for Replication in HDFS – Hadoop Distributed File System?

    • #6011
DataFlair Team
      Spectator

Replication in HDFS increases the availability of data at any point in time. If a node holding a block of data that is needed for processing crashes, the same block can be read from another node; this is possible because of replication.
Replication is one of the major factors in making HDFS a fault-tolerant system.

    • #6014
DataFlair Team
      Spectator

The most important feature of HDFS is the availability of data, and to achieve this objective the concept of replicating data blocks comes into the picture.
Replication ensures that the same data block is present on more than one DataNode, and the NameNode stores the addresses of the replicas, so that if one DataNode goes down the block can still be retrieved from another DataNode.
So replication keeps data available at all times, even in the case of node failure, thus making the system fault tolerant.
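The idea above can be sketched as a minimal in-memory simulation (illustrative only, not real HDFS code; the class and method names are invented for this example). The "NameNode" keeps a map from each block to the DataNodes holding its replicas, so a read survives the loss of any single node:

```python
class MiniHDFS:
    """Toy model of HDFS block replication (for illustration only)."""

    def __init__(self, datanodes, replication=3):  # 3 is HDFS's default factor
        self.live = set(datanodes)                  # DataNodes currently up
        self.replication = replication
        self.block_map = {}                         # NameNode metadata: block -> replica nodes
        self.storage = {n: {} for n in datanodes}   # per-DataNode block storage

    def put(self, block_id, data):
        # Place `replication` copies on distinct live DataNodes.
        targets = sorted(self.live)[: self.replication]
        for n in targets:
            self.storage[n][block_id] = data
        self.block_map[block_id] = set(targets)

    def fail_node(self, node):
        self.live.discard(node)

    def get(self, block_id):
        # Ask the NameNode for replica locations and skip dead nodes.
        for n in self.block_map[block_id]:
            if n in self.live:
                return self.storage[n][block_id]
        raise IOError("all replicas lost")

hdfs = MiniHDFS(["dn1", "dn2", "dn3", "dn4"])
hdfs.put("blk_1", b"hello")
hdfs.fail_node("dn1")     # one replica holder goes down
print(hdfs.get("blk_1"))  # the block is still readable from another replica
```

With a replication factor of 3, any single (or even double) node failure still leaves at least one live replica for reads.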

    • #6017
DataFlair Team
      Spectator

Data availability is the most important feature of HDFS, and it is possible because of data replication.
Suppose a data block is stored on only one DataNode; if that node goes down, we might lose the data. Replication addresses this problem.
With replication, we store each data block on more than one node, so that if one node goes down the data is still available on another node.

    • #6018
DataFlair Team
      Spectator

How does the NameNode handle the failure of DataNodes in Hadoop?

    • #6019
DataFlair Team
      Spectator

How does the NameNode handle DataNode failures in Hadoop HDFS?

    • #6020
DataFlair Team
      Spectator

The NameNode holds metadata: data about the DataNodes (i.e., the location of each DataNode, and the replication factor of each block; only the features relevant to this question are mentioned here).

Each DataNode constantly sends a heartbeat to the NameNode; this is how the NameNode knows that the DataNode is working. If, for any reason, a DataNode stops sending heartbeats, the NameNode concludes that that particular DataNode is down and makes sure that the blocks on that DataNode get replicated to other nodes. If the node that stopped sending heartbeats later starts sending them again, the NameNode rebalances the replication factor. In this way, the NameNode handles DataNode failures in Hadoop HDFS.
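The heartbeat bookkeeping described above can be sketched like this (an illustrative simulation, not Hadoop source; the function names and timeout constant are chosen for this example). The NameNode records the last heartbeat time per DataNode; a node silent for longer than the timeout is marked dead, and the NameNode schedules new replicas for every block that fell below its target replica count:

```python
TIMEOUT = 630.0  # seconds; HDFS's default dead-node interval

def find_dead_nodes(last_heartbeat, now, timeout=TIMEOUT):
    """Return DataNodes whose last heartbeat is older than the timeout."""
    return {n for n, t in last_heartbeat.items() if now - t > timeout}

def replication_work(block_map, dead, target=3):
    """For each block, report how many new replicas the NameNode must
    schedule after subtracting replicas that lived on dead nodes."""
    work = {}
    for block, nodes in block_map.items():
        live = nodes - dead
        if len(live) < target:
            work[block] = target - len(live)
    return work

# dn1 has been silent since t=0; dn2 and dn3 reported recently.
last_hb = {"dn1": 0.0, "dn2": 950.0, "dn3": 980.0}
dead = find_dead_nodes(last_hb, now=1000.0)
todo = replication_work({"blk_1": {"dn1", "dn2", "dn3"}}, dead)
print(dead, todo)  # dn1 is dead; blk_1 needs one new replica
```

If dn1 later heartbeats again, blk_1 would temporarily have four replicas and the NameNode would delete the excess copy, which matches the "rebalance the replication factor" step above.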

    • #6022
DataFlair Team
      Spectator

DataNodes constantly communicate with the NameNode: each DataNode periodically sends a heartbeat message to the NameNode.

If the heartbeat is not received by the NameNode as expected, the NameNode considers that DataNode dead and stops sending any new requests to it. If the replication factor is more than 1, the blocks lost on the dead DataNode can be recovered from other DataNodes where replicas are available, thus providing data availability and fault tolerance.

The NameNode coordinates the replication of data blocks from one DataNode to another, but the replication data transfer happens directly between DataNodes; the data never passes through the NameNode.
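For reference, the dead-node timeout the NameNode uses is derived from two properties in hdfs-site.xml: `dfs.namenode.heartbeat.recheck-interval` (default 300000 ms) and `dfs.heartbeat.interval` (default 3 s). The arithmetic, using the standard defaults, looks like this:

```python
# How the NameNode computes the dead-node timeout from its configuration
# (default values shown; both properties are set in hdfs-site.xml).
recheck_interval_ms = 300_000   # dfs.namenode.heartbeat.recheck-interval
heartbeat_interval_s = 3        # dfs.heartbeat.interval

timeout_s = 2 * (recheck_interval_ms / 1000) + 10 * heartbeat_interval_s
print(timeout_s)  # 630.0 seconds, i.e. 10.5 minutes
```

This is why a DataNode is only declared dead after roughly 10.5 minutes of silence by default, rather than after a single missed heartbeat.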
