Forums › Apache Hadoop › Why replication is done in HDFS (Hadoop)
This topic has 7 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 4:57 pm #6010 · DataFlair Team (Spectator)
What is the need for replication in HDFS (Hadoop Distributed File System)?
September 20, 2018 at 4:57 pm #6011 · DataFlair Team (Spectator)
Replication in HDFS increases the availability of data at any point in time. If a node containing a block of data used for processing crashes, the same block can still be read from another node; this is possible only because of replication.
Replication is one of the major factors in making HDFS a fault-tolerant system.
September 20, 2018 at 4:57 pm #6014 · DataFlair Team (Spectator)
The most important feature of HDFS is availability of data, and the replication of data blocks exists to achieve that objective.
Replication ensures that the same data block is present on more than one DataNode, with the addresses of all replicas stored in the NameNode, so that if one DataNode goes down the block can still be retrieved from another DataNode.
Replication therefore keeps data available at all times, even in the case of node failure, making the system fault tolerant.
September 20, 2018 at 4:57 pm #6017 · DataFlair Team (Spectator)
Data availability is the most important feature of HDFS, and it is possible because of data replication.
Suppose a data block is stored on only one DataNode; if that node goes down, we might lose the data. Replication addresses this problem.
With replication, we store each data block on more than one node, so that if one node goes down the data is still available on another node.
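The number of replicas the answers above describe is configurable per cluster (and per file). A minimal, illustrative `hdfs-site.xml` fragment setting the default replication factor (the property name `dfs.replication` is real; 3 is HDFS's default value):

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- default: each block is stored on 3 DataNodes -->
    <value>3</value>
  </property>
</configuration>
```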
September 20, 2018 at 4:57 pm #6018 · DataFlair Team (Spectator)
How does the NameNode handle the failure of DataNodes in Hadoop?
September 20, 2018 at 4:57 pm #6019 · DataFlair Team (Spectator)
How does the NameNode handle DataNode failures in Hadoop HDFS?
September 20, 2018 at 4:58 pm #6020 · DataFlair Team (Spectator)
The NameNode holds metadata about the DataNodes, e.g. which DataNodes store each block's replicas and the replication factor of each file (mentioning only the features relevant to this question).
Every DataNode constantly sends a heartbeat to the NameNode; this is how the NameNode knows that the DataNode is working. If, for any reason, a DataNode stops sending heartbeats, the NameNode concludes that the node is down and makes sure the blocks on that DataNode get replicated onto other nodes. If the node later starts sending heartbeats again, the NameNode rebalances the replicas so the replication factor is restored. This is how the NameNode handles DataNode failure in Hadoop HDFS.
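The heartbeat bookkeeping described above can be sketched in a few lines. This is an illustrative simulation, not Hadoop's actual implementation: the class name `HeartbeatMonitor` and the timeout value are made up (HDFS's real dead-node interval is configurable and much longer).

```python
# Hypothetical sketch of NameNode-style heartbeat tracking.
HEARTBEAT_TIMEOUT = 10.0  # seconds; illustrative value only

class HeartbeatMonitor:
    def __init__(self):
        self.last_seen = {}  # DataNode id -> timestamp of last heartbeat

    def heartbeat(self, node_id, now):
        """Record that a DataNode reported in at time `now`."""
        self.last_seen[node_id] = now

    def dead_nodes(self, now):
        """Nodes whose last heartbeat is older than the timeout are considered dead."""
        return [n for n, t in self.last_seen.items()
                if now - t > HEARTBEAT_TIMEOUT]

monitor = HeartbeatMonitor()
monitor.heartbeat("dn1", now=0.0)
monitor.heartbeat("dn2", now=0.0)
monitor.heartbeat("dn1", now=8.0)    # dn1 keeps reporting; dn2 goes silent
print(monitor.dead_nodes(now=12.0))  # -> ['dn2']
```

Once a node appears in `dead_nodes`, the NameNode would schedule re-replication of the blocks that node held, as the answer above explains.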
September 20, 2018 at 4:58 pm #6022 · DataFlair Team (Spectator)
Each DataNode constantly communicates with the NameNode by sending it a heartbeat message periodically.
If the NameNode stops receiving these heartbeats, it considers that DataNode dead and no longer sends new requests to it. If the replication factor is greater than 1, the blocks that were on the dead DataNode can be recovered from the other DataNodes where replicas are available, which is what provides data availability and fault tolerance.
The NameNode coordinates the re-replication of data blocks from one DataNode to another. However, the replication data transfer happens directly between DataNodes; the data never passes through the NameNode.
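The coordination step, choosing which surviving replica to copy from and which node to copy to, can be sketched as below. This is a simplified illustration under assumed names (`rereplicate` is not a real HDFS API); real HDFS block placement also considers racks, node load, and free disk space.

```python
# Sketch: picking a (source, target) pair for one under-replicated block.
def rereplicate(block_locations, live_nodes, dead_node):
    """Return (source, target) for copying a block off a dead node,
    or None if no surviving replica exists (the block is lost)."""
    # surviving replicas: holders of the block that are still alive
    holders = [n for n in block_locations if n != dead_node and n in live_nodes]
    if not holders:
        return None
    source = holders[0]
    # target: any live node that does not already hold the block
    candidates = [n for n in live_nodes if n not in block_locations]
    target = candidates[0] if candidates else None
    return (source, target)

live = ["dn1", "dn3", "dn4"]
print(rereplicate(["dn1", "dn2"], live, dead_node="dn2"))  # -> ('dn1', 'dn3')
```

Note that the function only names the pair; consistent with the answer above, the actual block bytes would then flow directly from `source` to `target`, never through the NameNode.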