Hadoop High Availability – HDFS Feature

1. Overview

This Hadoop tutorial discusses the High Availability feature of HDFS. It introduces Hadoop High Availability, explains how HDFS achieves it, walks through an example, and describes the issues in legacy systems that it addresses.

Learn how to install and configure Hadoop on a single machine and on a multi-node cluster.

2. Hadoop HDFS High Availability – Introduction

HDFS is a distributed file system. It spreads data across the nodes of a cluster by creating replicas of each file's data and storing those replicas on different machines in the HDFS cluster. When a user wants to access their data, it can be served from any of the machines holding a replica, typically the closest one. Under unfavorable conditions such as the failure of a node, the user can still access the data from the other nodes, because HDFS keeps replicas of user data elsewhere in the cluster. To learn more about this storage layer, follow the HDFS introductory guide.

3. How is High Availability achieved in Hadoop HDFS?

An HDFS cluster contains many DataNodes, and each of them sends a heartbeat message to the NameNode at a fixed interval. If the NameNode stops receiving heartbeats from a DataNode, it assumes that node is dead. It then works out which data was stored on the failed node and instructs the other DataNodes that hold copies of the same data to replicate it to additional DataNodes. Hence the data is always available.
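
The heartbeat check and re-replication described above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not HDFS code: `ToyNameNode`, `HEARTBEAT_TIMEOUT`, `REPLICATION_FACTOR`, and the `dn1`/`blk_1` names are all hypothetical, and the 30-second timeout is an arbitrary value chosen for the example.

```python
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT = 30.0   # seconds; illustrative value, not an HDFS default
REPLICATION_FACTOR = 3     # target number of replicas per block (assumed)


@dataclass
class ToyNameNode:
    """Hypothetical in-memory model of the NameNode's bookkeeping."""
    last_heartbeat: dict = field(default_factory=dict)   # DataNode -> last heartbeat time
    block_locations: dict = field(default_factory=dict)  # block id -> set of DataNodes

    def heartbeat(self, node, now):
        """Record that `node` reported in at time `now`."""
        self.last_heartbeat[node] = now

    def dead_nodes(self, now):
        """DataNodes whose last heartbeat is older than the timeout."""
        return {n for n, t in self.last_heartbeat.items()
                if now - t > HEARTBEAT_TIMEOUT}

    def re_replicate(self, now):
        """Plan (block, source, target) copies to restore lost replicas."""
        dead = self.dead_nodes(now)
        live = sorted(set(self.last_heartbeat) - dead)
        copies = []
        for block in sorted(self.block_locations):
            survivors = self.block_locations[block] - dead
            self.block_locations[block] = survivors
            if not survivors:
                continue  # no live replica left to copy from
            source = min(survivors)
            targets = [n for n in live if n not in survivors]
            while len(survivors) < REPLICATION_FACTOR and targets:
                target = targets.pop(0)
                copies.append((block, source, target))
                survivors.add(target)
        return copies
```

For example, if `dn1`, `dn2`, and `dn3` hold `blk_1` and `dn1` stops sending heartbeats, `re_replicate` plans a copy from a surviving holder to a live node that does not yet have the block.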

When a user requests access to data in HDFS, the NameNode first looks up which DataNodes hold that data and picks the ones from which it can be served most quickly. Users do not have to search every DataNode themselves: the NameNode provides the address of a DataNode from which the user can read the data directly. Learn more about the internals of the HDFS data read operation.
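
One way to picture "the DataNode from which data is quickly available" is to sort the replicas by network distance from the reader. The sketch below assumes a simplified rack-aware distance (0 for the same host, 2 for the same rack, 4 otherwise); the function names and the `(rack, host)` tuples are hypothetical and are not taken from the HDFS API.

```python
def distance(a, b):
    """Toy network distance between two (rack, host) pairs.

    Assumed weights: 0 = same host, 2 = same rack, 4 = different rack.
    """
    if a == b:
        return 0
    if a[0] == b[0]:
        return 2
    return 4


def order_replicas(client, replicas):
    """Return the replica locations sorted closest-first for `client`."""
    return sorted(replicas, key=lambda r: distance(client, r))
```

A client on `("rack1", "host1")` would be offered a replica on its own host first, then one on the same rack, and only then one on another rack.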

4. Example of Hadoop HDFS High Availability

HDFS provides high availability of data. Whenever a user requests data from the NameNode, the NameNode identifies all the nodes on which that data is available and gives the user access through the node that can serve it most quickly. If one of those nodes turns out to be dead, the read is redirected to another node holding the same data, without the user noticing. The data is delivered without interruption, so it remains highly available even when a node fails, and the failure of any individual node does not affect applications. Learn about HDFS read and write operations.
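
The failover in this example can be sketched as a loop over the replica locations that skips dead nodes. This is a minimal illustration, assuming the reader already holds an ordered list of replicas; `read_with_failover`, `NoLiveReplica`, and the dictionary standing in for the cluster are hypothetical names, not part of HDFS.

```python
class NoLiveReplica(Exception):
    """Raised when every replica of a block is unreachable."""


def read_with_failover(block_id, replicas, datanodes):
    """Read `block_id` from the first reachable replica.

    replicas:  DataNode names, preferred node first.
    datanodes: name -> {block id: bytes}, or None if that node is down.
    Returns (node that served the read, data).
    """
    for node in replicas:
        store = datanodes.get(node)
        if store is not None and block_id in store:
            return node, store[block_id]
        # Node is down or lacks the block: fall through to the next replica.
    raise NoLiveReplica(f"no live replica of {block_id} among {replicas}")
```

With `dn1` down and `dn2` healthy, a read that prefers `dn1` is served by `dn2` instead, and the caller sees the same bytes either way.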

5. What were the Issues in legacy systems?

  • Data became unavailable when a machine crashed.
  • Users had to wait a long time to access their data, sometimes until the website came back up.
  • Because data was unavailable, the completion of major projects at organizations was delayed for long periods, which put companies in critical situations.
  • Limited features and functionality.
