Forums › Apache Hadoop › Why does HDFS perform replication, although it consumes a lot of space?
This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 1:56 pm · #5024 · DataFlair Team
Why does HDFS perform replication, although it results in data redundancy?
September 20, 2018 at 1:57 pm · #5025 · DataFlair Team
HDFS provides a reliable, scalable, and fault-tolerant storage system.

Whenever a client writes data, each block is written to 3 different DataNodes (the default replication factor is 3, and this can be configured). These DataNodes are placed on different racks as well, so even if a DataNode holding a particular block goes down, a copy of that block is still available on some other DataNode. In this way, data is never lost in Hadoop.

Also, whenever a DataNode becomes faulty, its blocks are re-replicated to other DataNodes so that the replication factor is always maintained. Since the DataNodes storing these blocks are inexpensive commodity hardware, Hadoop provides a cost-effective and fault-tolerant storage system.
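To illustrate the "this can be configured" point above: the replication factor is the standard `dfs.replication` property in `hdfs-site.xml` (the value 2 here is just an example, not a recommendation):

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- Default is 3; lowering it saves space at the cost of fault tolerance. -->
    <value>2</value>
  </property>
</configuration>
```

This sets the default for newly written files; the replication factor of an existing file can also be changed per path with the `hdfs dfs -setrep` command.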
September 20, 2018 at 1:57 pm · #5026 · DataFlair Team
Hadoop is designed and developed to analyze (or to perform a set of actions on) a small number of very large files (terabytes or petabytes).

To store such big files, commodity hardware is preferred because of its cost-effectiveness. Nowadays, data availability matters more than raw disk space: if a DataNode or an entire rack fails, the data is still available on other DataNodes or racks, and HDFS is intelligent enough to serve it from wherever a copy survives.

Hence, to keep data available at all times, HDFS replicates it.
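The failure-handling behavior both answers describe can be sketched as a toy simulation. This is not Hadoop code; all names (`place_replicas`, `handle_failure`, the `dn*` node labels) are illustrative, and real HDFS placement additionally considers rack topology:

```python
import random

REPLICATION_FACTOR = 3  # HDFS default; configurable via dfs.replication

def place_replicas(block, datanodes, rf=REPLICATION_FACTOR):
    """Pick rf distinct DataNodes to hold copies of a block."""
    return set(random.sample(sorted(datanodes), rf))

def handle_failure(replicas, failed_node, live_nodes, rf=REPLICATION_FACTOR):
    """When a DataNode dies, re-replicate its block elsewhere to restore rf."""
    replicas = replicas - {failed_node}
    candidates = [n for n in live_nodes if n not in replicas]
    while len(replicas) < rf and candidates:
        replicas.add(candidates.pop())
    return replicas

nodes = {"dn1", "dn2", "dn3", "dn4", "dn5"}
replicas = place_replicas("blk_001", nodes)

# One node holding the block fails; the block stays readable from the
# surviving copies, and a new copy restores the replication factor.
failed = next(iter(replicas))
replicas = handle_failure(replicas, failed, nodes - {failed})
print(len(replicas))  # 3
```

The key point the toy model captures: losing one replica never loses the block, and the system converges back to the configured replication factor on its own.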