Forums › Apache Hadoop › Why does HDFS perform replication, although it consumes a lot of space?
This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 1:56 pm · #5024 · DataFlair Team
Why does HDFS perform replication, although it results in data redundancy?
September 20, 2018 at 1:57 pm · #5025 · DataFlair Team
HDFS provides a reliable, scalable, and fault-tolerant storage system.

Whenever a client writes data, each block is written to 3 different DataNodes (the default replication factor is 3, and this can be configured). These DataNodes are placed on different racks as well, so even if a DataNode holding a particular block goes down, a copy of that block is still available on some other DataNode. In this way, data is never lost in Hadoop.

Also, whenever a DataNode becomes faulty, its blocks are re-replicated to other DataNodes so that the replication factor is always maintained. Since the DataNodes storing these blocks are inexpensive commodity hardware, Hadoop provides a cost-effective and fault-tolerant storage system.
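To illustrate the "this can be configured" point above: the replication factor is the standard `dfs.replication` property in `hdfs-site.xml` (the value 2 here is just an example, not a recommendation):

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- Default is 3; lowering it saves space at the cost of fault tolerance. -->
    <value>2</value>
  </property>
</configuration>
```

This sets the default for newly written files; the replication factor of an existing file can also be changed per path with the `hdfs dfs -setrep` command.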
September 20, 2018 at 1:57 pm · #5026 · DataFlair Team
Hadoop is designed and developed to analyze (or to perform a set of actions on) a small number of very large files (terabytes or petabytes).

To store such big files, commodity hardware is preferred because of its cost-effectiveness. Nowadays, data availability matters more than raw disk space: if a DataNode or an entire rack fails, the data is still available on other DataNodes or racks, and HDFS is intelligent enough to serve it from wherever a copy survives.

Hence, to keep data available at all times, HDFS replicates it.
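The failure-handling behavior both answers describe can be sketched as a toy simulation. This is not Hadoop code; all names (`place_replicas`, `handle_failure`, the `dn*` node labels) are illustrative, and real HDFS placement additionally considers rack topology:

```python
import random

REPLICATION_FACTOR = 3  # HDFS default; configurable via dfs.replication

def place_replicas(block, datanodes, rf=REPLICATION_FACTOR):
    """Pick rf distinct DataNodes to hold copies of a block."""
    return set(random.sample(sorted(datanodes), rf))

def handle_failure(replicas, failed_node, live_nodes, rf=REPLICATION_FACTOR):
    """When a DataNode dies, re-replicate its block elsewhere to restore rf."""
    replicas = replicas - {failed_node}
    candidates = [n for n in live_nodes if n not in replicas]
    while len(replicas) < rf and candidates:
        replicas.add(candidates.pop())
    return replicas

nodes = {"dn1", "dn2", "dn3", "dn4", "dn5"}
replicas = place_replicas("blk_001", nodes)

# One node holding the block fails; the block stays readable from the
# surviving copies, and a new copy restores the replication factor.
failed = next(iter(replicas))
replicas = handle_failure(replicas, failed, nodes - {failed})
print(len(replicas))  # 3
```

The key point the toy model captures: losing one replica never loses the block, and the system converges back to the configured replication factor on its own.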