Should we use RAID with Hadoop?
September 20, 2018 at 5:10 pm, DataFlair Team
Since Hadoop handles replication at the application level, should we use RAID with Hadoop?
What are the deciding factors?
September 20, 2018 at 5:10 pm, DataFlair Team
HDFS clusters do not benefit from using RAID for DataNode storage: the redundancy RAID provides is not needed, because HDFS already provides it by replicating each block on several different DataNodes.
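A minimal sketch of why replication across nodes already gives the redundancy RAID would add. It assumes a hypothetical 10-node cluster, a replication factor of 3, and purely random placement of replicas on distinct DataNodes (real HDFS uses a rack-aware placement policy instead); the function and node names are illustrative only.

```python
import random


def place_replicas(num_blocks, nodes, replication=3, seed=0):
    # Hypothetical placement: each block's replicas go to `replication`
    # distinct DataNodes chosen at random (not HDFS's rack-aware policy).
    rng = random.Random(seed)
    return {block: rng.sample(nodes, replication) for block in range(num_blocks)}


def unavailable_blocks(placement, failed_nodes):
    # A block is lost only if every one of its replicas is on a failed node.
    failed = set(failed_nodes)
    return [b for b, replicas in placement.items() if set(replicas) <= failed]


nodes = [f"dn{i}" for i in range(10)]  # hypothetical DataNode names
placement = place_replicas(num_blocks=1000, nodes=nodes)
print(len(unavailable_blocks(placement, ["dn3"])))         # 0
print(len(unavailable_blocks(placement, ["dn3", "dn7"])))  # 0
```

With three replicas on distinct nodes, no combination of one or two failed nodes can make a block unavailable, so mirroring or parity inside each node adds nothing to that guarantee.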
RAID striping (RAID 0), normally used to increase performance, turns out to be slower than the JBOD (Just a Bunch Of Disks) setup used by HDFS, which round-robins blocks across all disks. This is because in RAID 0 every read/write operation is limited by the slowest disk in the array. In JBOD, disk operations are independent, so the average speed of operations is greater than that of the slowest disk.
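The argument above can be put into numbers with a deliberately simplified model: it assumes each disk has a fixed sustained throughput, ignores seeks and controller overhead, and assumes enough concurrent work to keep every disk busy. The disk speeds are hypothetical.

```python
def raid0_throughput(speeds_mb_s):
    # Every stripe is split evenly across all member disks, so each disk
    # effectively runs at the pace of the slowest one.
    return len(speeds_mb_s) * min(speeds_mb_s)


def jbod_throughput(speeds_mb_s):
    # Each HDFS block lives on a single disk; with every disk kept busy,
    # the independent throughputs simply add up.
    return sum(speeds_mb_s)


def mean_op_speed_jbod(speeds_mb_s):
    # A block operation touches one disk, so the average per-operation
    # speed is the mean disk speed rather than the minimum.
    return sum(speeds_mb_s) / len(speeds_mb_s)


disks = [100, 120, 80, 110]  # hypothetical sustained MB/s of four data disks
print(raid0_throughput(disks))    # 320
print(jbod_throughput(disks))     # 410
print(mean_op_speed_jbod(disks))  # 102.5
```

In this model the two layouts tie only when all disks are identical; any spread in disk speed (age, firmware, a degrading drive) penalizes RAID 0 but not JBOD.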
If a disk fails in JBOD, HDFS can continue to operate without it, but in a striped RAID 0 array the failure of a single disk makes the whole array unavailable. RAID is, however, recommended for the NameNode, to protect its metadata against loss or corruption.
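A rough sketch of the blast radius of one disk failure on a single DataNode, assuming 1,000 blocks spread round-robin over four disks. The helper names are hypothetical. In the RAID 0 case placement does not matter, since every block is striped over every member disk. Blocks lost on a DataNode are re-replicated by HDFS from copies on other nodes; whether the DataNode process itself keeps running after losing a volume is configurable (in stock Apache Hadoop via `dfs.datanode.failed.volumes.tolerated`, which defaults to 0, i.e. the DataNode shuts down on the first volume failure).

```python
from itertools import cycle


def place_blocks_round_robin(num_blocks, num_disks):
    # Map block id -> disk index, cycling through the disks in order.
    disks = cycle(range(num_disks))
    return {block: next(disks) for block in range(num_blocks)}


def blocks_lost_jbod(placement, failed_disk):
    # Only the blocks stored on the failed disk become unavailable.
    return sum(1 for disk in placement.values() if disk == failed_disk)


def blocks_lost_raid0(placement):
    # A striped array cannot serve any block once one member disk is gone.
    return len(placement)


placement = place_blocks_round_robin(num_blocks=1000, num_disks=4)
print(blocks_lost_jbod(placement, failed_disk=2))  # 250
print(blocks_lost_raid0(placement))                # 1000
```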
September 20, 2018 at 5:10 pm, DataFlair Team
HDFS itself takes care of fault tolerance and avoids data loss through the redundant copies (replicas) kept on multiple DataNodes, so there is no need to use RAID with HDFS. Using RAID would make a Hadoop deployment more expensive, offer less usable storage, and, depending on the RAID configuration, be slower.
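The storage cost can be estimated with simple arithmetic. This sketch assumes a hypothetical DataNode with twelve 4 TB disks and the HDFS default replication factor of 3, and it ignores reserved non-HDFS space and filesystem overhead. RAID underneath HDFS does not lower the replication factor, so its overhead stacks on top of HDFS's own copies.

```python
def usable_tb(raw_tb, hdfs_replication=3, raid_efficiency=1.0):
    # raid_efficiency is the fraction of raw capacity the disk layout exposes:
    # 1.0 for JBOD, 0.5 for RAID 1 mirroring, (n - 1) / n for RAID 5 over n disks.
    return raw_tb * raid_efficiency / hdfs_replication


raw = 12 * 4  # hypothetical node: twelve 4 TB disks = 48 TB raw
print(usable_tb(raw))                                    # JBOD: 16.0
print(usable_tb(raw, raid_efficiency=0.5))               # RAID 1: 8.0
print(round(usable_tb(raw, raid_efficiency=11 / 12), 2))  # RAID 5: 14.67
```

Under these assumptions, mirroring halves the usable capacity while adding no protection that three-way replication does not already give.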
Since the NameNode is a single point of failure in HDFS (in a deployment without NameNode high availability), it does make sense to use RAID on the NameNode, which requires a more reliable hardware setup.