Which part of HDFS is recommended to use RAID?


    • #5922
      DataFlair Team
      Spectator

      Where in HDFS should we use RAID?

    • #5923
      DataFlair Team
      Spectator

      HDFS clusters do not benefit from RAID for DataNode storage, although RAID is recommended for the NameNode's disks to protect its metadata against corruption.

      RAID is used for two purposes:
      1) Fault Tolerance
      2) Better Performance

      HDFS has its own mechanisms that provide both:
      Fault tolerance: If a disk or node goes down, other replicas of its blocks are available on different DataNodes and disks.
      High sequential read/write performance: HDFS splits a file into multiple blocks and stores them on different nodes (and different disks), so a file can be read in parallel by accessing multiple disks concurrently. Each disk can deliver its full bandwidth, and its reads do not interfere with those of other disks. If the cluster is well utilized, all disks are kept busy, delivering the maximum aggregate sequential read performance.
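As a rough illustration of the parallel-read point above, here is a minimal Python sketch. It is a back-of-the-envelope model, not a measurement of HDFS: the function name `parallel_read_seconds`, the 128 MB block size, and the 100 MB/s disk bandwidth are assumptions chosen for the example, and it treats every block as full-sized and ignores network and scheduling overhead.

```python
import math

def parallel_read_seconds(file_mb, block_mb, disks, disk_mb_per_s):
    """Estimate the time to read a file whose blocks are spread evenly over `disks`.

    Hypothetical model: each disk reads its share of blocks sequentially,
    and all disks work in parallel without interfering with each other.
    """
    blocks = math.ceil(file_mb / block_mb)
    # Number of blocks the busiest disk has to read.
    rounds = math.ceil(blocks / disks)
    return rounds * block_mb / disk_mb_per_s

# A 12,800 MB file in 128 MB blocks (100 blocks), disks at an assumed 100 MB/s.
single_disk = parallel_read_seconds(12800, 128, disks=1, disk_mb_per_s=100)
ten_disks = parallel_read_seconds(12800, 128, disks=10, disk_mb_per_s=100)
print(single_disk, ten_disks)
```

Under these assumptions, spreading the blocks over ten disks cuts the estimated read time by a factor of ten, which is the same effect RAID striping aims for within a single machine.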

      Since HDFS already takes care of fault tolerance and "striped" reading, there is no need to use RAID underneath HDFS DataNodes. Using RAID there will only be more expensive, offer less usable storage, and possibly be slower (depending on the concrete RAID configuration).
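The "less storage" claim can be checked with simple arithmetic. The sketch below assumes a replication factor of 3 and 120 TB of raw disk; the function name `usable_tb` and the `raid_efficiency` parameter are hypothetical names introduced for this example. The efficiency values are the usual ones: 1.0 for plain disks (JBOD), 0.5 for two-way mirroring (RAID 1 or RAID 10), and (n-1)/n for RAID 5 over n disks.

```python
def usable_tb(raw_tb, replication=3, raid_efficiency=1.0):
    """Usable HDFS capacity: RAID overhead is paid first, then HDFS replication."""
    return raw_tb * raid_efficiency / replication

raw = 120  # TB of raw disk across the cluster (assumed figure)

jbod = usable_tb(raw)                          # plain disks, no RAID
raid1 = usable_tb(raw, raid_efficiency=0.5)    # two-way mirroring under HDFS
raid5 = usable_tb(raw, raid_efficiency=5 / 6)  # RAID 5 across 6-disk groups

print(f"JBOD:   {jbod:.1f} TB usable")
print(f"RAID 1: {raid1:.1f} TB usable")
print(f"RAID 5: {raid5:.1f} TB usable")
```

With these numbers, mirroring under HDFS halves the usable capacity compared with plain disks, while the data was already protected by the three HDFS replicas.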

      Since the NameNode is a single point of failure in HDFS, it requires a more reliable hardware setup. Therefore, RAID is recommended for the NameNode's disks.
