What is Rack Awareness in Apache Hadoop?

Viewing 2 reply threads
  • Author
    Posts
    • #5513
      DataFlair Team
      Spectator

      What is Rack Awareness in Hadoop HDFS?
      What is the need for Rack Awareness in Hadoop HDFS?
      Why is rack awareness needed in Hadoop?

    • #5515
      DataFlair Team
      Spectator

      In a large Hadoop cluster, to reduce network traffic while reading or writing an HDFS file, the NameNode chooses a DataNode on the same rack as the requester, or on a nearby rack, to serve the read/write request. The NameNode obtains this rack information by maintaining the rack ID of each DataNode. This concept of choosing DataNodes based on rack information is called Rack Awareness in Hadoop.
      In HDFS, the NameNode also makes sure that not all replicas of a block are stored on a single rack; it follows the Rack Awareness algorithm to reduce latency and improve fault tolerance.
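
One common way to give the NameNode its rack IDs is a topology script, referenced from the `net.topology.script.file.name` property in core-site.xml. Hadoop calls the script with one or more DataNode addresses as arguments and reads one rack path per address from standard output. The following is only a minimal sketch: the addresses and rack names in `RACK_BY_HOST` are made up for illustration, and a real cluster would load the mapping from its own inventory.

```python
#!/usr/bin/env python3
"""Sketch of a topology script: maps DataNode addresses to rack IDs."""
import sys

# Hypothetical mapping, for illustration only.
RACK_BY_HOST = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}

# Rack that Hadoop assumes for nodes without a known mapping.
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Addresses arrive as command-line arguments; rack paths go to stdout.
    print(" ".join(resolve(sys.argv[1:])))
```

Unknown addresses fall back to `/default-rack`, so a node missing from the mapping is still accepted but is treated as if it shared a rack with every other unmapped node.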

      Assume the replication factor is 3 and a client wants to write a file to HDFS. Hadoop then places the replicas of each block as follows:

      The first replica is placed on a node (N1) closest to the client, on the client's own node if the client runs on a DataNode.
      The second replica is placed on a random node (N2) in a different rack from N1.
      The third replica is placed on another node (N3) in the same rack as N2.
      This placement ensures that the data survives the failure of a single node or even of an entire rack.
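
The three placement steps can be sketched as a small simulation. This is not Hadoop's actual placement code, only an illustration of the rule described above; the function name `place_replicas`, the `nodes_by_rack` dictionary, and the node names are all hypothetical, and the sketch simply raises an error where a real cluster would fall back to other choices.

```python
import random


def place_replicas(nodes_by_rack, client_node=None, rng=random):
    """Pick three DataNodes following the placement steps described above.

    nodes_by_rack: dict mapping rack ID -> list of node names (hypothetical).
    client_node:   node the writer runs on, or None if outside the cluster.
    """
    rack_of = {n: r for r, nodes in nodes_by_rack.items() for n in nodes}

    # Replica 1: the client's own node if it is a DataNode, else a random node.
    n1 = client_node if client_node in rack_of else rng.choice(sorted(rack_of))

    # Replicas 2 and 3: two distinct nodes on one rack other than n1's rack.
    remote = [r for r in sorted(nodes_by_rack)
              if r != rack_of[n1] and len(nodes_by_rack[r]) >= 2]
    if not remote:
        raise ValueError("need a second rack with at least two nodes")
    rack2 = rng.choice(remote)
    n2, n3 = rng.sample(nodes_by_rack[rack2], 2)
    return [n1, n2, n3]


cluster = {
    "/rack1": ["node-a", "node-b"],
    "/rack2": ["node-c", "node-d"],
    "/rack3": ["node-e", "node-f"],
}
print(place_replicas(cluster, client_node="node-a"))
```

With this layout the first replica always lands on `node-a`, while the other two share either `/rack2` or `/rack3`, so losing any single rack still leaves at least one copy.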

      Follow the link to learn more about Rack Awareness in Hadoop

    • #5516
      DataFlair Team
      Spectator

      The NameNode and DataNodes are connected through a network switch, and all HDFS storage and processing traffic passes through it. If every node sits behind a single switch, a failure of that switch takes the entire cluster down.

      Just as the replication factor in HDFS causes each block to be replicated (3 times by default) and stored on different nodes, which gives HDFS high availability and fault tolerance, a multi-level network configuration lets the cluster survive a network failure. With a rack topology (rack awareness), the NameNode and DataNodes are connected to different switches, and the nodes attached to one switch form a rack.

      Rack Awareness helps the NameNode find the nearest DataNodes when placing replicas. Not all three replicas are stored in the same rack; at least one copy is stored in a different rack. This keeps the data available even if an entire rack fails. As a rough guide, a rack holds about 45 machines. Rack topology can also be applied at the data center level, so that even if one data center fails, work can continue from another data center.
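
"Nearest" can be made concrete by treating the cluster as a tree (data center, then rack, then node) and counting the hops between two nodes through their closest common ancestor. The sketch below assumes topology paths of the form `/datacenter/rack/node`; the helper names `distance` and `nearest` and the example paths are illustrative, not part of any Hadoop API.

```python
def distance(path_a, path_b):
    """Hops between two nodes given as paths like /dc1/rack1/node1."""
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    # Count how many leading path components the two nodes share.
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Each node is (depth - common) hops below the closest common ancestor.
    return (len(a) - common) + (len(b) - common)


def nearest(reader, replicas):
    """Return the replica location closest to the reader."""
    return min(replicas, key=lambda p: distance(reader, p))


print(distance("/dc1/rack1/n1", "/dc1/rack1/n1"))  # 0: same node
print(distance("/dc1/rack1/n1", "/dc1/rack1/n2"))  # 2: same rack
print(distance("/dc1/rack1/n1", "/dc1/rack2/n3"))  # 4: same data center
print(distance("/dc1/rack1/n1", "/dc2/rack5/n9"))  # 6: different data center
```

A reader on `/dc1/rack1/n1` would therefore prefer a replica on its own rack over one on another rack, and one in its own data center over one in a remote data center.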

      Advantages are detailed below:
      1) Ensures that copies of the data exist in more than one rack
      2) Improves cluster performance
      3) Keeps bulk data flow within a rack where possible

      For more detail follow Rack Awareness in Hadoop
