To keep space usage even across all nodes, Hadoop applies its own balancing policy. For situations that still leave the cluster unbalanced, such as adding new nodes or deleting data, the HDFS balancer can rebalance space usage among the cluster's DataNodes.
Hadoop space balance policy
Hadoop has three parameters related to space balancing:
– Balanced space preference fraction
– Balanced space threshold
– Balance bandwidth control
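As a rough illustration, the three parameters above can be written out as hdfs-site.xml properties. The sketch below assumes they correspond to the property names used in hdfs-default.xml (the first two apply to the available-space volume choosing policy, the third to balancer traffic). The values are illustrative placeholders, not recommendations, and render_properties is a hypothetical helper, not part of Hadoop.

```python
from xml.sax.saxutils import escape

# Assumed mapping of the three parameters to hdfs-site.xml property names.
# The values are illustrative placeholders only; check hdfs-default.xml
# for the defaults of your Hadoop version.
BALANCE_SETTINGS = {
    "dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction": "0.75",
    "dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold": "10737418240",
    "dfs.datanode.balance.bandwidthPerSec": "10485760",
}


def render_properties(settings):
    """Render a name -> value mapping as <property> elements (hypothetical helper)."""
    lines = []
    for name, value in settings.items():
        lines.append("<property>")
        lines.append(f"  <name>{escape(name)}</name>")
        lines.append(f"  <value>{escape(value)}</value>")
        lines.append("</property>")
    return "\n".join(lines)


print(render_properties(BALANCE_SETTINGS))
```

The printed elements would go inside the configuration element of hdfs-site.xml on the DataNodes.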
HDFS Balancer
Data might not be placed uniformly across the DataNodes because of multiple competing considerations. HDFS therefore offers administrators a tool that analyzes block placement and rebalances data across the DataNodes.
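To make the idea of "analyzing placement" concrete, here is a minimal sketch of the kind of check involved. It assumes the usual definition of balance: a DataNode counts as balanced when its utilization is within a threshold (in percentage points) of the overall cluster utilization. The function name, the input format and the sample numbers are made up for illustration; this is not the balancer's actual code.

```python
def classify_datanodes(nodes, threshold_pct=10.0):
    """Group DataNodes by how their utilization compares to the cluster's.

    nodes: dict mapping a node name to (used_bytes, capacity_bytes).
    Returns (cluster_utilization_pct, groups) where groups has the keys
    "over", "under" and "balanced".
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_util = 100.0 * total_used / total_capacity

    groups = {"over": [], "under": [], "balanced": []}
    for name, (used, capacity) in sorted(nodes.items()):
        util = 100.0 * used / capacity
        if util > cluster_util + threshold_pct:
            groups["over"].append(name)      # candidate source of blocks
        elif util < cluster_util - threshold_pct:
            groups["under"].append(name)     # candidate target for blocks
        else:
            groups["balanced"].append(name)
    return cluster_util, groups


# Hypothetical three-node cluster, each node with a capacity of 100 units.
sample = {"dn1": (90, 100), "dn2": (50, 100), "dn3": (10, 100)}
cluster_util, groups = classify_datanodes(sample, threshold_pct=10.0)
print(cluster_util, groups)
```

In this made-up cluster, dn1 is over-utilized and dn3 under-utilized, so a rebalancing run would move blocks from dn1 toward dn3.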
Note: The HDFS balancer has to be started manually; it does not run automatically in the background.
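Since the balancer has to be started by hand, one option is to wrap the command in a small script that can be scheduled. The sketch below assumes the hdfs command is on the PATH and that the script runs as a user allowed to start the balancer. The function names are hypothetical; -threshold is the balancer's own option, and 10 is only an example value.

```python
import subprocess


def build_balancer_command(threshold_pct=10):
    """Build the argument list for a manual balancer run."""
    return ["hdfs", "balancer", "-threshold", str(threshold_pct)]


def run_balancer(threshold_pct=10):
    """Start the balancer and wait for it to finish; returns its exit code.

    Assumes the `hdfs` CLI is installed and on the PATH.
    """
    completed = subprocess.run(build_balancer_command(threshold_pct))
    return completed.returncode
```

Calling run_balancer(10) starts one balancing run and blocks until it finishes, so on a large cluster it may take a long time.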
To learn more about the HDFS Disk Balancer, follow the link: HDFS Disk Balancer – Learn how to Balance Data on DataNode