What is balancer? How to run a cluster balancing utility?

- September 20, 2018 at 2:58 pm #5362
  
  DataFlair Team
  Spectator
  
  In Hadoop, HDFS data is not always distributed evenly across the DataNodes, for example after new DataNodes are added to an existing cluster. When placing new blocks, the NameNode considers various parameters before choosing the DataNodes that receive them.
  HDFS provides a tool called Balancer, which analyzes block placement and rebalances data across the DataNodes. It is generally run by the Hadoop administrator.
  
  To run the cluster balancing utility, use the following command:
  $ hdfs balancer [-threshold <threshold>]

  where <threshold> is a percentage of disk capacity. It overrides the default threshold. Older releases also accept the equivalent `hadoop balancer` form, which is deprecated.
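As a rough illustration of the command above, the following shell sketch builds the balancer command line for a chosen threshold. The helper name `balancer_cmd` is hypothetical and not part of Hadoop. The sketch assumes the balancer's accepted range of 1 to 100 and, as a simplification, only accepts whole numbers. It prints the command instead of running it, so it can be tried on a machine without Hadoop installed.

```shell
#!/bin/sh
# balancer_cmd is a hypothetical helper, not a Hadoop tool.
# It validates a threshold (whole-number percent) and prints the
# corresponding "hdfs balancer" command line without executing it.
balancer_cmd() {
    threshold="${1:-10}"   # fall back to the 10% default when no argument is given
    case "$threshold" in
        ''|*[!0-9]*)
            echo "threshold must be a whole number" >&2
            return 1
            ;;
    esac
    if [ "$threshold" -lt 1 ] || [ "$threshold" -gt 100 ]; then
        echo "threshold must be between 1 and 100" >&2
        return 1
    fi
    echo "hdfs balancer -threshold $threshold"
}

balancer_cmd 5    # prints: hdfs balancer -threshold 5
```

To start a real rebalancing run, the printed command would be executed on a host where the Hadoop client is configured for the cluster.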
- September 20, 2018 at 2:58 pm #5365
  
  DataFlair Team
  Spectator
  
  In HDFS, new blocks are allocated evenly among all the DataNodes. But in a large cluster, nodes have different capacities, and you will often need to add new nodes or remove old ones for better performance. How, then, does Hadoop balance data usage across all DataNodes?
  
  The answer is that Hadoop includes the HDFS Balancer, which redistributes data among the cluster's DataNodes when usage becomes unbalanced, for example after new nodes are added or after deletions.
  
  The HDFS balancer does not run in the background; it has to be started manually. The command is:
  hdfs balancer [-threshold <threshold>]
  where <threshold> is a percentage of disk capacity.
  
  The threshold is a percentage; the balancer accepts values from 1 to 100.
  Starting from the average cluster utilization, the balancer tries to bring every DataNode's usage into the range [average – threshold, average + threshold].
  
  The default threshold is 10%.
  
  For example, if the cluster is currently 50% full on average and the default threshold of 10% is used, DataNodes above the upper bound start moving data to DataNodes below the lower bound:

  – Upper bound (average + threshold): 60%
  – Lower bound (average – threshold): 40%
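The band arithmetic above can be sketched in shell. This is only an illustration: the function name `classify_nodes`, the node names, and the utilization figures are made up, and the average is taken as a simple mean of per-node percentages, which may differ from how the balancer itself computes cluster utilization.

```shell
#!/bin/sh
# classify_nodes is a hypothetical helper for illustration only.
# It reads "<node> <used-percent>" lines on stdin and reports where each
# node falls relative to [average - threshold, average + threshold].
classify_nodes() {
    threshold="$1"
    awk -v t="$threshold" '
        { name[NR] = $1; used[NR] = $2; sum += $2 }
        END {
            if (NR == 0) exit
            avg = sum / NR            # simplification: mean of per-node percentages
            lo = avg - t
            hi = avg + t
            printf "band %.1f %.1f\n", lo, hi
            for (i = 1; i <= NR; i++) {
                if (used[i] > hi)      s = "over"    # candidate source of block moves
                else if (used[i] < lo) s = "under"   # candidate target of block moves
                else                   s = "within"
                printf "%s %s\n", name[i], s
            }
        }'
}

# Made-up sample: three nodes averaging 50% used, threshold 10
printf '%s\n' 'dn1 75' 'dn2 50' 'dn3 25' | classify_nodes 10
```

With this sample the band is 40 to 60, so dn1 is reported as over, dn2 as within, and dn3 as under, matching the 60% and 40% bounds in the example.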