What is Balancer in Hadoop?
- This topic has 2 replies, 1 voice, and was last updated 5 years, 6 months ago by DataFlair Team.
September 20, 2018 at 3:11 pm #5441 DataFlair Team (Spectator)
Why is the Balancer needed in Hadoop?
-
September 20, 2018 at 3:12 pm #5443 DataFlair Team (Spectator)
The HDFS Balancer rebalances data across the DataNodes by moving blocks from over-utilized to under-utilized nodes. It is needed because HDFS data is not always distributed uniformly across DataNodes.
A common reason for non-uniform distribution is the addition of new DataNodes to an existing cluster. HDFS provides a balancer utility for this case. The utility analyzes block placement and keeps moving blocks between DataNodes until the cluster is deemed balanced, which means that the utilization of every DataNode is within a configured threshold of the overall cluster utilization.
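The idea of over-utilized and under-utilized DataNodes can be sketched in a few lines of Python. This is only an illustration, not the Balancer's actual code: it assumes a node counts as balanced when its utilization is within a threshold (in percentage points) of the cluster-wide utilization, and the function name `classify_datanodes`, the node names, and the numbers are all made up for the example.

```python
def classify_datanodes(used_bytes, capacity_bytes, threshold_pct=10.0):
    """Label each DataNode relative to the cluster-wide utilization.

    used_bytes / capacity_bytes: dicts keyed by a (hypothetical) node name.
    threshold_pct: allowed deviation from the cluster utilization,
    in percentage points (assumed value for this sketch).
    """
    cluster_util = 100.0 * sum(used_bytes.values()) / sum(capacity_bytes.values())
    labels = {}
    for node, capacity in capacity_bytes.items():
        node_util = 100.0 * used_bytes[node] / capacity
        if node_util > cluster_util + threshold_pct:
            labels[node] = "over-utilized"
        elif node_util < cluster_util - threshold_pct:
            labels[node] = "under-utilized"
        else:
            labels[node] = "balanced"
    return labels


# Hypothetical cluster: dn3 was added recently and is almost empty.
used = {"dn1": 80, "dn2": 75, "dn3": 5}
capacity = {"dn1": 100, "dn2": 100, "dn3": 100}
print(classify_datanodes(used, capacity))
```

In this made-up cluster the two older nodes come out as over-utilized and the new node as under-utilized, which is the situation in which blocks would be moved.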
The balancer does not balance between individual volumes (disks) on a single DataNode; that is the job of a separate tool, the disk balancer. The disk balancer first creates a plan and then executes that plan on the DataNode. A plan is a set of statements that describes in detail how much data should move between two disks: each step has a source disk, a destination disk, and the number of bytes to move. A plan is executed against an operational DataNode.
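To make the shape of a plan concrete, here is a small Python sketch. It is not the disk balancer's real planner or plan file format: `MoveStep`, `make_plan`, the disk paths, and the greedy pairing of full and empty disks are assumptions for illustration. It only shows that a plan boils down to steps with a source disk, a destination disk, and a byte count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoveStep:
    source_disk: str
    destination_disk: str
    bytes_to_move: int


def make_plan(used, capacity):
    """Greedy sketch: pair disks above the node's average utilization
    with disks below it until one side runs out."""
    total_used = sum(used.values())
    total_capacity = sum(capacity.values())
    # Surplus = bytes a disk holds beyond what it would hold if every
    # disk on the node had the same utilization (integer arithmetic).
    surplus = {d: used[d] - capacity[d] * total_used // total_capacity for d in used}
    donors = sorted((d for d in surplus if surplus[d] > 0), key=lambda d: -surplus[d])
    receivers = sorted((d for d in surplus if surplus[d] < 0), key=lambda d: surplus[d])
    plan = []
    i = j = 0
    while i < len(donors) and j < len(receivers):
        src, dst = donors[i], receivers[j]
        amount = min(surplus[src], -surplus[dst])
        plan.append(MoveStep(src, dst, amount))
        surplus[src] -= amount
        surplus[dst] += amount
        if surplus[src] == 0:
            i += 1
        if surplus[dst] == 0:
            j += 1
    return plan


# Hypothetical DataNode with one nearly full disk and one nearly empty disk.
plan = make_plan(
    used={"/data/disk1": 900, "/data/disk2": 100},
    capacity={"/data/disk1": 1000, "/data/disk2": 1000},
)
for step in plan:
    print(step)
```

On a real cluster the plan is produced and run by the tool itself, using `hdfs diskbalancer -plan` followed by `hdfs diskbalancer -execute`, rather than by hand-written code like this.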
The disk balancer may not be enabled by default, depending on the Hadoop release. To enable it, follow these steps:
1. Open hdfs-site.xml
2. Set the property dfs.disk.balancer.enabled to true
3. Save the file
To learn more, follow the link: HDFS Disk Balancer in Hadoop
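The three steps above can also be scripted. The sketch below uses Python's standard `xml.etree.ElementTree` and assumes the usual `<configuration>` / `<property>` / `<name>` / `<value>` layout of hdfs-site.xml. The helper name `set_property` is made up, and a one-line string stands in for the real file so that the example is self-contained.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml, name, value):
    """Return config_xml with property `name` set to `value` (added if missing)."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_elem = prop.find("value")
            if value_elem is None:
                value_elem = ET.SubElement(prop, "value")
            value_elem.text = value
            break
    else:
        # No existing property with that name: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Step 1: the text of hdfs-site.xml (a minimal stand-in here).
original = "<configuration></configuration>"
# Step 2: set the property named in the steps above.
updated = set_property(original, "dfs.disk.balancer.enabled", "true")
# Step 3: in practice, write `updated` back to hdfs-site.xml.
print(updated)
```

Note that this sketch does not preserve comments from the original file, so editing the file by hand is often the simpler choice.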
-
September 20, 2018 at 3:12 pm #5445 DataFlair Team (Spectator)
The HDFS Balancer is designed to run in the background and redistribute blocks from over-utilized DataNodes to under-utilized DataNodes while adhering to the replica placement policy:
The first replica is placed on the same node as the client; if the client is outside the cluster, the node is chosen at random.
The second replica is placed on a different rack from the first, and the third is placed on the same rack as the second but on a different node.
The balancer runs until the cluster is balanced. Only one balancer may run on the cluster at a time.
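The replica placement rules described above can be sketched as follows. This is a simplified illustration and not HDFS's placement code: `place_replicas`, the rack and node names, and the assumption that at least one other rack has two or more nodes are all made up for the example.

```python
import random


def place_replicas(topology, client_node=None, rng=random):
    """Pick three nodes for a block's replicas.

    topology: dict mapping a rack name to the list of nodes on that rack.
    Assumes some rack other than the first replica's has at least two nodes.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # First replica: the client's own node, or a random node if the
    # client is not part of the cluster.
    first = client_node if client_node in rack_of else rng.choice(sorted(rack_of))
    # Second replica: a node on a different rack from the first.
    remote_racks = [
        rack for rack in sorted(topology)
        if rack != rack_of[first] and len(topology[rack]) >= 2
    ]
    second_rack = rng.choice(remote_racks)
    second = rng.choice(topology[second_rack])
    # Third replica: same rack as the second, but a different node.
    third = rng.choice([n for n in topology[second_rack] if n != second])
    return [first, second, third]


# Hypothetical two-rack cluster with the client running on node "a".
topology = {"rack1": ["a", "b"], "rack2": ["c", "d"]}
replicas = place_replicas(topology, client_node="a")
print(replicas)
```

When the balancer moves a block, it has to keep a layout like this intact, for example by not moving a replica in a way that leaves all copies on one rack.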
The balancer limits the bandwidth used to copy blocks from one DataNode to another. The default is 1 MB/s, and it can be changed with the dfs.datanode.balance.bandwidthPerSec property in hdfs-site.xml.
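A quick back-of-the-envelope calculation shows why this limit matters. The sketch below assumes the 1 MB/s figure quoted above (taken as 1024 * 1024 bytes per second) and a single transfer stream. The function name and the 10 GiB workload are made up for illustration.

```python
def estimated_transfer_seconds(bytes_to_move, bandwidth_bytes_per_sec=1024 * 1024):
    """Lower bound on the time needed to copy `bytes_to_move` at the given rate."""
    return bytes_to_move / bandwidth_bytes_per_sec


ten_gib = 10 * 1024 ** 3
print(estimated_transfer_seconds(ten_gib))  # 10240.0 seconds, close to 3 hours
```

Raising the limit shortens rebalancing but leaves less network capacity for regular HDFS traffic, so the value is a trade-off rather than something to maximize.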