What is Balancer in Apache Hadoop?
September 20, 2018 at 2:52 pm, #5328, DataFlair Team (Spectator)
What is the need for the Balancer in Hadoop?
September 20, 2018 at 2:52 pm, #5331, DataFlair Team (Spectator)
1. The Disk Balancer distributes data evenly across all the disks of a single DataNode.
2. The Disk Balancer is different from the Balancer, which examines data-block placement and balances data across DataNodes.
HDFS might not always place data uniformly across a DataNode's disks, for reasons such as:
• A lot of writes and deletes
• Disk replacement
Operation of the Disk Balancer
1. The HDFS Disk Balancer works against a given DataNode and moves blocks from one disk to another.
2. It works by creating a plan (a set of statements) and executing that plan on the DataNode.
3. A plan describes how much data should move between two disks.
4. A plan consists of multiple move steps.
5. Each move step has a source disk, a destination disk, and the number of bytes to move.
6. A plan can be executed against an operational DataNode.
7. To enable the Disk Balancer, dfs.disk.balancer.enabled must be set to true in hdfs-site.xml. By default, it is disabled.
Policies
• Round-robin: distributes new blocks uniformly across the available disks.
• Available space: writes data to the disk that has the most free space (by percentage).
For more detail follow: Disk balancer in Hadoop
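The plan/execute workflow above can be sketched with the `hdfs diskbalancer` CLI. This is a minimal sketch run on a live cluster; the DataNode hostname and the plan-file path are placeholders, not values from this thread:

```shell
# Assumes dfs.disk.balancer.enabled is set to true in hdfs-site.xml.
# datanode1.example.com is a hypothetical DataNode hostname.

# 1. Create a plan (the set of move steps) for one DataNode:
hdfs diskbalancer -plan datanode1.example.com

# 2. Execute the plan on that DataNode, using the plan-file path
#    printed by the -plan step:
hdfs diskbalancer -execute /system/diskbalancer/<timestamp>/datanode1.example.com.plan.json

# 3. Check the status of the running plan:
hdfs diskbalancer -query datanode1.example.com
```

These commands run against a live HDFS cluster, so they are shown here only to illustrate the steps described above.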
September 20, 2018 at 2:52 pm, #5333, DataFlair Team (Spectator)
Whenever a new DataNode is added to an existing HDFS cluster, or a DataNode is removed from it, some DataNodes in the cluster end up with more or fewer blocks than others.
In this case some DataNodes become over-utilized and some under-utilized.
To resolve this over- and under-utilization, we need the Balancer to make disk usage uniform across all the DataNodes.
The Hadoop administrator triggers a job to rebalance the data across the DataNodes.
The Balancer is a tool to ensure that all the DataNodes are uniformly utilized.
It is not triggered automatically; it is triggered on demand by the Hadoop administrator.
[$ hdfs balancer] is the command to run the Balancer.
For more detail follow: Disk balancer in Hadoop
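The on-demand run described above can be sketched as follows; the threshold value of 10 is an illustrative choice, not one taken from this thread:

```shell
# Run the cluster-wide Balancer on demand. -threshold sets how far (in
# percentage points) each DataNode's utilization may deviate from the
# cluster-average utilization before blocks are moved.
hdfs balancer -threshold 10
```

A smaller threshold gives a more evenly balanced cluster but makes the Balancer move more data and run longer.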