What is Balancer in Apache Hadoop?

This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 2:52 pm #5328
  
  DataFlair Team
  Spectator
  
  Why is the Balancer needed in Hadoop?
- September 20, 2018 at 2:52 pm #5331
  
  DataFlair Team
  Spectator
  
  1. The Disk Balancer distributes data evenly across all disks of a DataNode.
  2. The Disk Balancer is different from the Balancer, which examines block placement and balances data across DataNodes.
  HDFS might not always place data uniformly across the disks, for the following reasons:
  
  • A lot of writes and deletes
  • Disk replacement
  Operation of the Disk Balancer
  1. The HDFS Disk Balancer works against a given DataNode and moves blocks from one disk to another.
  2. It works by creating a plan (a set of statements) and executing that plan on the DataNode.
  3. A plan describes how much data should move between two disks.
  4. A plan consists of multiple move steps.
  5. Each move step has a source disk, a destination disk, and a number of bytes to move.
  6. A plan can be executed against an operational DataNode.
  7. To enable the Disk Balancer, dfs.disk.balancer.enabled must be set to true in hdfs-site.xml. It was disabled by default in earlier releases, so check the default for your Hadoop version.
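
To make the idea of a plan and its move steps concrete, here is a minimal Python sketch. It is not Hadoop's planner: the names MoveStep and build_plan are hypothetical, and the greedy pairing of the fullest disk with the emptiest one is an assumption chosen only for illustration. It shows a plan as a list of steps, each with a source disk, a destination disk, and a number of bytes to move.

```python
from dataclasses import dataclass


@dataclass
class MoveStep:
    # Hypothetical model of one step in a plan (not a Hadoop class).
    source_disk: str       # disk to move data from
    destination_disk: str  # disk to move data to
    bytes_to_move: int     # amount of data for this step


def build_plan(used, capacity):
    """Greedy sketch: repeatedly move data from the disk with the largest
    surplus to the disk with the largest deficit, aiming for the same
    utilization on every disk of the node."""
    ideal = sum(used.values()) / sum(capacity.values())  # target fraction used
    # Positive surplus: the disk holds too much; negative: it has room.
    surplus = {d: used[d] - int(ideal * capacity[d]) for d in used}
    plan = []
    while True:
        src = max(surplus, key=surplus.get)
        dst = min(surplus, key=surplus.get)
        amount = min(surplus[src], -surplus[dst])
        if amount <= 0:
            break  # nothing left to move
        plan.append(MoveStep(src, dst, amount))
        surplus[src] -= amount
        surplus[dst] += amount
    return plan


if __name__ == "__main__":
    used = {"disk1": 800, "disk2": 200}
    capacity = {"disk1": 1000, "disk2": 1000}
    for step in build_plan(used, capacity):
        print(step)
```

With the sample numbers, the plan contains a single step that moves 300 bytes from disk1 to disk2, leaving both disks half full.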
  Policies for choosing a disk for new blocks
  • Round-robin: distributes new blocks uniformly across the available disks.
  • Available space: writes data to the disk that has the most free space (by percentage).
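
The two policies can be sketched in a few lines of Python. This is a simplified illustration only: the function names round_robin_chooser and most_free_space are hypothetical, and Hadoop's actual available-space policy has additional tuning that is not reproduced here.

```python
from itertools import cycle


def round_robin_chooser(disks):
    """Return a function that hands out disks in rotation, one per new block."""
    rotation = cycle(disks)
    return lambda: next(rotation)


def most_free_space(free_bytes, capacity_bytes):
    """Pick the disk with the highest percentage of free space."""
    return max(free_bytes, key=lambda d: free_bytes[d] / capacity_bytes[d])


if __name__ == "__main__":
    choose = round_robin_chooser(["disk1", "disk2", "disk3"])
    print([choose() for _ in range(4)])

    free = {"disk1": 100, "disk2": 300}
    capacity = {"disk1": 1000, "disk2": 1000}
    print(most_free_space(free, capacity))
```

Round-robin ignores how full each disk is, which is one reason disks can drift out of balance after many writes and deletes or after a disk replacement.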
  
  For more detail follow: Disk balancer in Hadoop
- September 20, 2018 at 2:52 pm #5333
  
  DataFlair Team
  Spectator
  
  Whenever a DataNode is added to or removed from an existing HDFS cluster, some DataNodes end up holding more or fewer blocks than others.
  
  As a result, some DataNodes become over-utilized and others under-utilized.
  
  To resolve this, we need the Balancer, which makes disk space uniformly utilized across all DataNodes.
  
  The Hadoop administrator triggers a job to rebalance space across the DataNodes.
  
  The Balancer (rebalancer) is a tool to ensure that all DataNodes are uniformly utilized.
  
  It is not triggered automatically; the Hadoop administrator runs it on demand.
  
  The command to run the Balancer is: hdfs balancer
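
As a rough illustration of what over-utilized and under-utilized mean, the following Python sketch compares each DataNode's utilization with the cluster-wide average. The function name classify_datanodes and the sample numbers are hypothetical. The 10 percent threshold is not stated in the post above; it mirrors the default of the Balancer's -threshold option. This sketch only classifies nodes and does not move any blocks.

```python
def classify_datanodes(used, capacity, threshold_pct=10.0):
    """Label each DataNode by comparing its utilization (percent of capacity
    used) with the cluster-wide utilization, within a tolerance."""
    cluster_pct = 100.0 * sum(used.values()) / sum(capacity.values())
    result = {}
    for node in used:
        node_pct = 100.0 * used[node] / capacity[node]
        if node_pct > cluster_pct + threshold_pct:
            result[node] = "over-utilized"
        elif node_pct < cluster_pct - threshold_pct:
            result[node] = "under-utilized"
        else:
            result[node] = "balanced"
    return result


if __name__ == "__main__":
    used = {"dn1": 900, "dn2": 500, "dn3": 100}
    capacity = {"dn1": 1000, "dn2": 1000, "dn3": 1000}
    print(classify_datanodes(used, capacity))
```

In this example the cluster is 50 percent full, so dn1 (90 percent) is over-utilized, dn3 (10 percent) is under-utilized, and dn2 is balanced. A rebalancing run would move blocks from nodes like dn1 to nodes like dn3.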
  
  For more detail follow: Disk balancer in Hadoop