What is the Balancer in Apache Hadoop?

    • #5328
      DataFlair Team

      Why is the Balancer needed in Hadoop?

    • #5331
      DataFlair Team

      1. The Disk Balancer distributes data evenly across all disks of a single DataNode.
      2. The Disk Balancer is different from the Balancer, which examines block placement and balances data across DataNodes.
      HDFS might not always place data uniformly across the disks of a DataNode, for the following reasons:

      • A large number of writes and deletes
      • Disk replacement
      How the Disk Balancer operates
      1. The HDFS Disk Balancer works against a given DataNode and moves blocks from one disk to another.
      2. It works by creating a plan (a set of statements) and executing that plan on the DataNode.
      3. A plan describes how much data should move between two disks.
      4. A plan consists of multiple move steps.
      5. Each move step has a source disk, a destination disk, and the number of bytes to move.
      6. A plan can be executed against an operational (running) DataNode.
      7. To enable the Disk Balancer, set dfs.disk.balancer.enabled to true in hdfs-site.xml. By default, it is disabled.
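      The plan structure in steps 2 to 5 (a list of move steps, each with a source disk, a destination disk, and a byte count) can be sketched in Python. This is a simplified greedy planner under assumed inputs, not the actual HDFS Disk Balancer code: the Disk and MoveStep classes, the disk names, and the rule "fill every disk to the node-wide fill ratio" are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Disk:
    name: str       # hypothetical volume label, e.g. "/data1"
    capacity: int   # total bytes on the disk
    used: int       # bytes currently used


@dataclass
class MoveStep:
    source: str         # disk to move data off
    destination: str    # disk to move data onto
    bytes_to_move: int


def make_plan(disks: List[Disk]) -> List[MoveStep]:
    """Greedy sketch: move bytes until each disk reaches the node-wide fill ratio.

    Assumption (not HDFS's exact rule): each disk's target usage is
    (total used / total capacity) * that disk's capacity.
    """
    ideal_ratio = sum(d.used for d in disks) / sum(d.capacity for d in disks)
    # Positive = bytes above the target, negative = room left below the target.
    surplus = {d.name: d.used - round(ideal_ratio * d.capacity) for d in disks}

    # Most over-full sources first, most under-full destinations first.
    sources = sorted((n for n in surplus if surplus[n] > 0), key=lambda n: -surplus[n])
    targets = sorted((n for n in surplus if surplus[n] < 0), key=lambda n: surplus[n])

    plan: List[MoveStep] = []
    i = j = 0
    while i < len(sources) and j < len(targets):
        src, dst = sources[i], targets[j]
        amount = min(surplus[src], -surplus[dst])
        plan.append(MoveStep(src, dst, amount))
        surplus[src] -= amount
        surplus[dst] += amount
        # Each step exhausts at least one side, so the loop always ends.
        if surplus[src] == 0:
            i += 1
        if surplus[dst] == 0:
            j += 1
    return plan


if __name__ == "__main__":
    node = [Disk("/data1", 1000, 900), Disk("/data2", 1000, 100), Disk("/data3", 1000, 500)]
    for step in make_plan(node):
        print(step)
```

      Here the node is 50% full overall, so the sketch produces a single move step of 400 bytes from /data1 to /data2 and leaves /data3 alone. Pairing the fullest disk with the emptiest one keeps the number of move steps small.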
      Volume-choosing policies (how a DataNode picks a disk for each new block):
      • Round-robin: distributes new blocks uniformly across the available disks, taking each disk in turn.
      • Available space: writes data to the disk that has the most free space (by percentage).
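      To see why round-robin placement can leave disks uneven after a disk replacement, the two policies can be simulated in Python. This is a simplified sketch, not HDFS's actual RoundRobinVolumeChoosingPolicy or AvailableSpaceVolumeChoosingPolicy (the real available-space policy is more nuanced); the place_blocks helper, disk names, and sizes are hypothetical.

```python
import itertools
from collections import Counter
from typing import Dict, Tuple

# Hypothetical per-disk state: name -> (free_bytes, capacity_bytes).
Volumes = Dict[str, Tuple[int, int]]


def place_blocks(volumes: Volumes, policy: str, num_blocks: int, block_size: int) -> Dict[str, int]:
    """Simulate which disk each new block lands on; return block counts per disk."""
    free = {name: f for name, (f, _) in volumes.items()}
    capacity = {name: c for name, (_, c) in volumes.items()}
    rotation = itertools.cycle(sorted(volumes))
    placed: Counter = Counter()

    for _ in range(num_blocks):
        if policy == "round-robin":
            # Take the next disk in turn, skipping disks too full for the block.
            for _attempt in range(len(volumes)):
                target = next(rotation)
                if free[target] >= block_size:
                    break
            else:
                raise RuntimeError("no disk has room for the block")
        elif policy == "available-space":
            candidates = [v for v in free if free[v] >= block_size]
            if not candidates:
                raise RuntimeError("no disk has room for the block")
            # Pick the disk with the highest free-space percentage.
            target = max(candidates, key=lambda v: free[v] / capacity[v])
        else:
            raise ValueError(f"unknown policy: {policy}")
        free[target] -= block_size
        placed[target] += 1
    return dict(placed)


if __name__ == "__main__":
    # /data1 is an old, nearly full disk; /data2 was just replaced and is empty.
    disks = {"/data1": (200, 1000), "/data2": (1000, 1000)}
    print("round-robin:    ", place_blocks(disks, "round-robin", 4, 100))
    print("available-space:", place_blocks(disks, "available-space", 4, 100))
```

      With these inputs, round-robin puts two blocks on each disk, so the nearly full /data1 keeps filling up, while the available-space rule sends all four blocks to the emptier /data2.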

      For more detail, see: Disk balancer in Hadoop

    • #5333
      DataFlair Team

      Whenever a DataNode is added to or removed from an existing HDFS cluster, some DataNodes end up holding more or fewer blocks than the others.

      As a result, some DataNodes become over-utilized while others are under-utilized.

      To resolve this, the Balancer moves blocks from over-utilized DataNodes to under-utilized ones so that disk space is used uniformly across all DataNodes.
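      The over-/under-utilized idea can be made concrete. The HDFS Balancer compares each DataNode's utilization (used space as a percentage of capacity) with the cluster-wide utilization, and treats a node as balanced when it is within a threshold of that value (the -threshold option of hdfs balancer, 10 percent by default). The Python sketch below only classifies nodes this way and moves no blocks; the classify_datanodes helper and the node names and sizes are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical cluster state: DataNode name -> (used_bytes, capacity_bytes).
Cluster = Dict[str, Tuple[int, int]]


def classify_datanodes(nodes: Cluster, threshold_pct: float = 10.0) -> Dict[str, str]:
    """Label each DataNode relative to the cluster-wide utilization."""
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity

    labels = {}
    for name, (used, capacity) in nodes.items():
        node_pct = 100.0 * used / capacity
        if node_pct > cluster_pct + threshold_pct:
            labels[name] = "over-utilized"    # candidate to give blocks away
        elif node_pct < cluster_pct - threshold_pct:
            labels[name] = "under-utilized"   # candidate to receive blocks
        else:
            labels[name] = "balanced"         # within the threshold
    return labels


if __name__ == "__main__":
    # dn3 was just added to the cluster, so it holds very few blocks.
    cluster = {"dn1": (900, 1000), "dn2": (500, 1000), "dn3": (100, 1000)}
    print(classify_datanodes(cluster))
```

      The cluster is 50% full, so dn1 (90%) is over-utilized, dn3 (10%) is under-utilized, and dn2 (50%) is balanced. A larger threshold tolerates more skew and means fewer blocks have to move.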

      A Hadoop administrator runs the Balancer to rebalance space across all DataNodes.

      The Balancer (also called the rebalancer) is a tool to ensure that all DataNodes are uniformly utilized.

      It is not triggered automatically; the Hadoop administrator runs it on demand.

      The command to run the Balancer is: $ hdfs balancer

