What is Balancer in Hadoop?

    • #5441
      DataFlair Team
      Spectator

      Why is the Balancer needed in Hadoop?

    • #5443
      DataFlair Team
      Spectator

      The HDFS Balancer rebalances data across the DataNodes by moving blocks from over-utilized nodes to under-utilized ones. It is needed because HDFS data is not always distributed uniformly across DataNodes.
      A common reason for non-uniform distribution is the addition of new DataNodes to an existing cluster: the new nodes start out empty while the older nodes already hold most of the blocks.

      HDFS provides a balancer utility that analyzes block placement and balances data across the DataNodes. It keeps moving blocks until the cluster is deemed balanced, which means that the utilization of every DataNode (used space as a percentage of its capacity) differs from the overall cluster utilization by no more than a configurable threshold, 10% by default.
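
      The "balanced" condition described above can be illustrated with a small Python sketch. This is not the balancer's actual code: the function name, the three-way grouping, and the sample cluster are assumptions made for illustration, and the real tool uses finer-grained categories.

```python
def classify_datanodes(used_bytes, capacity_bytes, threshold_pct=10.0):
    """Group DataNodes by how far their utilization is from the cluster average.

    used_bytes and capacity_bytes map a node name to a byte count.
    threshold_pct mirrors the balancer's threshold (assumed 10% here).
    """
    cluster_util = 100.0 * sum(used_bytes.values()) / sum(capacity_bytes.values())
    groups = {"over": [], "under": [], "balanced": []}
    for node in sorted(used_bytes):
        node_util = 100.0 * used_bytes[node] / capacity_bytes[node]
        if node_util > cluster_util + threshold_pct:
            groups["over"].append(node)        # candidate source of blocks
        elif node_util < cluster_util - threshold_pct:
            groups["under"].append(node)       # candidate destination
        else:
            groups["balanced"].append(node)    # within the threshold
    return cluster_util, groups


# Hypothetical cluster: dn3 was added recently and is nearly empty.
capacity = {"dn1": 1000, "dn2": 1000, "dn3": 1000}
used = {"dn1": 800, "dn2": 700, "dn3": 100}
cluster_util, groups = classify_datanodes(used, capacity)
print(round(cluster_util, 1), groups)
```

      In this sample the cluster is about 53% full, so dn1 and dn2 count as over-utilized and dn3 as under-utilized. The cluster is balanced once both the "over" and "under" lists are empty.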

      The balancer does not balance data between individual volumes (disks) on a single DataNode. That is the job of a separate tool, the HDFS Disk Balancer. The Disk Balancer first creates a plan and then executes that plan on the DataNode. A plan is a set of move steps that describe in detail how much data should move between two disks; each step has a source disk, a destination disk, and the number of bytes to move. The plan is executed against an operational (running) DataNode.
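
      To make the shape of a plan step concrete, here is a minimal Python sketch. The `MoveStep` class, the `plan_two_disk_move` helper, and the disk paths are hypothetical names chosen for illustration; the real Disk Balancer has its own plan format and planner.

```python
from dataclasses import dataclass


@dataclass
class MoveStep:
    """One step of a plan: move bytes_to_move from source to destination."""
    source_disk: str
    destination_disk: str
    bytes_to_move: int


def plan_two_disk_move(used_bytes, capacity_bytes):
    """Return a MoveStep that evens out utilization across exactly two disks.

    Returns None when the disks are already at the same utilization.
    """
    a, b = sorted(used_bytes)
    total_used = used_bytes[a] + used_bytes[b]
    total_capacity = capacity_bytes[a] + capacity_bytes[b]
    target_ratio = total_used / total_capacity
    ideal_a = int(target_ratio * capacity_bytes[a])  # bytes disk a should hold
    delta = used_bytes[a] - ideal_a
    if delta == 0:
        return None
    if delta > 0:
        return MoveStep(a, b, delta)
    return MoveStep(b, a, -delta)


# Hypothetical DataNode with one nearly full and one nearly empty disk.
capacity = {"/data/disk1": 1000, "/data/disk2": 1000}
used = {"/data/disk1": 900, "/data/disk2": 100}
step = plan_two_disk_move(used, capacity)
print(step)
```

      For the sample DataNode the sketch produces a single step that moves 400 bytes from /data/disk1 to /data/disk2, leaving both disks half full.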

      Depending on the Hadoop release, the Disk Balancer may not be enabled by default. To enable it, follow these steps:
      1. Open hdfs-site.xml
      2. Set the property dfs.disk.balancer.enabled to true
      3. Save the file
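
      The steps above can also be scripted. The sketch below is one possible approach using Python's standard `xml.etree.ElementTree` module; the `set_property` helper is a hypothetical name, and the example works on an in-memory string rather than a real hdfs-site.xml so that it stays self-contained.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml, name, value):
    """Return config_xml with the <property> called name set to value.

    The property is added if it does not exist yet.
    """
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching property was found, so append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Stand-in for the contents of hdfs-site.xml (assumed to be empty here).
original = "<configuration></configuration>"
updated = set_property(original, "dfs.disk.balancer.enabled", "true")
print(updated)
```

      Note that ElementTree drops comments when it parses a file, so editing the file by hand, as in the steps above, may be preferable when the existing comments matter.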

      Follow the link to learn more about: HDFS Disk Balancer in Hadoop

    • #5445
      DataFlair Team
      Spectator

      The HDFS Balancer is designed to run in the background and redistribute blocks from over-utilized DataNodes to under-utilized DataNodes while adhering to the replica placement policy:

      The first replica is placed on the same node as the client; if the client is outside the cluster, the node is chosen at random.

      The second replica is placed on a different rack from the first, and the third is placed on the same rack as the second but on a different node.

      The balancer runs until the cluster is balanced. Only one balancer may run on the cluster at a time.
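
      The replica placement rules described above can be sketched in Python as follows. The `place_replicas` function and the rack and node names are hypothetical; this illustrates the three rules only and is not HDFS's own placement code, which also weighs factors such as free space and node load.

```python
import random


def place_replicas(topology, writer_node=None, rng=random):
    """Pick three nodes for a block's replicas.

    topology maps a rack name to the list of node names on that rack.
    writer_node is the client's node, or None if the client is outside
    the cluster.
    """
    node_rack = {n: rack for rack, nodes in topology.items() for n in nodes}
    # Rule 1: the client's own node, or a random node for an outside client.
    if writer_node in node_rack:
        first = writer_node
    else:
        first = rng.choice(sorted(node_rack))
    # Rule 2: a different rack from the first replica. This sketch needs
    # that rack to hold at least two nodes so rule 3 can be satisfied.
    remote_racks = sorted(
        rack for rack, nodes in topology.items()
        if rack != node_rack[first] and len(nodes) >= 2
    )
    if not remote_racks:
        raise ValueError("no remote rack with at least two nodes")
    second_rack = rng.choice(remote_racks)
    # Rule 3: same rack as the second replica, but a different node.
    second, third = rng.sample(sorted(topology[second_rack]), 2)
    return [first, second, third]


# Hypothetical two-rack cluster with the client running on dn1.
topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
replicas = place_replicas(topology, writer_node="dn1")
print(replicas)
```

      With this topology the first replica always lands on dn1 and the other two land on dn3 and dn4 in some order. When the balancer moves a block, it has to pick a destination that keeps these rack constraints intact.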

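      The balancer limits the bandwidth each DataNode may use to copy blocks to another DataNode. The default cited here is 1 MB/s (it may differ between Hadoop releases); it can be changed with the dfs.datanode.balance.bandwidthPerSec property in hdfs-site.xml.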
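
      A quick calculation shows why the bandwidth limit matters. The Python sketch below uses a hypothetical helper, `transfer_hours`, and an assumed 100 GiB of data to move off a single DataNode; it ignores every overhead other than the bandwidth cap.

```python
def transfer_hours(bytes_to_move, bandwidth_bytes_per_sec=1024 * 1024):
    """Hours needed to copy bytes_to_move at the given bandwidth cap."""
    return bytes_to_move / bandwidth_bytes_per_sec / 3600


# Assumed workload: 100 GiB leaving one DataNode at the 1 MB/s limit.
hours = transfer_hours(100 * 1024**3)
print(f"{hours:.1f} hours")  # prints "28.4 hours"
```

      At 1 MB/s a single DataNode needs more than a day to shed 100 GiB, so raising the limit speeds up balancing at the cost of competing with regular cluster traffic.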
