{"id":2726,"date":"2017-06-05T10:49:54","date_gmt":"2017-06-05T10:49:54","guid":{"rendered":"http:\/\/data-flair.training\/blogs\/?p=2726"},"modified":"2021-08-25T22:33:25","modified_gmt":"2021-08-25T17:03:25","slug":"hadoop-hdfs-disk-balancer","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/","title":{"rendered":"HDFS Disk Balancer &#8211; Learn how to Balance Data on DataNode"},"content":{"rendered":"<p>Disk Balancer is a <strong>command-line tool<\/strong> introduced in Hadoop 3 for balancing the disks within the <strong>DataNode.<\/strong>\u00a0HDFS diskbalancer is different from the HDFS Balancer, which balances the distribution across the nodes.<\/p>\n<p>In this article, we will study the following points:<\/p>\n<ul>\n<li><a class=\"_ps2id\" href=\"#Need-for-disk-Balancer\">What is the need for Disk Balancer\u00a0in Hadoop HDFS\u00a0<\/a><\/li>\n<li><a class=\"_ps2id\" href=\"#Introduction-to-Disk-Balancer\">Introduction to HDFS Disk Balancer<\/a><\/li>\n<li><a class=\"_ps2id\" href=\"#How-Disk-Balancer-works\">How Disk Balancer works<\/a><\/li>\n<li><a class=\"_ps2id\" href=\"#Functions-of-Disk-Balancer\">Functions of HDFS Disk Balancer<\/a><\/li>\n<li><a class=\"_ps2id\" href=\"#Commands-Supported-by-Disk-Balancer\">HDFS Disk Balancer commands<\/a><\/li>\n<li><a class=\"_ps2id\" href=\"#Disk-Balancer-Settings\">HDFS Disk Balancer Settings<\/a><\/li>\n<\/ul>\n<h3>[ps2id id=&#8217;Need-for-disk-Balancer&#8217; target=&#8221;\/]Need for HDFS disk Balancer<\/h3>\n<p>In Hadoop HDFS, DataNode distributes <a href=\"https:\/\/data-flair.training\/blogs\/data-block\/\"><strong>data blocks<\/strong><\/a> between the disks on the DataNode. 
While writing new blocks in HDFS, a DataNode uses a <strong>volume-choosing policy<\/strong> (round-robin policy or available space policy) to choose the disk (volume) for each block.<\/p>\n<p><strong>Round-Robin policy:<\/strong> It spreads the new blocks evenly across the available disks. DataNode uses this policy by default.<\/p>\n<p><strong>Available space policy:<\/strong> This policy writes data to those disks that have more free space (by percentage).<\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/hdfs-disk-balancing-policy.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3479\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/hdfs-disk-balancing-policy.jpg\" alt=\"hdfs disk balancing policy\" width=\"802\" height=\"420\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/hdfs-disk-balancing-policy.jpg 802w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/hdfs-disk-balancing-policy-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/hdfs-disk-balancing-policy-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/hdfs-disk-balancing-policy-768x402.jpg 768w\" sizes=\"auto, (max-width: 802px) 100vw, 802px\" \/><\/a><\/p>\n<p>However, with the round-robin policy in a long-running cluster, DataNodes sometimes fill their storage directories (disks\/volumes) unevenly, leading to situations where certain disks are full while others are significantly less used. This happens either because of large amounts of writes and deletes or due to disk replacement.<\/p>\n<p>Also, if we use the available-space-based volume-choosing policy, then every new write will go to the newly added empty disk, leaving the other disks idle during that period. 
This will create a bottleneck on the new disk.<\/p>\n<p>Thus there arises a need for <strong>Intra-DataNode Balancing<\/strong> (even distribution of data blocks within a DataNode) to address Intra-DataNode skew (uneven distribution of blocks across disks), which occurs due to disk replacement or random <a href=\"https:\/\/data-flair.training\/blogs\/hdfs-data-write-operation\/\"><strong>writes<\/strong><\/a> and deletes.<\/p>\n<p>Therefore, a tool named Disk Balancer was introduced in Hadoop 3.0 that focuses on distributing data within a node.<\/p>\n<h2>[ps2id id=&#8217;Introduction-to-Disk-Balancer&#8217; target=&#8221;\/]Introduction to HDFS Disk Balancer<\/h2>\n<p>Disk Balancer is a command-line tool introduced in Hadoop HDFS for <strong>Intra-DataNode balancing<\/strong>. HDFS diskbalancer spreads data evenly across all disks of a DataNode. Unlike the HDFS Balancer, which rebalances data across DataNodes, Disk Balancer distributes data within a single DataNode.<\/p>\n<p>HDFS Disk Balancer operates against a given DataNode and moves blocks from one disk to another.<\/p>\n<h3>[ps2id id=&#8217;How-Disk-Balancer-works&#8217; target=&#8221;\/]How HDFS Disk Balancer Works<\/h3>\n<p>HDFS Disk Balancer operates by creating a plan, which is a set of statements that describes how much data should move between two disks, and then executing that set of statements on the DataNode. A plan consists of multiple move steps. Each move step specifies a source disk, a destination disk, and the number of bytes to move. The plan is executed against an operational DataNode.<\/p>\n<p>By default, Disk Balancer is not enabled on a Hadoop cluster. 
One can enable the Disk Balancer in Hadoop by setting <strong>dfs.disk.balancer.enabled<\/strong> to true in <strong>hdfs-site.xml<\/strong>.<\/p>\n<h3>[ps2id id=&#8217;Functions-of-Disk-Balancer&#8217; target=&#8221;\/]Functions of HDFS Disk Balancer<\/h3>\n<p>HDFS Diskbalancer supports two major functions, i.e., <strong>reporting<\/strong> and <strong>balancing<\/strong>.<\/p>\n<h4>1. Data Spread Report<\/h4>\n<p>To measure which machines in the cluster suffer from uneven data distribution, the HDFS disk balancer defines the HDFS <strong>Volume Data Density metric<\/strong> and the <strong>Node Data Density metric<\/strong>.<\/p>\n<p>The Volume data density metric allows us to compare how well the data is spread across different volumes of a given node.<\/p>\n<p>The Node data density metric allows comparing between nodes.<\/p>\n<p><strong>1.1 Volume Data Density or Intra-Node Data Density<\/strong><\/p>\n<p>The Volume data density metric computes how much data exists on a node and what the ideal storage on each volume should be.<\/p>\n<p>The ideal storage percentage for each device is equal to the total data stored on that node divided by the total disk capacity on that node for each storage-type.<\/p>\n<p>Suppose we have a machine with four volumes &#8211; Disk1, Disk2, Disk3, Disk4.<\/p>\n<p>The following table shows the metric and its computation.<\/p>\n<p>Table 1: Disk capacity and usage on a machine<\/p>\n<table class=\"df-table-center\">\n<tbody>\n<tr>\n<td><\/td>\n<td><b>Disk1<\/b><\/td>\n<td><b>Disk2<\/b><\/td>\n<td><b>Disk3<\/b><\/td>\n<td><b>Disk4<\/b><\/td>\n<\/tr>\n<tr>\n<td><b>capacity<\/b><\/td>\n<td><span style=\"font-weight: 400\">200 GB<\/span><\/td>\n<td><span style=\"font-weight: 400\">300 GB<\/span><\/td>\n<td><span style=\"font-weight: 400\">350 GB<\/span><\/td>\n<td><span style=\"font-weight: 400\">500 GB<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>dfsUsed<\/b><\/td>\n<td><span style=\"font-weight: 400\">100 
GB<\/span><\/td>\n<td><span style=\"font-weight: 400\">76 GB<\/span><\/td>\n<td><span style=\"font-weight: 400\">300 GB<\/span><\/td>\n<td><span style=\"font-weight: 400\">475 GB<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>dfsUsedRatio<\/b><\/td>\n<td><span style=\"font-weight: 400\">0.5<\/span><\/td>\n<td><span style=\"font-weight: 400\">0.25<\/span><\/td>\n<td><span style=\"font-weight: 400\">0.85<\/span><\/td>\n<td><span style=\"font-weight: 400\">0.95<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>volumeDataDensity<\/b><\/td>\n<td><span style=\"font-weight: 400\">0.20<\/span><\/td>\n<td><span style=\"font-weight: 400\">0.45<\/span><\/td>\n<td><span style=\"font-weight: 400\">-0.15<\/span><\/td>\n<td><span style=\"font-weight: 400\">-0.24<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In this example,<\/p>\n<p><strong>Total capacity<\/strong> = 200 + 300 + 350 + 500 = 1350 GB<\/p>\n<p>and<\/p>\n<p><strong>Total used<\/strong> = 100 + 76 + 300 + 475 = 951 GB<\/p>\n<p>Therefore, the ideal storage on each volume\/disk is:<\/p>\n<p><strong>Ideal storage<\/strong> = total used \u00f7 total capacity<\/p>\n<p>= 951 \u00f7 1350 = 0.70, or 70% of the capacity of each disk.<\/p>\n<p>The volume data density is the difference between the ideal storage and the current dfsUsedRatio.<\/p>\n<p>Therefore, the volume data density for Disk1 is:<\/p>\n<p><strong>volumeDataDensity\u00a0<\/strong>= idealStorage - dfsUsedRatio<\/p>\n<p>= 0.70 - 0.50 = 0.20<\/p>\n<p>A positive value for volumeDataDensity indicates that the disk is under-utilized, and a negative value indicates that the disk is over-utilized in relation to the current ideal storage target.<\/p>\n<p><strong>1.2. Node Data Density or Inter-Node Data Density<\/strong><\/p>\n<p>After calculating volume data density, we can calculate Node Data Density, which is the sum of the absolute values of all volume data densities on the node. For the machine in Table 1, nodeDataDensity = 0.20 + 0.45 + 0.15 + 0.24 = 1.04.<\/p>\n<p>This allows comparing nodes that need our attention in a given cluster. 
Lower nodeDataDensity values indicate better spread, and higher values indicate more skewed data distribution.<\/p>\n<p><strong>1.3 Reports<\/strong><br \/>\nOnce we have volumeDataDensity and nodeDataDensity, we can find the top 20 nodes in the cluster with the most skewed data distribution, or we can get the volumeDataDensity for a given node.<\/p>\n<h4>2. Disk balancing<\/h4>\n<p>Once we know that a certain node needs balancing, we compute or read the current volumeDataDensity. With this information, we can easily decide which volumes are over-provisioned and which are under-provisioned. Data is moved from one volume to another within the DataNode through an RPC-based protocol similar to the one used by the balancer. This allows the user to replace disks without worrying about decommissioning a node.<\/p>\n<h3>[ps2id id=&#8217;Commands-Supported-by-Disk-Balancer&#8217; target=&#8221;\/]Commands Supported by HDFS Disk Balancer<\/h3>\n<p>Let us now see the various commands supported by the HDFS Disk Balancer.<\/p>\n<h4>1. plan<\/h4>\n<p><strong>HDFS diskbalancer plan command Usage:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">hdfs diskbalancer -plan &lt;datanode&gt;<\/pre>\n<p><strong>HDFS diskbalancer plan Description:<\/strong><\/p>\n<p>The <strong>plan<\/strong> command generates the plan for the specified DataNode. This command can be run against a given DataNode. 
There are some additional options that can be used with the <strong>hdfs diskbalancer plan command<\/strong> that allow users to control the output and execution of a plan.<\/p>\n<table class=\"df-table-center\">\n<tbody>\n<tr>\n<td><b>Options<\/b><\/td>\n<td><b>Description<\/b><\/td>\n<\/tr>\n<tr>\n<td><b>-out<\/b><\/td>\n<td><span style=\"font-weight: 400\">Controls the output location of the plan file.<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>-bandwidth<\/b><\/td>\n<td><span style=\"font-weight: 400\">It enables the user to set the maximum bandwidth used for running the Disk Balancer. This option thus helps in limiting the amount of data moved by the Disk Balancer per second on an operational DataNode.<\/span><\/p>\n<p><span style=\"font-weight: 400\">This option is optional; if it is not specified, the disk balancer uses the default bandwidth of 10 MB\/s.<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>-thresholdPercentage<\/b><\/td>\n<td><span style=\"font-weight: 400\">This allows the user to set the thresholdPercentage, which defines the value at which disks start participating in the data redistribution or balancing operation. 
The default thresholdPercentage value is 10%, which means a disk is used in the balancing operation only when the disk contains 10% more or less data than the ideal storage value.<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>-maxerror<\/b><\/td>\n<td><span style=\"font-weight: 400\">It allows users to specify the number of errors to be ignored for a move operation between two disks before the move step is aborted.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">If it is not specified, then the disk balancer uses the default value.<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>-v<\/b><\/td>\n<td><span style=\"font-weight: 400\">Verbose mode. Specifying this option forces the plan command to display a summary of the plan on stdout.<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>-fs<\/b><\/td>\n<td><span style=\"font-weight: 400\">This option specifies the NameNode to use. If this is not specified, then Disk Balancer uses the default NameNode from the configuration.<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><strong>Example:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">hdfs diskbalancer -plan node1.mycluster.com<\/pre>\n<h4>2. execute<\/h4>\n<p><strong>HDFS diskbalancer execute command Usage:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">hdfs diskbalancer -execute &lt;JSON file path&gt;<\/pre>\n<p><strong>HDFS diskbalancer execute command Description:<\/strong><\/p>\n<p>The <strong>execute<\/strong> command executes the plan against the DataNode for which the plan was generated. The &lt;JSON file path&gt; is the path to the JSON document that contains the generated plan (nodename.plan.json).<\/p>\n<p><strong>Example:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">hdfs diskbalancer -execute \/system\/diskbalancer\/nodename.plan.json<\/pre>\n<h4>3. 
query<\/h4>\n<p><strong>HDFS diskbalancer query command Usage:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">hdfs diskbalancer -query &lt;datanode&gt;<\/pre>\n<p><strong>HDFS diskbalancer query command Description:<\/strong><\/p>\n<p>The <strong>query<\/strong> command gets the current status of the <a href=\"https:\/\/hadoop.apache.org\/docs\/r1.2.1\/hdfs_design.html\">HDFS<\/a> disk balancer from the DataNode on which the plan is running.<\/p>\n<p><strong>Example:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">hdfs diskbalancer -query nodename.mycluster.com<\/pre>\n<h4>4. cancel<\/h4>\n<p><strong>HDFS diskbalancer cancel command Usage:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">hdfs diskbalancer -cancel &lt;JSON file path&gt;\r\nOR\r\nhdfs diskbalancer -cancel &lt;planID&gt; -node &lt;nodename&gt;<\/pre>\n<p><strong>HDFS diskbalancer cancel command Description:<\/strong><\/p>\n<p>The <strong>cancel<\/strong> command cancels the running plan.<\/p>\n<p>The &lt;JSON file path&gt; is the path to the JSON document that contains the generated plan.<\/p>\n<p>planID is the ID of the plan to cancel.<\/p>\n<p><strong>Example:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">hdfs diskbalancer -cancel \/system\/diskbalancer\/nodename.plan.json<\/pre>\n<h4>5. report<\/h4>\n<p><strong>HDFS diskbalancer report command Usage:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">hdfs diskbalancer -fs http:\/\/namenode.uri -report -node &lt;file:\/\/&gt;\r\nOR\r\nhdfs diskbalancer -fs http:\/\/namenode.uri -report -node [&lt;DataNodeID|IP|Hostname&gt;,...]\r\nOR\r\nhdfs diskbalancer -fs http:\/\/namenode.uri -report -top topnum<\/pre>\n<p><strong>HDFS diskbalancer report command Description:<\/strong><\/p>\n<p>The <strong>report<\/strong> command gives a detailed report of the specified DataNodes or of the top DataNodes that would benefit from running the disk balancer. 
The DataNodes can be specified either by a host file or by a comma-separated list of DataNodes.<\/p>\n<table class=\"df-table-center\">\n<tbody>\n<tr>\n<td><span style=\"font-weight: 400\">&lt;file:\/\/&gt;<\/span><\/td>\n<td><span style=\"font-weight: 400\">Specifies the host file that lists the DataNodes for which you want to generate reports.<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400\">[&lt;DataNodeID|IP|Hostname&gt;,...]<\/span><\/td>\n<td><span style=\"font-weight: 400\">Specifies the DataNodeID, IP address, or hostname of each DataNode for which you want to generate the report.<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400\">topnum<\/span><\/td>\n<td><span style=\"font-weight: 400\">Specifies the number of top nodes that would benefit from running the disk balancer.<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3>[ps2id id=&#8217;Disk-Balancer-Settings&#8217; target=&#8221;\/]Hadoop Disk Balancer Settings<\/h3>\n<p>We can control some of the diskbalancer settings through the hdfs-site.xml file. Let us see some of the diskbalancer settings with their description.<\/p>\n<table class=\"df-table-center\">\n<tbody>\n<tr>\n<td><b>Setting<\/b><\/td>\n<td><b>Description<\/b><\/td>\n<\/tr>\n<tr>\n<td><b>dfs.disk.balancer.enabled<\/b><\/td>\n<td><span style=\"font-weight: 400\">Controls whether to enable the diskbalancer for a cluster or not. The default value is set to false, which indicates that the diskbalancer is disabled. The DataNodes will reject the execute command if this is not enabled.<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>dfs.disk.balancer.max.disk.throughputInMBpersec<\/b><\/td>\n<td><span style=\"font-weight: 400\">This parameter controls the maximum bandwidth used by diskbalancer while balancing disk data. 
The default value is 10 MB\/s.\u00a0<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>dfs.disk.balancer.max.disk.errors<\/b><\/td>\n<td><span style=\"font-weight: 400\">This parameter sets the maximum number of errors to be ignored for a move operation between two disks before the move step is aborted. The default value is 5.\u00a0<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>dfs.disk.balancer.block.tolerance.percent<\/b><\/td>\n<td><span style=\"font-weight: 400\">The tolerance percent specifies how far the data stored on a disk may differ from its ideal storage value and still be considered balanced during data balancing among disks.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">For example, if the ideal storage of a disk is 1 TB and the value of this parameter is set to 10, then once the data stored on the target disk reaches 900 GB, the disk storage status is considered perfect.\u00a0<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>dfs.disk.balancer.plan.threshold.percent<\/b><\/td>\n<td><span style=\"font-weight: 400\">This parameter controls the thresholdPercentage value for volume data density in a plan. 
If the absolute value of the volume data density of a disk exceeds the threshold value, it indicates that data balancing is required.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">The default value is 10.<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>dfs.disk.balancer.top.nodes.number<\/b><\/td>\n<td><span style=\"font-weight: 400\">This parameter specifies the top N nodes that require disk data balancing in a cluster.<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Conclusion<\/h2>\n<p>In short, Disk Balancer is a tool that distributes data blocks between the volumes of a DataNode. Thus, it addresses intra-node skew.<\/p>\n<p>Also, Disk Balancer moves data from one volume to another within a node while the node is alive, so users can replace disks without having to worry about decommissioning a node.<\/p>\n<p>HDFS Disk Balancer supports commands such as plan for generating a plan, execute for executing the plan against a DataNode, query for querying the status of the disk balancer, and cancel for cancelling a running plan.<\/p>\n<p>Now explore <a href=\"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-federation-tutorial\/\"><strong>HDFS Federation feature<\/strong><\/a> introduced in Hadoop 2.0<\/p>\n<p>Still, if you have any query regarding HDFS Diskbalancer, ask in the comment section.<\/p>\n<p>Keep Learning!!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Disk Balancer is a command-line tool introduced in Hadoop 3 for balancing the disks within the DataNode.\u00a0HDFS diskbalancer is different from the HDFS Balancer, which balances the distribution across the nodes. 
In this article,&#46;&#46;&#46;<\/p>\n","protected":false},"author":7,"featured_media":76202,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25],"tags":[21953,5186,21954,21952,5548,21951,5576],"class_list":["post-2726","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hdfs","tag-disk-balancer","tag-hadoop","tag-hadoop-datanode","tag-hadoop-disk-balancer","tag-hdfs","tag-hdfs-diskbalancer","tag-hdfs-intra-data-node-balancer"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>HDFS Disk Balancer - Learn how to Balance Data on DataNode - DataFlair<\/title>\n<meta name=\"description\" content=\"Hadoop HDFS Disk Balancer tutorial - learn balancing of data on DataNode, Intra-data node balancer in Hadoop Diskbalancer, operation &amp; abilities oh HDFS balancer\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"HDFS Disk Balancer - Learn how to Balance Data on DataNode - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Hadoop HDFS Disk Balancer tutorial - learn balancing of data on DataNode, Intra-data node balancer in Hadoop Diskbalancer, operation &amp; abilities oh HDFS balancer\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-06-05T10:49:54+00:00\" 
\/>\n<meta property=\"article:modified_time\" content=\"2021-08-25T17:03:25+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/06\/hadoop-hdfs-disk-balancer-2.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"802\" \/>\n\t<meta property=\"og:image:height\" content=\"420\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"HDFS Disk Balancer - Learn how to Balance Data on DataNode - DataFlair","description":"Hadoop HDFS Disk Balancer tutorial - learn balancing of data on DataNode, Intra-data node balancer in Hadoop Diskbalancer, operation & abilities oh HDFS balancer","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/","og_locale":"en_US","og_type":"article","og_title":"HDFS Disk Balancer - Learn how to Balance Data on DataNode - DataFlair","og_description":"Hadoop HDFS Disk Balancer tutorial - learn balancing of data on DataNode, Intra-data node balancer in Hadoop Diskbalancer, operation & abilities oh HDFS 
balancer","og_url":"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2017-06-05T10:49:54+00:00","article_modified_time":"2021-08-25T17:03:25+00:00","og_image":[{"width":802,"height":420,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/06\/hadoop-hdfs-disk-balancer-2.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd"},"headline":"HDFS Disk Balancer &#8211; Learn how to Balance Data on DataNode","datePublished":"2017-06-05T10:49:54+00:00","dateModified":"2021-08-25T17:03:25+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/"},"wordCount":1840,"commentCount":4,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/06\/hadoop-hdfs-disk-balancer-2.jpg","keywords":["disk balancer","hadoop","Hadoop DataNode","hadoop Disk Balancer","hdfs","HDFS diskbalancer","HDFS intra data node balancer"],"articleSection":["HDFS 
Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/","url":"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/","name":"HDFS Disk Balancer - Learn how to Balance Data on DataNode - DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/06\/hadoop-hdfs-disk-balancer-2.jpg","datePublished":"2017-06-05T10:49:54+00:00","dateModified":"2021-08-25T17:03:25+00:00","description":"Hadoop HDFS Disk Balancer tutorial - learn balancing of data on DataNode, Intra-data node balancer in Hadoop Diskbalancer, operation & abilities oh HDFS balancer","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/06\/hadoop-hdfs-disk-balancer-2.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/06\/hadoop-hdfs-disk-balancer-2.jpg","width":802,"height":420,"caption":"HDFS diskbalancer"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-disk-balancer\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog 
Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"HDFS Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/hdfs\/"},{"@type":"ListItem","position":3,"name":"HDFS Disk Balancer &#8211; Learn how to Balance Data on DataNode"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd","name":"DataFlair 
Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"DataFlair Team specializes in creating clear, actionable content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Backed by industry expertise, we make learning easy and career-oriented for beginners and pros alike.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam3\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2726","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=2726"}],"version-history":[{"count":8,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2726\/revisions"}],"predecessor-version":[{"id":76210,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2726\/revisions\/76210"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/76202"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=2726"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=2726"},{"taxonom
y":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=2726"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}