Change block size of data already available in HDFS?

  • Author
    Posts
    • #6054
      DataFlair Team
      Spectator

      When we write a file to HDFS, it is split into blocks; the block size is taken from the global configuration file hdfs-site.xml.

      Once the data is already written to HDFS, how can we change the block size of that existing data?

    • #6057
      DataFlair Team
      Spectator

      In Hadoop HDFS, the block size is specified in the configuration file hdfs-site.xml. To change it, set the parameter dfs.block.size (dfs.blocksize in newer releases) to the required value in bytes; the default is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x. After the change, a cluster restart is required for it to take effect, and the new size applies only to files written afterwards.
      The block size of existing files does not change. To change the block size of existing files, the ‘distcp’ utility can be used.

      DistCp (distributed copy) is a tool used for large inter-/intra-cluster copying. It uses MapReduce to effect its distribution, error handling and recovery, and reporting.

      It copies the files to the new location with the new block size. However, we have to manually delete the old files with the older block size.
      Command: hadoop distcp -Ddfs.block.size=XX /path/to/inputdata(Source) /path/to/inputdata-with-largeblocks(Destination), where XX is the new block size in bytes.
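      The full workflow can be sketched as below. The paths and the 256 MB target size are hypothetical, chosen only for illustration:

      ```shell
      # Copy /data/logs (hypothetical path) to a temp location with a 256 MB block size.
      # dfs.block.size is given in bytes: 256 * 1024 * 1024 = 268435456.
      hadoop distcp -Ddfs.block.size=268435456 /data/logs /data/logs-tmp

      # After verifying the copy, delete the originals and move the copy into place.
      hadoop fs -rm -r /data/logs
      hadoop fs -mv /data/logs-tmp /data/logs
      ```

      Doing the swap in two steps (copy, then delete and rename) keeps the original data intact until the new copy has been verified.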

    • #6058
      DataFlair Team
      Spectator

      To configure the block size at cluster level, we specify it in hdfs-site.xml. The value is given in bytes, e.g. for 128 MB the value is 128*1024*1024 = 134217728:
      <property>
      <name>dfs.block.size</name>
      <value>134217728</value>
      <description>Block size</description>
      </property>
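      As a quick sanity check, the value in the property above is just 128 MB expressed in bytes:

      ```shell
      # 128 MB in bytes, matching the <value> in the property above
      echo $((128 * 1024 * 1024))   # prints 134217728
      ```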
      For a multi-node cluster, we need to make the same change on every node (NameNode and DataNodes) and restart the daemons.

      This change doesn’t affect the existing files in Hadoop HDFS.

      To set the block size for a specific file when writing it to the cluster:
      hadoop fs -Ddfs.blocksize=134217728 -put /home/hdadmin/mydata/test.text /input
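      After the put, the file's effective block size can be verified; the `%o` format of `hadoop fs -stat` prints the block size in bytes (the path below matches the hypothetical example above):

      ```shell
      # Print the block size (in bytes) of the uploaded file; 134217728 bytes = 128 MB
      hadoop fs -stat "%o" /input/test.text

      # Or inspect the file's blocks in detail via fsck
      hdfs fsck /input/test.text -files -blocks
      ```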
