How do I change the replication factor of data already stored in HDFS?

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 11:46 am #4690
  
  DataFlair Team
  Spectator
  
  A file has already been loaded into HDFS. How can its replication factor be changed now?
- September 20, 2018 at 11:46 am #4691
  DataFlair Team
  Spectator
  If the file was loaded into HDFS with the default replication factor of 3 (the dfs.replication property in hdfs-site.xml), its replication factor is 3, which means 3 copies of each of its blocks exist in HDFS.
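
  To see these values on a running cluster, something like the following sketch may help. It assumes the `hdfs` client from Hadoop 2.x or later is on the PATH; the function name `show_replication` is made up for illustration.

```shell
# Hypothetical helper: print the configured default and a file's actual replication.
show_replication() {
  # dfs.replication as seen by the local client configuration
  echo "default: $(hdfs getconf -confKey dfs.replication)"
  # %r is the replication placeholder of `hdfs dfs -stat`
  echo "file: $(hdfs dfs -stat %r "$1")"
}
```

  Calling `show_replication /path/to/file` prints both numbers, which makes it easy to spot files written with a non-default factor.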
  
  Now suppose we want a replication factor of 4. There are two cases:
  - To change the default for new content, set dfs.replication to 4 in hdfs-site.xml (under $HADOOP_HOME/conf in Hadoop 1.x, or $HADOOP_HOME/etc/hadoop in Hadoop 2.x and later). New files are then written with 4 replicas; files already in HDFS keep their current replication factor.
  - To change a specific file or directory that is already in HDFS, use the setrep command.
    To set the replication of an individual file to 4:
```
# "hadoop dfs" is deprecated in favor of "hdfs dfs"; -w waits until replication completes
$HADOOP_HOME/bin/hdfs dfs -setrep -w 4 /path/to/file
```
  - The same command works on a directory and changes every file under it recursively.
    To change the replication of an entire HDFS directory to 4:
```
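# -R is kept for backward compatibility; recent releases recurse into directories by default
$HADOOP_HOME/bin/hdfs dfs -setrep -R -w 4 /path/to/directory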
```
  - The change applies only to the directory you name; if you pass / (the root), it applies to every file in HDFS. With -w the command waits until replication finishes, which can take a long time on large directories.
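
  As a rough sketch of how the setrep step could be scripted for a single file, the wrapper below checks the current factor first and only calls setrep when it differs. It assumes the `hdfs` client is on the PATH and a bash-compatible shell; `ensure_replication` is a hypothetical name, not part of Hadoop.

```shell
# Hypothetical wrapper: set a file's replication only if it differs from the target.
ensure_replication() {
  local path="$1" target="$2" current
  # %r prints the file's current replication factor
  current="$(hdfs dfs -stat %r "$path")" || return 1
  if [ "$current" -eq "$target" ]; then
    echo "$path already has replication $current"
  else
    # -w blocks until the new replication factor is reached
    hdfs dfs -setrep -w "$target" "$path"
  fi
}
```

  For example, `ensure_replication /path/to/file 4` leaves a file that already has 4 replicas untouched, which avoids a needless wait when the script is re-run.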
  
  For more details, please follow: HDFS Tutorial