how to change the replication factor for existing data already present in HDFS?

  • Author
    Posts
    • #4690
      DataFlair Team
      Spectator

      A File is already loaded to HDFS. Now the replication Factor of it needs to be changed, how to change this?

    • #4691
      DataFlair Team
      Spectator

      If the file was loaded into HDFS with the default replication factor of 3, which is set by the dfs.replication property in hdfs-site.xml, then the replication of that particular file is 3, meaning 3 copies of each of its blocks exist on HDFS.

      Now, suppose we want to change the replication factor of existing content in HDFS, which in our case we will set to 4.
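For reference, the default mentioned above comes from the dfs.replication property. A minimal excerpt of what that entry in hdfs-site.xml might look like (the value 3 is the stock default; adjust it to change the cluster-wide default):

```xml
<!-- Hypothetical excerpt from hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```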

      • We can change the dfs.replication value to 4 in the $HADOOP_HOME/conf/hdfs-site.xml file. This applies a replication factor of 4 to any new content that comes in, but does not change files already stored.
      • If we are looking to change it for a specific file or directory, we can use the commands below.
        To set the replication of an individual file to 4:
      $HADOOP_HOME/bin/hadoop dfs -setrep -w 4 /path/to/file
      • We can also do this on a directory, which changes the replication for all the files under it recursively.
        To change the replication of an entire directory under HDFS to 4:
      $HADOOP_HOME/bin/hadoop dfs -setrep -R -w 4 /path/to/directory

      – This applies to the specific directory we mention; if we give / (the root), it changes the replication for all files under it.
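      The commands above can be sketched together as a short shell session. This is a hedged sketch: it assumes a running HDFS cluster, and the paths /data/file.txt and /data are hypothetical examples. On current Hadoop releases, hdfs dfs is the preferred form of the older hadoop dfs, and per the FileSystem shell documentation the -R flag to setrep is accepted only for backwards compatibility (setrep already descends into directories):

```shell
# Set replication of a single file to 4; -w waits until replication completes.
hdfs dfs -setrep -w 4 /data/file.txt

# Set replication to 4 for everything under a directory
# (-R is accepted for compatibility; setrep recurses by default).
hdfs dfs -setrep -w 4 /data

# Verify the change: %r in -stat prints a file's replication factor.
hdfs dfs -stat "%r" /data/file.txt
```

      Note that -w can take a while on large directories, since it blocks until the NameNode reports the target replication for every block.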

      For more details, please follow: HDFS Tutorial
