How to specify more than one path for storage in Hadoop

    • #6355
      DataFlair Team
      Spectator

      I have mounted several disks as /data1, /data2, /data3… on the Hadoop slave nodes.
      We can specify a single storage path in hdfs-site.xml / core-site.xml, but how do we tell Hadoop to store the data on all the disks?

    • #6356
      DataFlair Team
      Spectator

      The parameter below, documented in hdfs-default.xml, defines where a DataNode saves its data. hdfs-default.xml only holds the defaults, so override the value in hdfs-site.xml. If you want more than one directory, list the directory paths separated by commas.

      Name: dfs.datanode.data.dir
      Default value: file://${hadoop.tmp.dir}/dfs/data
      Description: Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
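
      As a minimal sketch for the disks in the question, assuming each mount holds a hadoop/dfs/data subdirectory (a hypothetical layout, not stated in the thread), the override in hdfs-site.xml on each DataNode could look like this:

      ```xml
      <property>
        <name>dfs.datanode.data.dir</name>
        <!-- One directory per mounted disk; the hadoop/dfs/data suffix is an assumed layout. -->
        <value>/data1/hadoop/dfs/data,/data2/hadoop/dfs/data,/data3/hadoop/dfs/data</value>
      </property>
      ```

      Each directory should exist and be writable by the user that runs the DataNode, and the change typically takes effect after the DataNode is restarted.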

      For more details, please follow: Cloudera Hadoop CDH5 Installation On Ubuntu

    • #6357
      DataFlair Team
      Spectator

      In the file /etc/hadoop/conf/hdfs-site.xml, set the property dfs.data.dir (the Hadoop 1.x name, deprecated in Hadoop 2.x and later in favor of dfs.datanode.data.dir),
      which

      “Determines where on the local file system a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.”

      These are the directories where the actual HDFS data bytes are written. If you specify multiple directories, the DataNode writes to them in turn, which spreads the I/O across the disks and gives good performance when reading the data.

      Name: dfs.data.dir
      Value: /mnt/disk1/hadoop/dfs/data,/mnt/disk2/hadoop/dfs/data

      For more details, please follow: Installation of Hadoop 3.x on ubuntu

    • #6359
      DataFlair Team
      Spectator

      The parameter for specifying more than one storage path in Hadoop is dfs.datanode.data.dir, which is set in the hdfs-site.xml configuration file.
      <property>
      <name>dfs.datanode.data.dir</name>
      <value>/user1/hadoop/data,/user2/hadoop/data</value>
      </property>
      The dfs.datanode.data.dir value can be any directory that is available on the DataNode. It determines where on the local filesystem the DataNode should store its blocks.

      Each entry can be a directory where a disk partition is mounted, such as ‘/user1/hadoop/data’ or ‘/user2/hadoop/data’, which is the usual case when you have multiple disk partitions to use for HDFS. When the property has multiple values, the DataNode writes blocks to the directories in a round-robin fashion. If the disk behind one of the directories is full, the round-robin writes continue on the rest of the directories.
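
      The round-robin behaviour described above corresponds to the DataNode's volume choosing policy. As a hedged sketch, assuming a Hadoop 2.x or later release, the policy can be stated explicitly in hdfs-site.xml; round-robin is normally what you get without setting anything, so this fragment is only illustrative:

      ```xml
      <property>
        <name>dfs.datanode.fs-dataset.volume.choosing.policy</name>
        <!-- Round-robin across the directories listed in dfs.datanode.data.dir. -->
        <!-- An alternative class in the same package, AvailableSpaceVolumeChoosingPolicy, -->
        <!-- prefers volumes with more free space when disks fill unevenly. -->
        <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy</value>
      </property>
      ```

      Check the hdfs-default.xml reference for your Hadoop version before relying on either class, since the available policies and defaults can differ between releases.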

      For more details, please follow: Installation of Hadoop 2.7.x on ubuntu

    • #6361
      DataFlair Team
      Spectator

      The parameter for specifying more than one storage path in Hadoop is in the hdfs-site.xml configuration file.

      Check this property in the hdfs-site.xml file:

      dfs.datanode.data.dir

      This value can be any directory that is available on the DataNode (the slave's local disk).
      It defines the path where the DataNode saves its blocks.

      If we have multiple values, i.e. multiple disk partitions (known as a JBOD configuration), we can set this up as well by listing each path, separated by commas, in this property in hdfs-site.xml.
