This topic contains 4 replies, has 1 voice, and was last updated by dfbdteam3 1 year, 1 month ago.

Viewing 5 posts - 1 through 5 (of 5 total)
  • #6355

    dfbdteam3
    Moderator

    I have mounted several disks as /data1, /data2, /data3… on the Hadoop slave nodes.
    We can specify one storage path in hdfs-site.xml / core-site.xml, but how do we tell Hadoop to store data on all of the disks?

    #6356

    dfbdteam3
    Moderator

    The parameter below, documented in hdfs-default.xml, defines where data is saved on a particular DataNode. If you want more than one directory, specify the additional paths as a comma-delimited list.

    <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/data</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
    </property>

    For more details, please follow: Cloudera Hadoop CDH5 Installation On Ubuntu

    #6357

    dfbdteam3
    Moderator

    In the file /etc/hadoop/conf/hdfs-site.xml, set the property dfs.data.dir (in Hadoop 2.x and later this property was renamed dfs.datanode.data.dir),
    which

    “Determines where on the local file system a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.”

    These are the directories where the actual data bytes of HDFS are written. If you specify multiple directories, the DataNode writes blocks to them in turn, which spreads I/O across devices and improves read performance.

    <property>
    <name>dfs.data.dir</name>
    <value>/mnt/disk1/hadoop/dfs/data,/mnt/disk2/hadoop/dfs/data</value>
    </property>
    For more details, please follow: Installation of Hadoop 3.x on ubuntu

    #6359

    dfbdteam3
    Moderator

    The parameter to specify more than one storage path in Hadoop is set in the hdfs-site.xml configuration file; the property name is dfs.datanode.data.dir.
    <property>
    <name>dfs.datanode.data.dir</name>
    <value>/user1/hadoop/data,/user2/hadoop/data</value>
    </property>
    dfs.datanode.data.dir can be set to any directories available on the DataNode. It determines where on the local filesystem the DataNode stores its blocks.

    Each entry can be a directory where a disk partition is mounted, e.g. ‘/user1/hadoop/data, /user2/hadoop/data’, which is useful when you have multiple disk partitions to dedicate to HDFS. When the property has multiple values, data is written to them in a round-robin fashion. If the disk holding one of the directories fills up, round-robin writes continue on the remaining directories.

    For more details, please follow: Installation of Hadoop 2.7.x on ubuntu

    #6361

    dfbdteam3
    Moderator

    The parameter to specify more than one storage path in Hadoop is set in the hdfs-site.xml configuration file.

    Check this property in the hdfs-site.xml file:

    dfs.datanode.data.dir

    Now, this value can be any directory which is available on the DataNode (the slave’s local disk).
    This value defines the path where the DataNode saves its blocks.

    Also consider the situation where we have multiple values, i.e. multiple disk partitions (known as a JBOD configuration); we can set this up as well by specifying the paths in hdfs-site.xml.
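
    As a sketch, a JBOD layout using the mount points from the original question might be configured like this (the /data1, /data2, /data3 mount points and the dfs/data subdirectory are illustrative; adjust them to your own disk layout):

    <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data1/hadoop/dfs/data,/data2/hadoop/dfs/data,/data3/hadoop/dfs/data</value>
    </property>

    After changing this property, the DataNode must be restarted for the new directories to be picked up.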

