Free Online Certification Courses – Learn Today. Lead Tomorrow. › Forums › Apache Hadoop › How to specify more than one path for storage in Hadoop
September 20, 2018 at 5:49 pm #6355 | DataFlair Team (Spectator)
I have mounted several disks as /data1, /data2, /data3… on the Hadoop slave nodes.
We can specify one path for storage in hdfs-site.xml / core-site.xml, but how do we tell Hadoop to store data on all of the disks?
September 20, 2018 at 5:49 pm #6356 | DataFlair Team (Spectator)
The parameter below, defined in hdfs-default.xml, sets the location where a particular DataNode saves its data. If you want more than one directory, specify the additional paths as a comma-delimited list.
dfs.datanode.data.dir
file://${hadoop.tmp.dir}/dfs/data
This property determines where on the local filesystem a DFS data node should store its blocks. If it is a comma-delimited list of directories, data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
For more details, please follow: Cloudera Hadoop CDH5 Installation On Ubuntu
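As a sketch, an hdfs-site.xml entry listing two directories could look like the following; the mount points and subdirectories here are example paths, not defaults:

```xml
<!-- hdfs-site.xml: store DataNode blocks on two disks (example paths) -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data1/hadoop/dfs/data,file:///data2/hadoop/dfs/data</value>
</property>
```

Note that there must be no spaces around the commas, and each directory should exist and be writable by the user running the DataNode, since non-existent directories are silently ignored.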
September 20, 2018 at 5:50 pm #6357 | DataFlair Team (Spectator)
In the file /etc/hadoop/conf/hdfs-site.xml, set the property dfs.data.dir, which "determines where on the local file system a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored." (In Hadoop 2.x and later, this property was renamed dfs.datanode.data.dir.)
These are the directories where the actual data bytes of HDFS are written. If you specify multiple directories, the DataNode writes to them in turn, which spreads I/O across devices and improves performance when reading the data back.
dfs.data.dir
/mnt/disk1/hadoop/dfs/data,/mnt/disk2/hadoop/dfs/data
For more details, please follow: Installation of Hadoop 3.x on ubuntu
September 20, 2018 at 5:50 pm #6359 | DataFlair Team (Spectator)
The parameter for specifying more than one storage path in Hadoop lives in the hdfs-site.xml configuration file, under the property name dfs.datanode.data.dir.
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/user1/hadoop/data,/user2/hadoop/data</value>
</property>
The value of dfs.datanode.data.dir can be any set of directories available on the datanode. It determines where on the local filesystem the data node stores its blocks. The entries can be directories where disk partitions are mounted, such as '/user1/hadoop/data,/user2/hadoop/data', which is useful when you have multiple disk partitions to dedicate to HDFS. When the property has multiple values, data is written to them in a round-robin fashion. If the disk behind one of the directories fills up, round-robin writing continues on the remaining directories.
For more details, please follow: Installation of Hadoop 2.7.x on ubuntu
-
September 20, 2018 at 5:50 pm #6361 | DataFlair Team (Spectator)
The parameter to specify more than one storage path in Hadoop is in the hdfs-site.xml configuration file.
Check this property in hdfs-site.xml:
dfs.datanode.data.dir
This value can be any directory available on the datanode (the slave's local disk).
It defines the path where the datanode saves its blocks. If we have multiple disk partitions, known as a JBOD (Just a Bunch Of Disks) configuration, we can set them all up by listing each partition in this property in hdfs-site.xml.
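For the mounts described in the original question (/data1, /data2, /data3), a JBOD-style entry might look like the sketch below; the dfs/data subdirectory names are assumptions, not required values:

```xml
<!-- hdfs-site.xml: one entry per mounted disk (JBOD layout); subpaths are examples -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data1/dfs/data,file:///data2/dfs/data,file:///data3/dfs/data</value>
</property>
```

After editing the file, the DataNode process must be restarted for the new directories to be picked up.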