How to specify more than one path for storage in Hadoop

This topic has 4 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 5:49 pm #6355
  
  DataFlair Team
  Spectator
  
  I have mounted several disks as /data1, /data2, /data3… on the Hadoop slave nodes.
  We can specify one storage path in hdfs-site.xml / core-site.xml, but how do I tell Hadoop to store data on all the disks?
- September 20, 2018 at 5:49 pm #6356
  
  DataFlair Team
  Spectator
  
  The parameter below, whose default value is listed in hdfs-default.xml, defines where a particular DataNode saves its data. To use more than one directory, override the value in hdfs-site.xml and list the directory paths separated by commas.
  
  <property>
  <name>dfs.datanode.data.dir</name>
  <value>file://${hadoop.tmp.dir}/dfs/data</value>
  <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
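
  As a rough illustration of the comma-delimited value, the sketch below builds a property element for the /data1, /data2, /data3 mounts from the question. The helper name `data_dir_property` and the `hadoop/dfs/data` subdirectory are assumptions made for the example, not anything Hadoop requires.

```python
import xml.etree.ElementTree as ET


def data_dir_property(mount_points, subdir="hadoop/dfs/data"):
    """Build a <property> element with one data directory per mount point.

    `subdir` is a hypothetical layout chosen for this example.
    """
    # One directory per disk, joined with commas and no spaces.
    dirs = [f"{mp.rstrip('/')}/{subdir}" for mp in mount_points]
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.datanode.data.dir"
    ET.SubElement(prop, "value").text = ",".join(dirs)
    return prop


prop = data_dir_property(["/data1", "/data2", "/data3"])
xml_text = ET.tostring(prop, encoding="unicode")
print(xml_text)
```

  The printed element can be pasted inside the configuration element of hdfs-site.xml on each DataNode; the DataNode normally has to be restarted to pick up the change.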
  
  For more details, please follow: Cloudera Hadoop CDH5 Installation On Ubuntu
- September 20, 2018 at 5:50 pm #6357
  
  DataFlair Team
  Spectator
  
  In the file /etc/hadoop/conf/hdfs-site.xml, set the property dfs.data.dir (the Hadoop 1.x name; in Hadoop 2.x and later it is deprecated in favor of dfs.datanode.data.dir), which
  
  “Determines where on the local file system a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.”
  
  These are the directories where the actual HDFS block data is written. If you specify multiple directories, the DataNode writes to them in turn, which spreads the I/O across the disks and gives good performance when reading the data.
  
  <property>
  <name>dfs.data.dir</name>
  <value>/mnt/disk1/hadoop/dfs/data,/mnt/disk2/hadoop/dfs/data</value>
  </property>
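
  Because the description says that directories which do not exist are ignored, it can help to check the configured list before starting the DataNode. The following is a minimal sketch: `SAMPLE` stands in for the contents of hdfs-site.xml, the function name `data_dirs` is made up for the example, and plain local paths (no file:// prefix) are assumed.

```python
import os
import xml.etree.ElementTree as ET

# Stand-in for the contents of hdfs-site.xml (assumed for this example).
SAMPLE = """<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/disk1/hadoop/dfs/data,/mnt/disk2/hadoop/dfs/data</value>
  </property>
</configuration>"""


def data_dirs(xml_text, names=("dfs.datanode.data.dir", "dfs.data.dir")):
    """Return the data directories listed under either property name."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() in names:
            value = prop.findtext("value", "")
            # Split the comma-delimited list and drop empty entries.
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


dirs = data_dirs(SAMPLE)
# Directories that are absent on this machine would not receive any data.
missing = [d for d in dirs if not os.path.isdir(d)]
for d in dirs:
    print(d, "MISSING" if d in missing else "ok")
```
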
  For more details, please follow: Installation of Hadoop 3.x on ubuntu
- September 20, 2018 at 5:50 pm #6359
  
  DataFlair Team
  Spectator
  
  The parameter for specifying more than one storage path in Hadoop is set in the hdfs-site.xml configuration file, where the property name is dfs.datanode.data.dir.
  <property>
  <name>dfs.datanode.data.dir</name>
  <value>/user1/hadoop/data,/user2/hadoop/data</value>
  </property>
  The value of dfs.datanode.data.dir can be any directory that is available on the DataNode. It determines where on the local filesystem the DataNode stores its blocks.
  
  It can be a directory where disk partitions are mounted, like ‘/user1/hadoop/data,/user2/hadoop/data’, which is the case when you have multiple disk partitions to be used for HDFS. When it has multiple values, the DataNode distributes new blocks across the directories in a round-robin fashion. If the disk behind one of the directories is full, round-robin placement continues on the rest of the directories.
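
  The round-robin behavior described above can be pictured with a small toy simulation. This is not Hadoop's code: the `Volume` class, the capacity measured in whole blocks, and `place_blocks` are all invented for illustration, and the real DataNode decides placement through its configured volume-choosing policy.

```python
class Volume:
    """Toy stand-in for one data directory with room for a fixed number of blocks."""

    def __init__(self, path, capacity_blocks):
        self.path = path
        self.capacity = capacity_blocks
        self.blocks = []

    def is_full(self):
        return len(self.blocks) >= self.capacity


def place_blocks(volumes, n_blocks):
    """Assign block ids to volumes in turn, skipping any volume that is full."""
    nxt = 0  # index of the volume to try first for the next block
    for block_id in range(n_blocks):
        for attempt in range(len(volumes)):
            idx = (nxt + attempt) % len(volumes)
            if not volumes[idx].is_full():
                volumes[idx].blocks.append(block_id)
                nxt = (idx + 1) % len(volumes)
                break
        else:
            # Every volume was tried and none had space.
            raise RuntimeError("all volumes are full")


vols = [Volume("/user1/hadoop/data", 2), Volume("/user2/hadoop/data", 5)]
place_blocks(vols, 6)
for v in vols:
    print(v.path, v.blocks)
```

  In this run the first directory fills up after two blocks, and the remaining blocks all go to the second directory.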
  
  For more details, please follow: Installation of Hadoop 2.7.x on ubuntu
- September 20, 2018 at 5:50 pm #6361
  
  DataFlair Team
  Spectator
  
  The parameter for specifying more than one storage path in Hadoop is in the hdfs-site.xml configuration file.
  
  Check this property in the hdfs-site.xml file:
  
  dfs.datanode.data.dir
  
  Its value can be any directory that is available on the DataNode (the slave's local disk).
  It defines the path where the DataNode saves its blocks.
  
  If there are multiple disk partitions, known as a JBOD configuration, list all of them as a comma-separated value of this property in hdfs-site.xml.