Where does Hadoop store its data?

This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.


Author

Posts
- September 20, 2018 at 3:52 pm #5658
  
  DataFlair Team
  Spectator
  
  What is the storage system of Hadoop?
  Does it store data in a database or on the local filesystem?
- September 20, 2018 at 3:53 pm #5659
  
  DataFlair Team
  Spectator
  
  Hadoop stores data in HDFS, the Hadoop Distributed File System.
  HDFS is Hadoop's primary storage system. It stores very large files on a cluster of commodity hardware and is designed to hold a small number of large files rather than a huge number of small files. It stores data reliably even when hardware fails.
  
  In HDFS, data is stored in blocks; a block is the unit in which HDFS stores and replicates file data. Each file is split into blocks, and the replicas of each block are distributed across the cluster according to the replication factor. The default replication factor is 3, so each block is stored three times. Under the default rack-aware placement policy in current Hadoop releases, the first replica goes on the DataNode where the writer runs (or on a randomly chosen DataNode if the client is outside the cluster), the second on a DataNode in a different rack, and the third on another DataNode in the same rack as the second. Keeping two replicas in one rack limits cross-rack traffic, and using two racks ensures the data is not lost even if an entire rack fails.
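
Why spreading the three replicas over two racks matters can be checked with a small Python sketch. This is a toy model, not Hadoop code: the node and rack names, `NODE_TO_RACK`, and both functions are hypothetical and only illustrate the rack-failure argument.

```python
# Toy cluster map: DataNode name -> rack name (all names are hypothetical).
NODE_TO_RACK = {
    "dn1": "rack1", "dn2": "rack1", "dn3": "rack1",
    "dn4": "rack2", "dn5": "rack2", "dn6": "rack2",
}


def surviving_replicas(replica_nodes, node_to_rack, failed_rack):
    """Return the replicas that are still reachable after one rack fails."""
    return [n for n in replica_nodes if node_to_rack[n] != failed_rack]


def survives_any_rack_failure(replica_nodes, node_to_rack):
    """True if at least one replica remains no matter which single rack fails."""
    racks = set(node_to_rack.values())
    return all(surviving_replicas(replica_nodes, node_to_rack, r) for r in racks)


# Three replicas of one block, spread over two racks.
spread = ["dn1", "dn4", "dn5"]
# Three replicas of one block, all placed in the same rack.
packed = ["dn1", "dn2", "dn3"]

print(survives_any_rack_failure(spread, NODE_TO_RACK))  # True
print(survives_any_rack_failure(packed, NODE_TO_RACK))  # False
```

With the replicas spread over two racks, losing either rack still leaves at least one copy; with all three in one rack, losing that rack loses the block.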
  
  The NameNode keeps the metadata about blocks, such as the number of blocks, the location of their replicas, and other details. The DataNodes store the actual data and perform operations such as block creation, deletion, and replication on the NameNode's instructions.
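
The split between metadata and data can be pictured with a toy in-memory model. Everything here (`ToyNameNode`, `ToyDataNode`, `write_file`, `read_file`, the round-robin target choice) is a hypothetical simplification for illustration; real HDFS uses RPC, write pipelines, and rack-aware placement.

```python
class ToyDataNode:
    """Holds the actual block contents."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


class ToyNameNode:
    """Holds only metadata: which blocks form a file and where they live."""
    def __init__(self):
        self.file_blocks = {}      # path -> [block_id, ...]
        self.block_locations = {}  # block_id -> [DataNode name, ...]


def write_file(namenode, datanodes, path, data, block_size, replication):
    """Split data into blocks and store each block on `replication` DataNodes.

    Assumes replication <= len(datanodes). Targets are chosen round-robin,
    which is a simplification and not rack aware.
    """
    block_ids = []
    for offset in range(0, len(data), block_size):
        index = len(namenode.block_locations)
        block_id = f"blk_{index}"
        chunk = data[offset:offset + block_size]
        targets = [datanodes[(index + r) % len(datanodes)]
                   for r in range(replication)]
        for dn in targets:
            dn.blocks[block_id] = chunk  # data goes to DataNodes only
        namenode.block_locations[block_id] = [dn.name for dn in targets]
        block_ids.append(block_id)
    namenode.file_blocks[path] = block_ids


def read_file(namenode, datanodes_by_name, path):
    """Ask the NameNode where the blocks are, then read them from DataNodes."""
    out = b""
    for block_id in namenode.file_blocks[path]:
        first_holder = namenode.block_locations[block_id][0]
        out += datanodes_by_name[first_holder].blocks[block_id]
    return out


datanodes = [ToyDataNode("dn1"), ToyDataNode("dn2"), ToyDataNode("dn3")]
namenode = ToyNameNode()
payload = b"hello hdfs world"
write_file(namenode, datanodes, "/demo.txt", payload, block_size=5, replication=2)
by_name = {dn.name: dn for dn in datanodes}
print(namenode.file_blocks["/demo.txt"])
print(read_file(namenode, by_name, "/demo.txt") == payload)
```

The toy NameNode never holds file bytes, only the block list and block locations, which mirrors the division of work described above.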
  
  In hdfs-site.xml:
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/hdfs/namenode</value>
  </property>
  
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/hdfs/datanode</value>
  </property>
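  dfs.namenode.name.dir determines where on its local filesystem the NameNode stores the namespace image (and, by default, the edit log). dfs.datanode.data.dir determines where on its local filesystem each DataNode stores its blocks. So HDFS does not use a database: block data ends up as ordinary files on the DataNodes' local disks.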
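
To see which local directories a given configuration points at, the file can be parsed with Python's standard library. This is a minimal sketch: the XML is embedded as a string and wrapped in the `<configuration>` root element that a real hdfs-site.xml has, and `read_properties` and `local_dirs` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Same two properties as above, wrapped in the usual <configuration> root.
HDFS_SITE = """<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/hdfs/datanode</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return a dict of property name -> value from Hadoop-style config XML."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.findall("property")}


def local_dirs(value):
    """Split a comma-separated list of file: URIs into local paths."""
    return [urlparse(item.strip()).path for item in value.split(",")]


props = read_properties(HDFS_SITE)
print(local_dirs(props["dfs.namenode.name.dir"]))  # ['/hadoop/hdfs/namenode']
print(local_dirs(props["dfs.datanode.data.dir"]))  # ['/hadoop/hdfs/datanode']
```

Both properties accept a comma-separated list of directories, which is why `local_dirs` returns a list.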
  
- September 20, 2018 at 3:53 pm #5660
  
  DataFlair Team
  Spectator
  
  1) Hadoop stores data in its own data store, the Hadoop Distributed File System (HDFS). Unlike a traditional file system, it is a distributed file system that spreads data across a cluster of commodity machines consisting of a NameNode and DataNodes.
  
  2) DataNodes store the actual data as blocks. The default block size is 128 MB in Hadoop 2.x and later, and it can be raised to 256 MB or 512 MB as required through the dfs.blocksize property.
  
  3) The NameNode, the master node in HDFS, holds the metadata for the data stored on the DataNodes in memory. Because the metadata is in memory, the NameNode can answer client requests about the data quickly.
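
The block arithmetic from point 2 can be worked through in a few lines of Python. This is a sketch with hypothetical names (`split_into_blocks`, `raw_storage`); it assumes the 128 MB default block size and the default replication factor of 3 mentioned earlier in this thread.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return the size of each block a file of `file_size` bytes is split into."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only occupies as much space as the data it holds.
        sizes.append(remainder)
    return sizes


def raw_storage(file_size, replication=3):
    """Total bytes stored across the cluster once every block is replicated."""
    return file_size * replication


blocks = split_into_blocks(500 * MB)
print([size // MB for size in blocks])  # [128, 128, 128, 116]
print(raw_storage(500 * MB) // MB)      # 1500
```

A 500 MB file therefore becomes three full 128 MB blocks plus one 116 MB block, and with three replicas it takes 1500 MB of raw disk space across the cluster.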