Where does Hadoop store its data?

  • Author
    Posts
    • #5658
      DataFlair Team
      Spectator

      What is the storage system of Hadoop?
      Does it store data in a database or on the local filesystem?

    • #5659
      DataFlair Team
      Spectator

      Hadoop stores data in HDFS, the Hadoop Distributed File System.
      HDFS is the primary storage system of Hadoop. It stores very large files across a cluster of commodity hardware, and it is designed to hold a small number of large files rather than a huge number of small files. It stores data reliably even when hardware fails.

      In HDFS, data is stored in blocks; a block is the smallest unit of data that the file system stores. Files are broken into blocks, and each block is copied across the cluster according to the replication factor. The default replication factor is 3, so three copies of each block are kept. Under the default rack-aware placement policy, the first replica is written to the DataNode on which the writing client runs (or to a randomly chosen DataNode if the client is outside the cluster), the second to a DataNode on a different rack, and the third to another DataNode on that same remote rack. Keeping two replicas on one rack limits cross-rack traffic, while spreading the copies over two racks ensures that the data is not lost even if an entire rack fails.
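
To make the rack-aware idea concrete, here is a minimal sketch in Python. It is not Hadoop code: the function name `place_replicas`, the `topology` dictionary, and the node names are hypothetical. It assumes the default policy described in the Apache HDFS documentation (first replica on the writer's node, the other two on two different nodes of one remote rack) and a cluster with at least one remote rack holding two or more DataNodes.

```python
import random


def place_replicas(writer_node, topology, rng=random):
    """Pick three DataNodes for one block (toy model, not the Hadoop API).

    topology maps a rack name to the list of DataNode names in that rack.
    """
    # Find the rack that hosts the writing client.
    local_rack = next(r for r, nodes in topology.items() if writer_node in nodes)

    # Replica 1: the writer's own DataNode.
    first = writer_node

    # Replicas 2 and 3: two different DataNodes on a single remote rack.
    remote_racks = [
        r for r, nodes in topology.items()
        if r != local_rack and len(nodes) >= 2
    ]
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)
    return [first, second, third]


# Hypothetical two-rack cluster.
topology = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
}
replicas = place_replicas("dn1", topology)
print(replicas)
```

Whichever nodes are picked, the three replicas land on three distinct DataNodes spread over two racks, so losing either rack still leaves at least one copy of the block.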

      The NameNode keeps information about the blocks, such as the number of blocks, the locations of their replicas, and other details, while the DataNodes store the actual data and perform operations such as block creation, deletion, and replication on instruction from the NameNode.

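      In hdfs-site.xml:
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/hadoop/hdfs/namenode</value>
      </property>

      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/hadoop/hdfs/datanode</value>
      </property>
      dfs.namenode.name.dir determines where on its local filesystem the NameNode stores its metadata (the fsimage), and dfs.datanode.data.dir determines where on its local filesystem a DataNode stores its blocks. So the block data ultimately lives as ordinary files on each DataNode's local filesystem, not in a database.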
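
As an illustration of how these settings resolve to local directories, the sketch below parses the two properties with Python's standard library. It assumes the properties sit inside the usual `<configuration>` root element of hdfs-site.xml; the helper name `read_properties` is hypothetical, and the XML is embedded as a string so the example runs without a Hadoop installation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# The two properties from the post, wrapped in the <configuration> root.
HDFS_SITE = """
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/hdfs/datanode</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> element."""
    root = ET.fromstring(xml_text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


props = read_properties(HDFS_SITE)

# Strip the file: scheme to get the plain local directory path.
namenode_dir = urlparse(props["dfs.namenode.name.dir"]).path
datanode_dir = urlparse(props["dfs.datanode.data.dir"]).path
print(namenode_dir)  # /hadoop/hdfs/namenode
print(datanode_dir)  # /hadoop/hdfs/datanode
```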


    • #5660
      DataFlair Team
      Spectator

      1) Hadoop stores data in its own data store, the Hadoop Distributed File System (HDFS). Unlike a traditional file system, it is a distributed file system: data is spread across a cluster of commodity machines consisting of a NameNode and DataNodes.

      2) DataNodes store the actual data in the form of blocks. The default block size is 128 MB, and it can be raised to 256 MB or 512 MB as required by setting dfs.blocksize in hdfs-site.xml.
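
The effect of the block size can be worked out with a little arithmetic. The sketch below is illustrative only: the function name `block_layout` and the 500 MB file are made up, and it assumes a replication factor of 3 (the HDFS default). It also relies on the fact that the last block of a file occupies only as much space as it actually holds.

```python
MB = 1024 * 1024


def block_layout(file_size, block_size=128 * MB, replication=3):
    """Return (number of blocks, size of last block, raw bytes stored)."""
    # Ceiling division: a partial final block still counts as a block.
    num_blocks = (file_size + block_size - 1) // block_size
    # The last block is not padded; it holds only the remaining bytes.
    last_block = file_size - (num_blocks - 1) * block_size if num_blocks else 0
    # Every byte of the file is stored once per replica.
    raw_bytes = file_size * replication
    return num_blocks, last_block, raw_bytes


blocks, last, raw = block_layout(500 * MB)
print(blocks, last // MB, raw // MB)  # 4 116 1500

# Larger block sizes mean fewer blocks for the same file.
for size_mb in (128, 256, 512):
    n, _, _ = block_layout(500 * MB, block_size=size_mb * MB)
    print(size_mb, n)
```

Fewer blocks per file means fewer entries for the NameNode to track, which is one reason larger block sizes are chosen for very large files.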

      3) The NameNode, which is the master node in HDFS, keeps the metadata for the data stored on the DataNodes in memory. Because the metadata is held in memory, the NameNode can serve information about the data to clients quickly.
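
A toy model can show why in-memory metadata makes lookups fast. The class below is a hypothetical sketch, not the real NameNode data structures: `ToyNameNode`, its method names, the file path, and the block IDs are all invented for illustration. It keeps two dictionaries in memory, one from file path to block IDs and one from block ID to the DataNodes holding a replica.

```python
class ToyNameNode:
    """In-memory block map (illustrative only, not Hadoop's implementation)."""

    def __init__(self):
        self.file_blocks = {}      # file path -> ordered list of block ids
        self.block_locations = {}  # block id -> set of DataNode names

    def add_block(self, path, block_id, datanodes):
        """Record a new block for a file and where its replicas live."""
        self.file_blocks.setdefault(path, []).append(block_id)
        self.block_locations[block_id] = set(datanodes)

    def get_block_locations(self, path):
        """Answer a client lookup from memory, without reading any block data."""
        return [
            (block_id, sorted(self.block_locations[block_id]))
            for block_id in self.file_blocks[path]
        ]


nn = ToyNameNode()
nn.add_block("/logs/app.log", "blk_1", ["dn1", "dn4", "dn5"])
nn.add_block("/logs/app.log", "blk_2", ["dn2", "dn4", "dn6"])
print(nn.get_block_locations("/logs/app.log"))
# [('blk_1', ['dn1', 'dn4', 'dn5']), ('blk_2', ['dn2', 'dn4', 'dn6'])]
```

The client then reads the blocks directly from the listed DataNodes; the NameNode only answers the "where is it?" question.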
