How does HDFS ensure Data Integrity of data blocks stored in Hadoop HDFS?


    • #6096
      DataFlair Team
      Spectator

      How is Data Integrity achieved in HDFS?

    • #6099
      DataFlair Team
      Spectator

      Data Integrity in Hadoop is achieved by maintaining checksums of the data written to its blocks.

      Whenever data is written to HDFS blocks, HDFS calculates a checksum for all the data written and verifies those checksums when the data is read back. A separate checksum is created for every dfs.bytes-per-checksum bytes of data; the default for this property is 512 bytes, and each checksum is 4 bytes long.
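      To make the chunking concrete, here is a minimal Java sketch of the idea (illustrative only, not Hadoop's actual internal code), assuming a 512-byte chunk size and CRC-32 as the checksum function:

          import java.util.zip.CRC32;

          public class ChunkChecksum {
              static final int BYTES_PER_CHECKSUM = 512; // mirrors dfs.bytes-per-checksum

              // Compute one 4-byte CRC-32 checksum per 512-byte chunk,
              // the way HDFS checksums data as it is written to a block.
              public static long[] checksums(byte[] data) {
                  int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
                  long[] sums = new long[chunks];
                  CRC32 crc = new CRC32();
                  for (int i = 0; i < chunks; i++) {
                      int off = i * BYTES_PER_CHECKSUM;
                      int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
                      crc.reset();
                      crc.update(data, off, len);
                      sums[i] = crc.getValue(); // low 32 bits hold the 4-byte checksum
                  }
                  return sums;
              }
          }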

      Every datanode is responsible for verifying the checksums of the data it stores. When clients read data from a datanode, they verify the checksums as well. In addition, each datanode periodically runs a DataBlockScanner in the background to verify the blocks it holds. If a corrupt block is found, HDFS fetches a healthy replica of that block and replaces the corrupt one.
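      On the client side, checksum behaviour is exposed through the Hadoop FileSystem API. Below is a short sketch (the file path /user/hadoop/sample.txt is hypothetical) that turns on verification and asks HDFS for a file's checksum, which is derived from the per-chunk checksums stored alongside each block:

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileChecksum;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class ChecksumDemo {
              public static void main(String[] args) throws Exception {
                  Configuration conf = new Configuration();
                  FileSystem fs = FileSystem.get(conf);

                  // Verification on read is on by default; it can be
                  // toggled explicitly through the FileSystem API.
                  fs.setVerifyChecksum(true);

                  // Ask HDFS for the file-level checksum.
                  Path file = new Path("/user/hadoop/sample.txt"); // hypothetical path
                  FileChecksum checksum = fs.getFileChecksum(file);
                  System.out.println(checksum.getAlgorithmName() + ": " + checksum);
              }
          }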
