How does HDFS ensure Data Integrity of data blocks stored in Hadoop HDFS?
-
-
How is Data Integrity achieved in HDFS?
-
Data integrity in Hadoop is achieved by maintaining checksums of the data written to each block.
Whenever data is written to HDFS blocks, HDFS computes a checksum for all data written and verifies the checksum when that data is read back. A separate checksum is created for every dfs.bytes-per-checksum bytes of data. The default value for this property is 512 bytes, and each checksum is 4 bytes long.
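The per-chunk scheme above can be sketched with plain JDK classes. This is an illustrative sketch, not HDFS's actual implementation: the `ChunkChecksums` class and `BYTES_PER_CHECKSUM` constant are assumptions standing in for the real datanode code, and `CRC32C` is used because HDFS defaults to a CRC32C checksum type, each producing a 4-byte (32-bit) value per 512-byte chunk.

```java
import java.util.zip.CRC32C;

public class ChunkChecksums {
    // Analogous to dfs.bytes-per-checksum (default 512 bytes).
    static final int BYTES_PER_CHECKSUM = 512;

    // Compute one 32-bit (4-byte) CRC per 512-byte chunk of the input.
    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32C crc = new CRC32C();
        for (int i = 0; i < chunks; i++) {
            crc.reset();
            int from = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - from);
            crc.update(data, from, len);
            sums[i] = crc.getValue(); // low 32 bits hold the checksum
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] block = new byte[1300]; // spans 3 chunks: 512 + 512 + 276
        long[] sums = checksums(block);
        System.out.println(sums.length); // 3 checksums for 1300 bytes of data
    }
}
```

Note the checksum overhead is tiny: 4 bytes of checksum per 512 bytes of data, well under 1%.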
Each datanode is responsible for verifying the checksums of the data it stores. When a client reads data from a datanode, it also verifies the checksums against the data it receives. In addition, each datanode periodically runs a DataBlockScanner in a background thread to verify the blocks stored on it. If a corrupt block is found, HDFS copies a healthy replica of that block and replaces the corrupt one.
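The verify-on-read step can be sketched the same way: recompute each chunk's CRC and compare it with the stored checksum, flagging the first mismatch. This is a simplified model, assuming per-chunk CRC32C checksums as above; the `VerifyOnRead` class and `firstCorruptChunk` method are hypothetical names, not the HDFS client API, which on a real cluster would trigger re-replication from a healthy copy rather than just report the mismatch.

```java
import java.util.zip.CRC32C;

public class VerifyOnRead {
    static long crcOf(byte[] data, int from, int len) {
        CRC32C c = new CRC32C();
        c.update(data, from, len);
        return c.getValue();
    }

    // Recompute each chunk's CRC and compare against the stored checksums.
    // Returns the index of the first corrupt chunk, or -1 if all match.
    static int firstCorruptChunk(byte[] data, long[] stored, int chunkSize) {
        for (int i = 0; i < stored.length; i++) {
            int from = i * chunkSize;
            int len = Math.min(chunkSize, data.length - from);
            if (crcOf(data, from, len) != stored[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int chunk = 512;
        byte[] block = new byte[1024];
        // Checksums recorded at write time, one per 512-byte chunk.
        long[] stored = { crcOf(block, 0, 512), crcOf(block, 512, 512) };
        System.out.println(firstCorruptChunk(block, stored, chunk)); // -1 (clean)
        block[700] ^= 1; // simulate a single bit flip in the second chunk
        System.out.println(firstCorruptChunk(block, stored, chunk)); // 1 (corrupt)
    }
}
```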