How does HDFS ensure Data Integrity of data blocks stored in Hadoop HDFS?
-
-
How is Data Integrity achieved in HDFS?
-
Data integrity in Hadoop is achieved by maintaining checksums of the data written to each block.
Whenever data is written to HDFS blocks, HDFS computes a checksum for all data written and verifies the checksum when that data is read back. A separate checksum is created for every dfs.bytes-per-checksum bytes of data. The default value for this property is 512 bytes, and each checksum is 4 bytes long.
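The per-chunk scheme above can be sketched with plain JDK classes. This is an illustrative sketch, not HDFS's actual implementation: the `ChunkChecksums` class and `BYTES_PER_CHECKSUM` constant are assumptions standing in for the real datanode code, and `CRC32C` is used because HDFS defaults to a CRC32C checksum type, each producing a 4-byte (32-bit) value per 512-byte chunk.

```java
import java.util.zip.CRC32C;

public class ChunkChecksums {
    // Analogous to dfs.bytes-per-checksum (default 512 bytes).
    static final int BYTES_PER_CHECKSUM = 512;

    // Compute one 32-bit (4-byte) CRC per 512-byte chunk of the input.
    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32C crc = new CRC32C();
        for (int i = 0; i < chunks; i++) {
            crc.reset();
            int from = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - from);
            crc.update(data, from, len);
            sums[i] = crc.getValue(); // low 32 bits hold the checksum
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] block = new byte[1300]; // spans 3 chunks: 512 + 512 + 276
        long[] sums = checksums(block);
        System.out.println(sums.length); // 3 checksums for 1300 bytes of data
    }
}
```

Note the checksum overhead is tiny: 4 bytes of checksum per 512 bytes of data, well under 1%.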
Each datanode is responsible for verifying the checksums of the data it stores. When a client reads data from a datanode, it also verifies the checksums against the data it receives. In addition, each datanode periodically runs a DataBlockScanner in a background thread to verify the blocks stored on it. If a corrupt block is found, HDFS copies a healthy replica of that block and replaces the corrupt one.
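The verify-on-read step can be sketched the same way: recompute each chunk's CRC and compare it with the stored checksum, flagging the first mismatch. This is a simplified model, assuming per-chunk CRC32C checksums as above; the `VerifyOnRead` class and `firstCorruptChunk` method are hypothetical names, not the HDFS client API, which on a real cluster would trigger re-replication from a healthy copy rather than just report the mismatch.

```java
import java.util.zip.CRC32C;

public class VerifyOnRead {
    static long crcOf(byte[] data, int from, int len) {
        CRC32C c = new CRC32C();
        c.update(data, from, len);
        return c.getValue();
    }

    // Recompute each chunk's CRC and compare against the stored checksums.
    // Returns the index of the first corrupt chunk, or -1 if all match.
    static int firstCorruptChunk(byte[] data, long[] stored, int chunkSize) {
        for (int i = 0; i < stored.length; i++) {
            int from = i * chunkSize;
            int len = Math.min(chunkSize, data.length - from);
            if (crcOf(data, from, len) != stored[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int chunk = 512;
        byte[] block = new byte[1024];
        // Checksums recorded at write time, one per 512-byte chunk.
        long[] stored = { crcOf(block, 0, 512), crcOf(block, 512, 512) };
        System.out.println(firstCorruptChunk(block, stored, chunk)); // -1 (clean)
        block[700] ^= 1; // simulate a single bit flip in the second chunk
        System.out.println(firstCorruptChunk(block, stored, chunk)); // 1 (corrupt)
    }
}
```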