Hadoop HDFS Erasure coding has overcome the limitation of 3x replication schema. It provides the same level of fault-tolerance with much less storage space. Storage space is reduced to 50% in Erasure coding. This HDFS erasure coding tutorial briefs about the advantages of erasure coding in Hadoop HDFS and how it saves huge disk space. We will also discuss what is erasure coding, design decision, and architecture, internal working of erasure coding.
2. Problem with old scheme of Replication
HDFS (storage layer of Hadoop) by default replicates each block three times for a number of purposes. It is the most simple and robust form of Redundancy to shield against the failure of Datanode. It also eases MapReduce task or computing task using locally store multiple Replicas of each block. But Replication is very expensive and 3x replication scheme has 200% overhead in storage space and other resources. However, datasets with low I/O activity, addition replicas are rarely accessed during normal operation but still consume other resources.
Therefore, a new feature- Erasure Coding is to use in the place of Replication, which provides the same level of fault tolerance with less space store and 50% storage overhead.
When comparing different storage scheme, an important consideration is:
- Data durability (number of simultaneously fault tolerance)
- Storage efficiency
So in N-way replication, there is N-1 fault tolerance with 1/n storage efficiency.
3. What is HDFS Erasure Coding in Hadoop?
HDFS Erasure Coding uses RAID (Redundant Array of Inexpensive Disks). RAID implements EC using stripping which logically stores the data in the form of a block (small unit) and stores the block on the different disk. For each block (cell) there will be parity calculated and stored. This is called encoding and the error can be recovered through the parity.
Erasure coding extends message with redundant data for fault tolerance. An EC codec operates on uniformly sized data cells. The codec takes a number of data cells as input and produces parity cells as the output. This process is called Encoding. Data cells and parity cell together are called an erasure coding group. Lost data cell can be reconstructed over the remaining cells this process is called decoding.
There are two algorithms available for HDFS Erasure Coding-
- XOR Algorithm – The simplest implementation of Erasure coding is XOR operation. Let’s assume X and Y and Z are data cell then parity cell is xor of these three data cells x ⊕ y ⊕ z so in XOR operation only one parity bit is generated and if anyone bit is lost it can be recovered by the remaining data cells and a parity bit. It is very limited since it produces 1 parity bit so XOR operation can tolerate only 1 failure with n group size.
“In Xor operation fault tolerance 1 and storage efficiency is n-1/n when group size is n.”
- Reed-Solomon Algorithm– XOR operation limitation is addressed by Reed-Solomon another form of EC. It uses linear algebra to generate multiple parity cells. RS uses two parameter k and m, k is a number of data cells and m is a number of parity cells. RS works by multiplying k data cells with a Generator Matrix (GT) to generate extended codeword with k data cells and m parity cells. Storage failure can be recovered by the multiplying inverse of generator matrix with the extended codewords as long as k out of k+m cells are available.
“In Reed, Solomon fault tolerance is up to m cells and storage efficiency is k/k+m where k are data cells and m are parity cells.”
4. Design Decision and Architecture
- Encoding is done offline. Initially, data blocks are still replicated. When the designated condition is met then it will change into ensuring coded form.
- Basic EC unit blocks. The encoded group is formed from all block to save the space for small files. Parity bits are also stored in the block. Parity data consume the same amount of space. However total space consumption is lower than the replication.
- Codec processing is done on the DataNode layer instead of HDFS. So client only read it’s on data.
To better support of small files, EC support stripping. In the future, HDFS will also support a contiguous EC layout. There are many new components are added by EC.
- Namenode Extension (ECManager) – Striped HDFS files contains a certain number of internal blocks. To reduce Namenode space consumption from these additional blocks, a new hierarchical block naming protocol was introduced. The ID of a block group can be inferred from the ID of any of its internal blocks. This allows management at the level of the block group rather than the block.
- Client Extension (ECClient) – The client can perform read and write operation on multiple internal blocks of a block group in parallel. On the output / write path.
- Datanode Extension (ECWorker) – Datanode runs an additional ECWorker task for recovery of a failed erasure coded block. Failed erasure coded block is noticed by Namenode which gives the recovery instruction to datanode. Recovery task passed as a heartbeat response this process is similar how replicated blocks are re-replicated on failure.
Reconstruction performs three task:
- Read the data from the source node.
- Decode the data and generate output data.
- Transfer generated blocks to target node
5. Working of Hadoop Erasure Coding
ECManager resides on the Namenode and coordinates the entire encoding and decoding task. Whenever a block is decoded or encoded, ECWorker on the chosen datanode all over the data from datanode and carries out its computation. ECManager gives the instruction to ECManager and provides the Codec scheme (EX. Generate parity bit for X, Y, Z data cells using Reed-Solomon algorithm). ECClient is the extension of HDFS client which notifies ECManager about the missing blocks and read the reconstructed data from ECWorker.
6. Advantages of Erasure Coding in Hadoop
- Saving substantial space – Initially, blocks are triplicated when they are sealed (no longer modified), a background task encode it and delete it replicas.
- Flexible policy – User and admin able to flag the file hot and cold. Hot files are replicated even after it sealed.
- Fast Recovery – HDFS block errors are discovered and recovered both actively (in the background) and passively (on the read path).
- Low overhead – Because of parity bit overhead is up to 50%.
- Transparency/compatibility – HDFS user should be able to use all basic and advanced features on erasure coded data, including snapshot, encryption, appending, caching and so forth.
In conclusion to HDFS Erasure coding, we can say that Erasure coding is introduced to reduce the storage overhead by using parity cell and provide fault tolerance. Erasure Coding uses RAID. It extends message with redundant data for fault tolerance.
Hope you liked this blog if you have any query, so please let us know by leaving a comment in a section given below.