HBase Operations: Read and Write Operations
In this HBase article, “HBase Operations: Read and Write”, we look at how HBase carries out its two basic operations: read and write.
Along the way, we cover two major components involved in these operations: HFile and the META table.
So, let’s start with HBase operations.
HBase Operations: Read and Write
Two components play a vital role in both the read and the write operation of HBase: HFile and the META table. Let’s study both in detail:
i. HFile
HFile is the lowest level of the HBase architecture: it is the file in which table data physically exists.
Some key points about HFile:
- The row key is the primary identifier.
- Keys are stored in lexicographical order.
- Data is stored and split across the nodes according to this order.
- An HFile belongs to exactly one region.
- Rows are stored in the HFile as KeyValues, sorted on disk.
- When the MemStore accumulates more data than its limit, the entire sorted set is written to a new HFile in HDFS.
- HBase uses multiple HFiles per column family. They contain the actual cells, that is, KeyValue instances.
- Each HFile stores the highest sequence number as a meta field, so that HBase knows where it ended previously and where to continue next.
- An HFile contains a multi-layered index, which allows HBase to search the data without having to read the whole file.
- HDFS replicates the WAL and HFile blocks.
- This replication of HFile blocks happens automatically.
- By default, IO in HBase happens at the HFile block level, and a block is 64 KB.
Moreover, the HRegionServer manages the HFiles that together make up an HRegion.
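To make the sorted layout and the index concrete, here is a minimal sketch in Python. It is a toy model, not HBase code: the class name `ToyHFile`, the tiny `block_size`, and the single-level index are assumptions made for illustration (a real HFile index is multi-layered and blocks are 64 KB by default).

```python
from bisect import bisect_right


class ToyHFile:
    """Toy model of an immutable file of key-values sorted by row key."""

    def __init__(self, cells, block_size=2):
        # Row keys are kept in lexicographical (string) order.
        items = sorted(cells.items())
        # Split the sorted cells into small fixed-size blocks.
        self.blocks = [items[i:i + block_size]
                       for i in range(0, len(items), block_size)]
        # Index: the first row key of every block (single level only).
        self.index = [block[0][0] for block in self.blocks]
        self.blocks_read = 0  # how many blocks lookups have touched

    def get(self, row_key):
        # Use the index to pick the one block that can hold row_key.
        pos = bisect_right(self.index, row_key) - 1
        if pos < 0:
            return None  # row_key sorts before the first key in the file
        self.blocks_read += 1
        for key, value in self.blocks[pos]:
            if key == row_key:
                return value
        return None


hfile = ToyHFile({"row10": "a", "row2": "b", "row1": "c",
                  "row3": "d", "row25": "e"})
print(hfile.index)         # ['row1', 'row2', 'row3']
print(hfile.get("row25"))  # e
print(hfile.blocks_read)   # 1
```

Note that "row10" sorts before "row2" because the order is lexicographical, not numeric. The lookup reads a single block instead of scanning the whole file, which is the point of keeping an index.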
ii. META Table
The META table is one of the major components of HBase operations.
A read in HBase needs to know which HRegionServer holds the data, and the META table provides that information.
The META table is kept up to date as regions are created, split, or moved, so the next read finds the current location of each region.
- The META table is an HBase table that keeps a list of all regions in the system.
- It is organized like a B-tree.
- Its structure is as follows:
Key: Region start key, Region id
Value: RegionServer
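A lookup against such a table can be sketched as follows. This is an illustrative toy in Python, not the HBase client API: the `META` list, the region ids, the host names, and the `locate_region` helper are all hypothetical, and the stored value is assumed to be the address of the region server.

```python
from bisect import bisect_right

# Hypothetical META rows, sorted by region start key:
# (region start key, region id) -> region server address.
# An empty start key marks the first region of the table.
META = [
    (("", "region-1"), "rs1.example.com"),
    (("g", "region-2"), "rs2.example.com"),
    (("p", "region-3"), "rs3.example.com"),
]


def locate_region(row_key):
    """Return (region id, server) for the region whose range covers row_key."""
    start_keys = [key[0] for key, _ in META]
    # The covering region is the last one whose start key is <= row_key.
    pos = bisect_right(start_keys, row_key) - 1
    (_, region_id), server = META[pos]
    return region_id, server


print(locate_region("apple"))  # ('region-1', 'rs1.example.com')
print(locate_region("grape"))  # ('region-2', 'rs2.example.com')
print(locate_region("zebra"))  # ('region-3', 'rs3.example.com')
```

Because the entries are sorted by start key, finding the region for a row key is a binary search rather than a scan over all regions.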
HBase Write Path
The following steps occur when a client issues a write:
- First, for fault tolerance, the write is logged to the Write Ahead Log (WAL). HBase can then fall back on the WAL to recover the data if an error occurs while it is being written.
- As soon as the log entry is made, the data is forwarded to the MemStore, an in-memory write buffer on the HRegionServer. Because the write goes to memory rather than to disk, it completes quickly.
- When the MemStore is full, its contents are flushed to a new HFile, which is stored in HDFS.
- An ACK (acknowledgement) is sent to the client as confirmation once the data has been written to the WAL and the MemStore. It does not wait for the flush to an HFile.
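The write steps above can be sketched as a small Python toy. It is not HBase code: `ToyRegionWriter`, `flush_threshold`, and the in-memory lists standing in for the WAL and HDFS are assumptions for illustration. In particular, the flush here runs inline inside `put`, whereas a real flush is triggered by the MemStore size and does not hold up the acknowledgement.

```python
class ToyRegionWriter:
    """Toy write path: WAL first, then MemStore, flush to an HFile when full."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # append-only log, stands in for the WAL
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # every flush adds one sorted, immutable list
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # 1. log the write for recovery
        self.memstore[row_key] = value     # 2. buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self._flush()                  # 3. MemStore is full: flush
        return "ACK"                       # 4. confirm to the client

    def _flush(self):
        # The whole MemStore is written out sorted by row key, then emptied.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}


region = ToyRegionWriter(flush_threshold=3)
for key in ["row3", "row1", "row2", "row4"]:
    region.put(key, "v-" + key)

print(region.hfiles)    # [[('row1', 'v-row1'), ('row2', 'v-row2'), ('row3', 'v-row3')]]
print(region.memstore)  # {'row4': 'v-row4'}
print(len(region.wal))  # 4
```

The rows arrive unsorted but land in the flushed file in row-key order, and the fourth write stays in the MemStore until the next flush.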
HBase Read Path
The read process starts when a client sends a request to HBase. The request first goes to ZooKeeper, which tracks the state of the distributed system that HBase is part of.
- ZooKeeper holds the location of the META table, which is hosted on an HRegionServer, and returns that address when a client asks for it.
- The client then queries the META table on that HRegionServer and gets the address of the region that holds the data to be read.
- Next, the read checks the BlockCache, which holds data from previous reads. If the requested data is found there, it is returned to the client right away, so repeated queries for the same records are answered very quickly.
- If the data is not in the BlockCache, the MemStore is searched next, because it holds recent writes that have not yet been flushed to an HFile. If the data is found there, it is returned to the client.
- If the data is still not found, the search moves on to the HFiles, where the data was written at some earlier point. Once it is located, the required data is read from the HFile.
- The data read from the HFile is also written to the BlockCache, so that the client can get it instantly the next time it is requested.
- Finally, once the search is complete and the BlockCache is updated, the required data is returned to the client along with an ACK.
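The lookup order described above (BlockCache, then MemStore, then HFiles) can be sketched in Python as follows. This is a simplified toy, not HBase code: the `read` function and the plain dictionaries are assumptions for illustration, the ZooKeeper and META steps are left out, and the toy assumes cached entries are never stale, whereas HBase itself reconciles cell versions across these sources.

```python
def read(row_key, block_cache, memstore, hfiles):
    """Return (value, source), checking BlockCache, MemStore, then HFiles."""
    if row_key in block_cache:
        return block_cache[row_key], "BlockCache"
    if row_key in memstore:
        return memstore[row_key], "MemStore"
    # Search the newest HFile first so the latest flushed value wins.
    for hfile in reversed(hfiles):
        if row_key in hfile:
            block_cache[row_key] = hfile[row_key]  # cache for the next read
            return hfile[row_key], "HFile"
    return None, "not found"


block_cache = {}
memstore = {"row9": "fresh"}
hfiles = [{"row1": "old"}, {"row1": "new", "row2": "b"}]  # oldest first

print(read("row9", block_cache, memstore, hfiles))  # ('fresh', 'MemStore')
print(read("row1", block_cache, memstore, hfiles))  # ('new', 'HFile')
print(read("row1", block_cache, memstore, hfiles))  # ('new', 'BlockCache')
```

The second read of "row1" is served from the BlockCache because the first read stored it there.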
So, this was all about HBase operations. We hope you liked our explanation.
In this HBase operations tutorial, we saw how HBase performs read and write operations internally. We also discussed two major components involved in these operations: HFile and the META table. If you have any doubts, feel free to ask in the comments.