HBase Operations: Read and Write Operations


In this HBase article, “HBase Operations: Read and Write”, we will learn how HBase handles its two basic operations: the HBase read and the HBase write.

Moreover, in this HBase tutorial, we will look at two major components involved in these operations: the HFile and the META Table.

So let’s start HBase Operations.

HBase Operations: Read and Write

In both the read and the write operation of HBase, two major components play a vital role: the HFile and the META Table. So, let’s study both in detail:

i. HFile

The HFile is the lowest level of the HBase architecture: it is the file in which table data physically exists on disk.
Some key points about the HFile:

  • The row key is the primary identifier.
  • Keys are stored in lexicographical order.
  • Data is stored and split across the nodes according to this order.
  • Each HFile belongs to exactly one region.
  • Within an HFile, rows are stored on disk as sorted KeyValues.
  • When the MemStore accumulates more data than its limit, the entire sorted set is written to a new HFile in HDFS.
  • HBase uses multiple HFiles per column family; they hold the actual cells, that is, KeyValue instances.
  • Each HFile stores the highest sequence number as a meta field, so that HBase knows where the last flush ended and where to continue next.
  • An HFile contains a multi-layered index, which lets HBase find data without reading the whole file.
  • HDFS replicates the WAL and HFile blocks, and this replication happens automatically.
  • By default, IO in HBase happens at the HFile block level, and the default block size is 64KB.

Moreover, each HRegionServer manages the HFiles that make up the HRegions it hosts.
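To make the ordering and index points above more concrete, here is a minimal sketch in Python. It is a toy model, not HBase code: the names `flush_to_hfile`, `hfile_get`, and `BLOCK_SIZE` are made up for illustration, blocks are counted in cells rather than bytes, and the index has a single level instead of the multi-layered index a real HFile uses.

```python
from bisect import bisect_right

# Hypothetical toy model: none of these names are part of the HBase API.
BLOCK_SIZE = 2  # cells per block in this toy; real HFile blocks default to 64 KB


def flush_to_hfile(memstore):
    """Sort MemStore cells by row key (as bytes) and cut them into blocks."""
    cells = sorted(memstore.items())  # byte-wise, i.e. lexicographic, order
    blocks = [cells[i:i + BLOCK_SIZE] for i in range(0, len(cells), BLOCK_SIZE)]
    index = [block[0][0] for block in blocks]  # first row key of each block
    return blocks, index


def hfile_get(blocks, index, row_key):
    """Use the index to pick one block, then scan only that block."""
    pos = bisect_right(index, row_key) - 1
    if pos < 0:
        return None  # row_key sorts before the first key in the file
    for key, value in blocks[pos]:
        if key == row_key:
            return value
    return None


memstore = {b"row2": b"b", b"row10": b"c", b"row1": b"a", b"row3": b"d"}
blocks, index = flush_to_hfile(memstore)
sorted_keys = [key for block in blocks for key, _ in block]
print(sorted_keys)                           # note: b"row10" sorts before b"row2"
print(hfile_get(blocks, index, b"row10"))
```

Because the order is lexicographic rather than numeric, `row10` lands between `row1` and `row2`. This is why row keys with numeric parts are often zero-padded in practice.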

ii. META Table

META Table is one of the major components of HBase Operations.

An HBase Read operation needs to know which HRegionServer has to be accessed to read the actual data, so the META Table is used to find it.

Moreover, the META Table is kept up to date as regions are created, split, or moved between servers, so a lookup points to the server that currently hosts the region.

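  • The META Table is an HBase table that keeps a list of all regions in the system.
  • It is organized like a sorted tree index (similar to a B-tree): entries are ordered by key, so the region for a given row can be found by a key lookup.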
  • Its structure is as follows:

Key: Region start key, Region id
Values: RegionServer
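
As an illustration of how such a key/value layout can answer "which server holds this row?", here is a small Python sketch. The table contents, server names, and the `locate_region` function are hypothetical; the sketch only assumes that entries are sorted by region start key and that each region covers the rows from its start key up to the next region's start key.

```python
from bisect import bisect_right

# Hypothetical META contents: (region start key, region id) -> RegionServer.
# Server names and region ids are made up for illustration.
META = [
    ((b"", 1), "regionserver-a:16020"),
    ((b"g", 2), "regionserver-b:16020"),
    ((b"p", 3), "regionserver-c:16020"),
]

START_KEYS = [key[0] for key, _ in META]  # already sorted by start key


def locate_region(row_key):
    """Return (region id, server) of the region whose key range covers row_key."""
    # The covering region is the last one whose start key is <= row_key.
    pos = bisect_right(START_KEYS, row_key) - 1
    (_, region_id), server = META[pos]
    return region_id, server


print(locate_region(b"apple"))
print(locate_region(b"grape"))
print(locate_region(b"zebra"))
```

The first region uses an empty start key, so every row key falls into some region and the lookup never misses.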

HBase Write Path

The following steps occur in HBase when the client issues a Write command:

  • First, for fault tolerance, the edit is written to the Write Ahead Log (WAL). Hence, HBase always has the WAL to fall back on if an error occurs before the data reaches disk.
  • As soon as the log entry is made, the data is written to the MemStore, an in-memory write buffer on the RegionServer. Because this step only touches memory, writes are fast.
  • Once the edit is in both the WAL and the MemStore, an ACK (acknowledgement) is sent to the client as confirmation that the write is complete.
  • Later, when the MemStore is full, its contents are flushed as a sorted set to a new HFile, and the HFile itself is stored in HDFS.
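
The write steps above can be sketched with a small in-memory model in Python. This is only an illustration under stated assumptions: `ToyRegion` and its fields are hypothetical names, the flush threshold counts cells instead of bytes, and the flush runs inline inside `put` for simplicity, whereas a real RegionServer flushes separately from the client's request.

```python
class ToyRegion:
    """Hypothetical in-memory model of one region's write path (not HBase API)."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # stands in for the Write Ahead Log
        self.memstore = {}   # in-memory write buffer
        self.hfiles = []     # each flush appends one immutable, sorted file
        self.flush_threshold = flush_threshold  # cells, not bytes, in this toy

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # log the edit first
        self.memstore[row_key] = value     # then buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                   # inline here only to keep the toy short
        return "ACK"                       # confirmation for the client

    def flush(self):
        """Write the MemStore out as one sorted file and start a fresh buffer."""
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}


region = ToyRegion(flush_threshold=3)
edits = [(b"r3", b"c"), (b"r1", b"a"), (b"r2", b"b"), (b"r4", b"d")]
acks = [region.put(key, value) for key, value in edits]
print(region.hfiles)    # one flushed file, sorted by row key
print(region.memstore)  # the fourth edit is still only in memory (and in the WAL)
```

Note that the edits arrive out of order but the flushed file is sorted, and that the WAL still holds every edit, including the one that has not been flushed yet.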

HBase Read Path

The read process starts when a client sends a request to HBase. The request first goes to ZooKeeper, which keeps the status of the whole distributed system that HBase is part of.

  • ZooKeeper has the location of the META Table, which is hosted on an HRegionServer. Hence, when a client asks, ZooKeeper returns the address of that server.
  • Next, the client queries the META Table on that HRegionServer. There it gets the address of the region that holds the data to be read.
  • On the server hosting that region, the read first checks the BlockCache, where data from previous reads is kept. If a user queries the same records again, they are found here and returned to the client in no time.
  • If the data is not found in the BlockCache, the read searches the MemStore, which holds recent writes that have not yet been flushed to an HFile. If the data is found there, it is returned to the client as the result.
  • If the data is still not found, the read moves on to the HFiles, which hold the data flushed earlier, and locates the required data there.
  • The data read from an HFile is also written to the BlockCache, so that the client can access it instantly the next time it is requested.
  • Finally, once the search is complete and the data is in the BlockCache, the required data is returned to the client along with an ACK.
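
The lookup order described above can be sketched in Python as follows. This is a simplified illustration, not HBase code: the `read` function and the sample data are hypothetical, each store is modelled as a plain dictionary, and the cache holds single rows, whereas the real BlockCache holds whole HFile blocks.

```python
def read(row_key, block_cache, memstore, hfiles):
    """Return (value, source), checking BlockCache, then MemStore, then HFiles."""
    if row_key in block_cache:
        return block_cache[row_key], "BlockCache"
    if row_key in memstore:
        return memstore[row_key], "MemStore"
    for hfile in reversed(hfiles):                 # newest file first
        if row_key in hfile:
            block_cache[row_key] = hfile[row_key]  # cache it for the next read
            return hfile[row_key], "HFile"
    return None, "not found"


block_cache = {}
memstore = {b"r9": b"new"}
hfiles = [{b"r1": b"old"}, {b"r1": b"newer", b"r2": b"b"}]  # oldest to newest

first = read(b"r1", block_cache, memstore, hfiles)   # served from an HFile
second = read(b"r1", block_cache, memstore, hfiles)  # now served from the cache
unflushed = read(b"r9", block_cache, memstore, hfiles)
missing = read(b"zz", block_cache, memstore, hfiles)
print(first, second, unflushed, missing)
```

The second read of the same row never touches the HFiles, which is the reason the data is written to the BlockCache after the first read.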

So, this was all about HBase Operations. Hope you like our explanation.

Conclusion

Hence, in this HBase Operations tutorial, we have seen how HBase performs Read and Write operations internally. Moreover, we discussed two major components involved in these operations: the HFile and the META Table. If you have any doubts, feel free to ask in the comments.
