HBase MemStore – Uses, Benefits & Configuration
In this HBase article, we discuss one of the internal components of HBase: the MemStore.
Apart from what the MemStore is, we cover its uses and benefits, and how to configure MemStore flushes in HBase.
The MemStore holds all in-memory updates as sorted KeyValues. When the MemStore is flushed, this sorted key/value data is stored in an HFile. There is one MemStore per column family (in each region), so updates are sorted per column family.
In other words, the MemStore holds the in-memory modifications to a Store, where each modification is a KeyValue.
Note that the MemStore functions should not be called in parallel.
Some key points about the MemStore in HBase:
- In simple words, the MemStore is a write buffer in which HBase accumulates data in memory before a permanent write.
- When the MemStore fills up, its contents are flushed to disk to form an HFile.
- Every flush creates a new HFile rather than writing to an existing one.
- The HFile is the underlying storage format for HBase.
- There is one MemStore per column family. One column family can have multiple HFiles, but an HFile belongs to only one column family.
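The points above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not HBase code: the class `ToyMemStore`, its `flush_size_bytes` parameter, and the byte accounting are all hypothetical, and a real MemStore also keeps multiple versions of a cell, which this sketch ignores.

```python
class ToyMemStore:
    """Hypothetical write buffer for a single column family (not HBase code)."""

    def __init__(self, flush_size_bytes):
        self.flush_size_bytes = flush_size_bytes
        self.buffer = {}        # in-memory updates, key -> latest value
        self.buffer_bytes = 0   # rough size of the buffered keys and values
        self.hfiles = []        # every flush appends one new immutable "file"

    def put(self, key, value):
        previous = self.buffer.get(key)
        if previous is not None:
            self.buffer_bytes -= len(key) + len(previous)
        self.buffer[key] = value
        self.buffer_bytes += len(key) + len(value)
        if self.buffer_bytes >= self.flush_size_bytes:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # Sort by key and write a brand-new file; existing files are never modified.
        self.hfiles.append(tuple(sorted(self.buffer.items())))
        self.buffer = {}
        self.buffer_bytes = 0


store = ToyMemStore(flush_size_bytes=16)
store.put(b"row2", b"bbbb")   # 8 bytes buffered
store.put(b"row1", b"aaaa")   # 16 bytes buffered: threshold reached, so flush
store.put(b"row3", b"cccc")   # starts filling the buffer again
```

After these three writes, `store.hfiles` holds one sorted file containing `row1` and `row2`, while `row3` is still only in memory.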
The following happens when a server hosting a MemStore that has not yet been flushed crashes:
- To record changes as they happen, every server in an HBase cluster keeps a write-ahead log (WAL), which is a file on the underlying file system. A write is not considered successful until the new WAL entry has been written successfully, which is what makes writes durable.
- If HBase goes down, the data that had not yet been flushed from the MemStore to an HFile can be recovered by replaying the WAL. The HBase framework takes care of this.
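The recovery idea described above can be illustrated with a small simulation. It is a sketch under simplifying assumptions: `ToyRegionServer` and its methods are hypothetical names, a Python list stands in for the WAL file, and clearing the log on flush is a stand-in for how a real system discards log entries it no longer needs.

```python
class ToyRegionServer:
    """Hypothetical model of WAL-based recovery (not HBase code)."""

    def __init__(self):
        self.wal = []        # durable: survives a crash (stands in for a file)
        self.memstore = {}   # volatile: lost on a crash
        self.hfiles = []     # durable: flushed, sorted data

    def put(self, key, value):
        # The WAL entry is written first; only then does the write count as successful.
        self.wal.append((key, value))
        self.memstore[key] = value

    def flush(self):
        if self.memstore:
            self.hfiles.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}
            self.wal = []    # flushed edits are no longer needed for recovery

    def crash(self):
        self.memstore = {}   # everything held only in memory is gone

    def recover(self):
        # Replay the WAL in order to rebuild the unflushed MemStore contents.
        for key, value in self.wal:
            self.memstore[key] = value


server = ToyRegionServer()
server.put(b"row1", b"a")
server.flush()               # row1 is now safely in an HFile
server.put(b"row2", b"b")    # row2 exists only in the MemStore and the WAL
server.crash()
lost = dict(server.memstore)
server.recover()
```

Here `lost` is empty right after the crash, and `recover()` brings `row2` back into the MemStore from the log.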
Uses of HBase MemStore
HBase users and administrators should understand what the MemStore is and how it is used, because:
- A number of MemStore configuration options can be used to achieve better performance and avoid issues. HBase does not adjust these settings for you based on your usage pattern.
- Frequent MemStore flushes can affect read performance and put additional load on the system.
- The way MemStore flushes work may affect our schema design.
Configuring MemStore Flushes
The MemStore flush configuration properties in HBase fall into two groups:
- The first group determines when a flush should be triggered.
- The second group determines when a flush should be triggered and updates should be blocked while flushing.
Now, let’s learn about these groups in detail:
a. First Group
The first group triggers the “regular” flushes, which happen in parallel with serving write requests.
The properties for configuring these flush thresholds are:
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
  <description>
    Memstore will be flushed to disk if size of the memstore exceeds this number of bytes. Value is checked by a thread that runs every hbase.server.thread.wakefrequency.
  </description>
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.35</value>
  <description>
    Maximum size of all memstores in a region server before flushes are forced. Defaults to 35% of heap. Setting this value equal to hbase.regionserver.global.memstore.upperLimit causes the minimum possible flushing to occur when updates are blocked due to memstore limiting.
  </description>
</property>
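To make the two thresholds concrete, the arithmetic can be worked through in a short sketch. The constants are the values shown in the properties above; the helper names and the 8 GiB heap are assumptions chosen only for illustration.

```python
FLUSH_SIZE = 134217728        # hbase.hregion.memstore.flush.size, in bytes
GLOBAL_LOWER_LIMIT = 0.35     # hbase.regionserver.global.memstore.lowerLimit

def memstore_needs_flush(memstore_bytes):
    """A single MemStore is flushed once it exceeds the flush size."""
    return memstore_bytes > FLUSH_SIZE

def forced_flush_threshold(heap_bytes):
    """Total MemStore size on a region server at which flushes are forced."""
    return int(heap_bytes * GLOBAL_LOWER_LIMIT)

flush_size_mib = FLUSH_SIZE // (1024 * 1024)      # flush size in MiB
assumed_heap = 8 * 1024 ** 3                      # hypothetical 8 GiB heap
threshold = forced_flush_threshold(assumed_heap)  # 35% of that heap, in bytes
```

With these values the per-MemStore flush size is 128 MiB, and on the assumed 8 GiB heap flushes are forced once all MemStores together hold roughly 2.8 GiB.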
b. Second Group
The second group of settings exists mainly for safety reasons. Sometimes the write load is so high that flushing cannot keep up with it. Since we do not want the MemStore to grow without limit, writes are blocked until the MemStore is back to a “manageable” size.
These thresholds are configured with the following properties:
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value>
  <description>
    Maximum size of all memstores in a region server before new updates are blocked and flushes are forced. Defaults to 40% of heap. Updates are blocked and flushes are forced until size of all memstores in a region server hits hbase.regionserver.global.memstore.lowerLimit.
  </description>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>2</value>
  <description>
    Block updates if memstore has hbase.hregion.memstore.block.multiplier times hbase.hregion.memstore.flush.size bytes. Useful for preventing runaway memstore during spikes in update traffic. Without an upper bound, memstore fills such that when it flushes the resultant flush files take a long time to compact or split, or worse, we OOME.
  </description>
</property>
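The blocking rules can be sketched as two small decision functions. This is an illustration under assumptions rather than HBase's actual logic: the function names and returned strings are hypothetical, the exact comparison operators (`>=` here) are a guess, and the flush size and lower limit are the default values quoted in the article's first group of properties.

```python
FLUSH_SIZE = 134217728     # hbase.hregion.memstore.flush.size, in bytes
BLOCK_MULTIPLIER = 2       # hbase.hregion.memstore.block.multiplier
GLOBAL_UPPER_LIMIT = 0.4   # hbase.regionserver.global.memstore.upperLimit
GLOBAL_LOWER_LIMIT = 0.35  # hbase.regionserver.global.memstore.lowerLimit

def memstore_blocks_updates(memstore_bytes):
    """Updates are blocked once a MemStore reaches multiplier x flush size."""
    return memstore_bytes >= BLOCK_MULTIPLIER * FLUSH_SIZE

def server_state(total_memstore_bytes, heap_bytes):
    """Classify a region server by the combined size of all its MemStores."""
    if total_memstore_bytes >= heap_bytes * GLOBAL_UPPER_LIMIT:
        return "block updates and force flushes"
    if total_memstore_bytes >= heap_bytes * GLOBAL_LOWER_LIMIT:
        return "force flushes"
    return "normal"

GIB = 1024 ** 3
heap = 10 * GIB                                 # hypothetical 10 GiB heap
block_threshold_mib = BLOCK_MULTIPLIER * FLUSH_SIZE // (1024 * 1024)
states = [server_state(n, heap) for n in (3 * GIB, 3840 * 1024 ** 2, 4608 * 1024 ** 2)]
```

With the values above, a single MemStore blocks updates at 256 MiB, and on the assumed 10 GiB heap the server forces flushes from 3.5 GiB of total MemStore data and blocks updates from 4 GiB.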
Compression & MemStore Flush
HBase can compress the data it stores on HDFS (i.e. HFiles). This significantly reduces disk and network IO and saves the space occupied by the data. Data is compressed when it is written to HDFS, that is, when the MemStore flushes.
Therefore, compression should not slow down the flushing process too much. If it does, we may run into many of the problems described above, such as the MemStore growing too big (hitting the upper limit) and writes being blocked.
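The trade-off between space saved and time spent at flush time can be seen in a small sketch. It uses Python's standard `zlib` module purely as a stand-in for whichever codec is configured in HBase; the `flush_block` function and the `key=value` line layout are hypothetical and unrelated to the real HFile format.

```python
import time
import zlib

def flush_block(sorted_kvs, compress=True):
    """Serialize sorted key/values and optionally compress them, as at flush time."""
    raw = b"".join(key + b"=" + value + b"\n" for key, value in sorted_kvs)
    return zlib.compress(raw) if compress else raw

# Hypothetical MemStore contents: 1000 sorted rows with a repetitive value.
kvs = [(b"row%04d" % i, b"some-repetitive-value") for i in range(1000)]

start = time.perf_counter()
plain = flush_block(kvs, compress=False)
plain_seconds = time.perf_counter() - start

start = time.perf_counter()
packed = flush_block(kvs, compress=True)
packed_seconds = time.perf_counter() - start

print(len(plain), len(packed), plain_seconds, packed_seconds)
```

The compressed block is much smaller for repetitive data like this, but producing it costs extra CPU time during the flush, which is the cost the paragraph above warns about.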
Benefits of HBase MemStore
Benefits of MemStore in HBase:
- The MemStore keeps recently added data, so it acts as an in-memory cache. This is very useful when the most recently written data is accessed more often than older data.
- Every MemStore flush creates one HFile per column family.
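The caching benefit can be illustrated with a simplified lookup. This is only a sketch of the idea: the `get` function, the dictionaries standing in for the MemStore and HFiles, and the newest-first search order are assumptions made for illustration, and HBase's real read path is more involved than this.

```python
def get(key, memstore, hfiles):
    """Look in memory first, then in flushed files from newest to oldest."""
    if key in memstore:
        return memstore[key], "memstore"
    for hfile in reversed(hfiles):   # hfiles are listed oldest first
        if key in hfile:
            return hfile[key], "hfile"
    return None, "missing"


hfiles = [
    {b"row1": b"old", b"row2": b"x"},   # oldest flush
    {b"row1": b"newer"},                # later flush
]
memstore = {b"row3": b"just-written"}

recent = get(b"row3", memstore, hfiles)    # served from memory
older = get(b"row1", memstore, hfiles)     # served from the newest file holding it
absent = get(b"row9", memstore, hfiles)
```

A read of the just-written `row3` never touches the flushed files, which is why workloads that mostly read recent data benefit from the MemStore.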
So, this was all about HBase MemStore. We hope you liked our explanation.
We covered the concept of the HBase MemStore, its uses and benefits, and the MemStore flush configuration. If you have any doubts regarding HBase MemStore, feel free to ask in the comments.