Master 6 Data Compression Techniques in SAP HANA with Examples

Placement-ready Online Courses: Your Passport to Excellence - Start Now

Previously, we learned replication modes in SAP HANA, now, let’s move on to the data compression techniques used in column store in SAP HANA.

Here, you will find information about the basics of data compression in SAP HANA plus the different compression techniques used ranging from basic to advanced level techniques.

Let’s quickly start by introducing the concept.

Data Compression in SAP HANA

Data compression techniques compress the data in column stores in the HANA database. It is a necessary step before storing data in HANA database so that SAP HANA’s performance optimizes.

Data compression enables performance optimization in terms of decreasing operational costs by keeping data efficiently in the main memory, speeding up searches and calculations.

Data compression techniques primarily store data and avoid data redundancy. The compression techniques are automatically optimized and calculated during the delta merge operation. If an empty column is stored in the HANA memory, compression is not applied at that time.

Compression is recalculated and applied along with delta merge operation which is executed whenever data is inserted in that column. To activate the automatic compression, the value of parameter “active” in the optimize_compression section must be YES.

Compression Factor

For applying any type of compression on a data table, we must calculate a compression value first using appropriate factors. Thus, the compression value is calculated using a compression factor.

The compression factor is the ratio of the size of uncompressed data to the size of compressed data in SAP HANA.

Technology is evolving rapidly!
Stay updated with DataFlair on WhatsApp!!

We determine the size of uncompressed data as a product of nominal record size and number of records in a table. Whereas, the size of compressed data is the total size of a table residing in the main memory of SAP HANA.

Run the SQL command given below to check the compression properties and apply compression on a loaded table.

LOAD <table_name>
SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, COMPRESSION_TYPE, LOADED from
PUBLIC.M_CS_COLUMNS where SCHEMA_NAME = '<your_schema>' and TABLE_NAME = '<your_table>

Compression Techniques in SAP HANA

In this section, we’ll learn about different compression techniques for compressing data in a column store. Typically, data in column stores can undergo a two-fold compression. The first level of compression is a basic type which is Dictionary compression.

Further, on the second level, there are advanced compression methods that we apply to the data compressed by dictionary compression. The methods under advanced compression type are prefix encoding, run-length encoding, cluster encoding, indirect encoding, and sparse encoding.

1. Dictionary Compression

Dictionary compression is a standard compression method to reduce data volume in the main memory. It is a default compression method which compulsorily applies on all columns of a data table in HANA database. In this technique, we map distinct column values to consecutive numbers (value ID).

This optimizes SAP HANA performance as processing numbers are more efficient than character values. We can apply advanced compression technique on data compressed bye dictionary compression method.
Dictionary compression Technique in SAP HANA

  • Method type: Default
  • Applies to: Main store and delta store
  • Applied on: Applied generally
2. Prefix Encoding Compression

In prefix encoding compression, we prefix a numerical value ID by the value representing its number of occurrences. This avoids data redundancy and saves a lot of in-memory space.

Suppose, in a data value array; the value 4 is repeating 8 times consecutively, then, we will add a prefix value 8 before the numeric value 4.

Prefix Encoding Compression

  • Method type: PREFIXED
  • Applied to: Main store
  • Applied on: Single predominant column value
3. Run-length Encoding Compression

In the run-length encoding compression method, only one of the repeating value IDs is stored along with its start position.

Suppose, a data value with value ID 5 appears 3 times consecutively starting from position 0, then, the compression will store only at one instance off value ID 5 and the start position as 0. Thus, it is run-length encoding because it stores the start positions as run-length values.

Run Length Encoding Compression Technique in SAP HANA

  • Method type: RLE
  • Applied to: Main storage
  • Applied on: Several frequent column values
4. Cluster Encoding Compression

The cluster encoding compression method cuts a long value ID array into small chunks of 1024 elements. In a piece of an array, if all the value IDs are the same and repeating, then we take only one instance of that value ID. It generates a bit vector value of the cluster encoded value.

Where i=1, if the cluster gets replace by a single value and i=0, if the cluster values are the same as they were in the original uncompressed array.

Cluster Encoding Compression

  • Method type: CLUSTERED
  • Applied to: Main storage
  • Applied on: Several frequent column values
5. Indirect Encoding Compression

We use indirect encoding even after cluster encoding. Only repeating values remain in the compressed array. Thus, in such cases, there are few distinct values and more repeating values which is further compressed into a cluster-specific dictionary unit representing the value IDs in even fewer bits.

This is how indirect encoding compression works where a value ID array compresses down to dictionary blocks.

Indirect Encoding Compression Technique in SAP HANA

  • Method type: INDIRECT
  • Applied to: Main storage
  • Applied on: Several frequent column values
6. Sparse Encoding Compression

In sparse encoding compression, we remove the value which repeats most often in a value ID array from the main array. Then bit vector value indicates the position of this value.

Sparse Encoding

  • Method type: SPARSE
  • Applied to: Main storage
  • Applied on: Single predominant column value and not appropriately clustered value ID array.

Summary

This concludes our tutorial of data compression in SAP HANA and different data compression techniques.

We hope the explanation was helpful and you understood every data compression technique. If you have any queries, drop them in the comment section below.

Your opinion matters
Please write your valuable feedback about DataFlair on Google

courses

DataFlair Team

The DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.

Leave a Reply

Your email address will not be published. Required fields are marked *