Data Compression

Columnar storage inherently makes compression (and decompression) more efficient than does row-oriented storage.

For row-oriented data, choosing a single compression method that works well for every data type in a row is difficult, because text and numeric data compress best with different algorithms.

Column storage allows the algorithm to be chosen according to the column's data type and the domain and range of its values, even where the domain and range are not declared explicitly. For example, an alphabetic column GENDER defined as CHAR(1) will have only two actual values (M and F). Rather than storing each value in an eight-bit byte, each value can be encoded as a single bit, and the resulting bit string can then be compressed further.
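The GENDER example can be sketched as follows. This is a minimal illustration, not Vector's implementation: the function names pack_two_valued and unpack_two_valued are hypothetical, the column is assumed to contain only "M" and "F", and zlib is used purely as a stand-in general-purpose compressor for the "further compressed" step.

```python
import zlib

def pack_two_valued(values, symbols=("M", "F")):
    """Encode a column with only two distinct values as one bit per value.

    Hypothetical helper: raises KeyError if a value is not in `symbols`.
    """
    code = {s: i for i, s in enumerate(symbols)}   # "M" -> 0, "F" -> 1
    packed = bytearray((len(values) + 7) // 8)     # 8 values per byte
    for i, v in enumerate(values):
        if code[v]:
            packed[i // 8] |= 1 << (i % 8)         # set bit i for "F"
    return bytes(packed)

def unpack_two_valued(packed, n, symbols=("M", "F")):
    """Decode n values from the bit string produced by pack_two_valued."""
    return [symbols[(packed[i // 8] >> (i % 8)) & 1] for i in range(n)]

column = ["M", "F", "F", "M"] * 1000
raw = "".join(column).encode("ascii")   # CHAR(1): one byte per value
packed = pack_two_valued(column)        # one bit per value
further = zlib.compress(packed)         # stand-in for a second compression pass
print(len(raw), len(packed), len(further))
```

The 4,000 one-byte values shrink to 500 bytes after bit-packing, and because the bit string is highly repetitive here, the second pass shrinks it further still.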

Vector uses different types of compression algorithms from those found in most other products. Because Vector processes data so efficiently, its compression, and in particular its decompression, is designed to use little CPU while still reducing disk I/O. Although on-disk compression ratios may be slightly lower than those of other products, overall performance is improved.
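This page does not name the algorithms Vector uses. As a generic illustration of the trade-off described above (a modest compression ratio in exchange for very cheap decompression), the sketch below shows frame-of-reference encoding, a well-known lightweight scheme. It is an assumption chosen for illustration, not a description of Vector's internals; for_encode and for_decode are hypothetical names, and blocks are assumed to be non-empty lists of integers.

```python
from array import array

def for_encode(block):
    """Frame-of-reference: store the block minimum once, then each value
    as an unsigned offset in the narrowest array type that can hold it.

    Assumes a non-empty block of integers (min() fails on an empty list).
    """
    base = min(block)
    offsets = [v - base for v in block]
    max_off = max(offsets)
    for tc in ("B", "H", "I", "L", "Q"):           # narrowest type first
        if max_off < 1 << (8 * array(tc).itemsize):
            return base, array(tc, offsets)
    raise ValueError("offsets too large for a 64-bit unsigned array")

def for_decode(base, offsets):
    """Decoding costs one addition per value, so it uses very little CPU."""
    return [base + off for off in offsets]

block = list(range(1_000_000, 1_000_200))
base, offsets = for_encode(block)
print(base, offsets.typecode, offsets.itemsize * len(offsets))
```

Here 200 values that would need 8 bytes each as 64-bit integers are stored as one base value plus 200 one-byte offsets. A heavier general-purpose compressor could often do better on disk, but it would not decode with a single addition per value.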

Compression in Vector is automatic, requiring no user intervention. Vector chooses a compression method for each column, per data block, according to its data type and distribution.
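To make "per column, per data block" concrete, the sketch below splits a column into fixed-size blocks and picks an encoding for each block from simple statistics. The decision rules, thresholds, block size, and method names are all hypothetical; this page does not document how Vector actually chooses, so treat this only as an illustration of why two blocks of the same column can end up with different methods.

```python
def choose_method(block):
    """Hypothetical heuristic: pick an encoding for one block of a column
    from its data type and value distribution. Thresholds are arbitrary."""
    if not block:
        return "raw"
    # Number of runs of equal adjacent values.
    runs = 1 + sum(1 for a, b in zip(block, block[1:]) if a != b)
    if runs <= len(block) // 4:
        return "run-length"              # long runs of repeated values
    if all(isinstance(v, int) for v in block) and max(block) - min(block) < 256:
        return "frame-of-reference"      # narrow integer range
    if len(set(block)) <= 16:
        return "dictionary"              # few distinct values
    return "raw"

def choose_per_block(column, block_size):
    """Choose a method independently for each block of the column."""
    return [choose_method(column[i:i + block_size])
            for i in range(0, len(column), block_size)]

column = [0] * 1000 + list(range(1000))
print(choose_per_block(column, 1000))
```

The first block of this column is a single long run, while the second holds 1,000 distinct integers spread over too wide a range for these thresholds, so the same column receives a different method in each block.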