Table Compression
All storage structures--except R-tree secondary index and heapsort--permit tables and indexes (where present) to be compressed.
Compression is controlled using the Key and Data options in the Compression group box in the Structure of Table and Structure of Index dialogs. By default, there is no compression when creating or modifying a table or index.
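These settings can also be applied outside VDBA with the SQL modify statement. The following is a minimal sketch, assuming an emp table keyed on a hypothetical name column:
/* compress both keys and data while restructuring as B-tree */
modify emp to btree on name with compression = (key, data);
/* rebuild without compression */
modify emp to btree on name with nocompression;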
Not all parts of all storage structures can be compressed, as summarized in the table below:
Storage Structure         Data   Key
B-tree base table         Yes    Yes
B-tree secondary index    No     Yes
hash base table           Yes    No
hash secondary index      Yes    No
heap base table           Yes    No
heap secondary index      N/A    N/A
heapsort base table       No     No
heapsort secondary index  N/A    N/A
ISAM base table           Yes    No
ISAM secondary index      Yes    No
R-tree base table         N/A    N/A
R-tree secondary index    No     No
Note: In VDBA, selecting Data in the Compression group box in the Structure of Table dialog does not affect keys stored on ISAM or B-tree index and leaf pages--only the data on the data pages is compressed. To compress index entries on B-tree index pages, select Key instead. ISAM index pages cannot be compressed.
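Expressed in SQL, the distinction between the two options looks like this (a sketch; the name key column is a stand-in):
/* data pages only: index and leaf key entries remain uncompressed */
modify emp to btree on name with compression = (data);
/* key entries on index and leaf pages are compressed instead */
modify emp to btree on name with compression = (key);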
Table compression compresses character and text columns. Integer, floating point, date, and money columns are not compressed unless they are nullable and contain a null value.
Trailing blanks and nulls are compressed in character and text columns. For instance, the emp table contains a comment column that is 478 bytes. However, most employees have comments that are only 20 to 30 bytes in length. This makes the emp table a good candidate for compression because 478 bytes can be compressed into 30 bytes or fewer, saving nearly 450 bytes per row.
Furthermore, as many rows as possible are placed on each page, so the entire emp table (31 rows), which normally occupies eight 2 KB pages as a heap, fits on a single page as a compressed heap. Uncompressed, a 2 KB page held only four of the 478-byte rows; with compression, many more rows fit on each page.
There is no formula for estimating the number of rows per page in a compressed table, because it is entirely data dependent.
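To see the actual effect on a given table, you can query the standard catalogs. This is a sketch, assuming the iitables catalog and column names found in typical Ingres installations:
/* check structure, compression flag, and page counts for emp */
select storage_structure, is_compressed, number_pages, overflow_pages
from iitables
where table_name = 'emp';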
When to Compress a Table
Compressing a table reduces the amount of disk I/O needed to bring a set of rows in from disk. This can improve performance if disk I/O is a query-processing bottleneck.
For instance, having compressed the emp table from eight pages down to one page, the following query performs only one disk I/O, whereas prior to compression as many as eight disk I/Os were required:
select * from emp;
In a large table, compression can dramatically reduce the number of disk I/Os performed to scan the table, and thus dramatically improve performance on scans of the entire table. Compression is also useful for conserving the amount of disk space it takes to store a table.
Compression Overhead
Compression must be used wisely, because the overhead associated with it can sometimes exceed the gains.
If a machine has a fast CPU, disk I/O is often the bottleneck for queries. Compression, however, incurs CPU overhead, because data must be decompressed before it is returned to the user. This cost must be weighed against the benefit of decreased disk I/O, especially on machines with slower CPUs or an already heavily loaded CPU. High compression reduces disk I/O further still, but uses even more CPU resources.
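In SQL, these levels correspond to the data and hidata settings of the modify statement. A sketch (hidata support depends on your Ingres release):
/* standard data compression: moderate CPU cost */
modify emp to heap with compression = (data);
/* high data compression: fewest pages, highest CPU cost */
modify emp to heap with compression = (hidata);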
There is overhead when updating compressed tables. Because rows are compressed to fit as many as possible on each page, a row that grows during an update may no longer fit in its original position and must be moved to a new spot on the page, or even to a new page. When a row moves, its tid, or tuple identifier, changes, requiring that every secondary index on the compressed table also be updated to reflect the new tid. For more information, see Tids (see page Tids).
For example, if you change Shigio’s comment from “Good” to “Excellent,” the stored value grows from 4 bytes to 9 bytes and the record no longer fits in exactly the same place. The record must be moved to a new location (or page), and any secondary indexes on the table must be updated accordingly (if the emp table is a B-tree, the appropriate B-tree leaf page is updated instead).
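Because Ingres exposes the tuple identifier as a tid pseudocolumn in queries, you can watch this happen. A sketch, with illustrative column names:
/* note the row's tid before the update */
select tid, name, comment from emp where name = 'Shigio';
/* enlarging the compressed row forces it to move */
update emp set comment = 'Excellent' where name = 'Shigio';
/* the same query now reports a different tid */
select tid, name, comment from emp where name = 'Shigio';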
Avoid compressed tables when updates that increase the size of character or text columns occur frequently, especially if secondary indexes are involved--unless you are prepared to incur this overhead. If you do compress a table that will be updated, use a fill factor lower than 100%. (The default fill factor for compressed tables is 75% for hash with data compression and 100% for the other structures.) With free space on each page, moved rows are less likely to be placed on overflow pages. For more information, see Options to the Modify Procedure (see page Options to the Modify Procedure). A sketch of such a modify, combining compression with a lower fill factor (the name key column is a stand-in), follows.
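/* fill pages to only 75% so rows that grow can stay in place */
modify emp to btree on name with compression = (key, data), fillfactor = 75;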