Optimizing Your Database
The transactional database engine provides several features that allow you to conserve disk space and improve system performance. These features include the following:
Duplicatable Keys
If you define a key to be duplicatable, the transactional database engine allows more than one record to have the same value for that key. Otherwise, each record must have a unique value for the key. If one segment of a segmented key is duplicatable, all the segments must be duplicatable.
Linked-Duplicatable Keys
By default in a v7.0 or later file, the transactional database engine stores duplicatable keys as linked-duplicatable keys. When the first record with a duplicate key value is inserted into a file, the transactional database engine stores the key value on an index page. The transactional database engine also initializes two pointers to identify the first and last record with this key value. Additionally, the transactional database engine stores a pair of pointers at the end of the record on the data page. These pointers identify the previous and next record with the same key value. When you create a file, you can reserve pointers for use in creating linked-duplicatable keys in the future.
If you anticipate adding duplicatable keys after you create a data file and you want the keys to use the linked-duplicatable method, you can preallocate space for pointers in the file.
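As a concrete illustration, the following sketch creates a file with one duplicatable key while reserving pointer pairs for two future linked-duplicatable keys. It assumes the Btrieve C interface (the BTRV() entry point from btrapi.h); the specification-block layouts, the file name demo.mkd, and the sizing values shown are illustrative, so verify the exact field offsets and the Create (14) key-number semantics against the headers and operation reference for your release.

```c
#include <string.h>
#include <btrapi.h>   /* Btrieve C interface: BTRV() and BTI_* types */

#define B_CREATE 14   /* Create operation code */

/* Illustrative 16-byte file and key specification images for the
   Create (14) data buffer. The field order follows this manual's
   description; verify offsets and constants against the btrconst.h
   shipped with your release before relying on them. */
#pragma pack(1)
typedef struct {
    BTI_WORD recLength;      /* logical fixed-record length */
    BTI_WORD pageSize;       /* 512-4,096 bytes, a multiple of 512 */
    BTI_WORD keyCount;       /* number of keys defined at create time */
    BTI_BYTE reserved[4];
    BTI_WORD fileFlags;      /* variable records, compression, etc. */
    BTI_BYTE dupPointers;    /* duplicate pointers to reserve */
    BTI_BYTE notUsed;
    BTI_WORD preallocation;  /* pages to preallocate (0 = none) */
} FILE_SPEC;

typedef struct {
    BTI_WORD position;       /* 1-based offset of the key in the record */
    BTI_WORD length;         /* key length in bytes */
    BTI_WORD flags;          /* 0x0001 = duplicates allowed */
    BTI_BYTE reserved[4];
    BTI_BYTE extType;        /* extended data type, if used */
    BTI_BYTE nullValue;
    BTI_BYTE notUsed[2];
    BTI_BYTE keyNumber;      /* manually assigned key number, if used */
    BTI_BYTE acsNumber;
} KEY_SPEC;
#pragma pack()

int createWithDuplicatableKey(void) {
    BTI_CHAR posBlock[128];                   /* position block */
    struct { FILE_SPEC f; KEY_SPEC k; } buf;  /* Create data buffer */
    BTI_WORD bufLen = sizeof(buf);

    memset(&buf, 0, sizeof(buf));
    buf.f.recLength   = 100;
    buf.f.pageSize    = 4096;
    buf.f.keyCount    = 1;
    buf.f.dupPointers = 2;      /* reserve pointer pairs for two future
                                   linked-duplicatable keys */
    buf.k.position = 1;
    buf.k.length   = 20;
    buf.k.flags    = 0x0001;    /* duplicatable key */

    /* Key buffer holds the path; see the Create (14) reference for
       the key-number (overwrite) semantics. */
    return BTRV(B_CREATE, posBlock, &buf, &bufLen,
                (BTI_VOID_PTR)"demo.mkd", 0);
}
```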
Repeating-Duplicatable Keys
If no room is available to create a linked-duplicatable key (that is, if no duplicate pointers are available), the transactional database engine creates a repeating-duplicatable key. The transactional database engine stores every key value of a repeating-duplicatable key both on a data page and on an index page. In other words, the key’s value resides in the record on a data page and is repeated in the key entry on an index page.
*Note: For users of pre-6.0 Btrieve, the term linked-duplicatable corresponds to permanent, and the term repeating-duplicatable corresponds to supplemental.
You can define a key as a repeating-duplicatable key by setting bit 7 (0x80) of the key flags in the key's specification block on a Create (14) or Create Index (31) operation. Prior to 6.10, you could not define a key as repeating-duplicatable, and bit 7 of the key flags was not user-definable. In 6.0 and later, the Stat operation (15) returns bit 7 set if no room was available to create a linked-duplicatable key (and the transactional database engine therefore had to create the key as a repeating-duplicatable key).
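The flag arithmetic itself is simple. In this standalone sketch, the repeating-duplicatable bit is the documented 0x80; the duplicates bit (0x0001) is an assumed conventional value, so confirm it in your release's headers:

```c
#include <stdio.h>

#define KFLG_DUP        0x0001  /* duplicates allowed (bit 0; assumed value) */
#define KFLG_REPEAT_DUP 0x0080  /* bit 7: repeating-duplicatable */

int main(void) {
    /* Request a repeating-duplicatable key on Create (14) or
       Create Index (31): */
    unsigned short keyFlags = KFLG_DUP | KFLG_REPEAT_DUP;

    /* After a Stat (15), bit 7 set in the returned key flags means the
       engine stored the key as repeating-duplicatable. */
    if (keyFlags & KFLG_REPEAT_DUP)
        printf("repeating-duplicatable key\n");
    return 0;
}
```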
Files that use the 5.x format use this same key flag (called the supplemental key attribute in 5.x) to designate that a key was created with the 5.x Create Supplemental Index operation (31).
*Note: In pre-6.0 files, you can drop supplemental indexes only. Permanent indexes, as their name implies, cannot be dropped. In 6.0 and later files, you can drop any index.
Linked vs. Repeating
Each method has performance advantages, and there are trade-offs between them. Generally, if the average number of duplicates per key value is two or more, a linked-duplicatable key takes up less disk space, and searches are generally faster because there are fewer index pages. The opposite is true if your file stores very few records with duplicate keys and the key length is very short, because an entry in the linked-duplicatable tree requires 8 bytes for pointers, whereas a repeating-duplicatable key entry requires 4.
If only a small percentage of keys have any duplicates, repeating-duplicatable keys are advantageous because they save the extra 8 bytes of pointers in each data record. When the average number of duplicates per key value is less than two, neither choice offers a noticeable performance advantage.
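A back-of-envelope comparison using the per-entry costs described above makes the trade-off concrete; the record counts and key length here are hypothetical, and page-level overhead is ignored:

```c
#include <stdio.h>

/* Approximate index-related storage for each duplicate-key method:
   linked-duplicatable: one index entry (key value + 8 pointer bytes)
   per distinct value, plus an 8-byte pointer pair in every record;
   repeating-duplicatable: one index entry (key value + 4 bytes) per
   record, with no extra bytes in the data records. */
int main(void) {
    long records  = 100000;
    long distinct = 40000;   /* average of 2.5 duplicates per value */
    long keyLen   = 20;

    long linked    = distinct * (keyLen + 8) + records * 8;
    long repeating = records * (keyLen + 4);

    printf("linked-duplicatable:    ~%ld bytes\n", linked);    /* 1,920,000 */
    printf("repeating-duplicatable: ~%ld bytes\n", repeating); /* 2,400,000 */
    return 0;
}
```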
If you expect several concurrent transactions to be active on the same file at the same time, repeating-duplicatable keys provide a greater probability that those transactions will not try to access the same pages. All pages written during a concurrent transaction carry implicit locks. When a page that one concurrent transaction needs to change is already involved in another concurrent transaction, the transactional database engine might wait until the other transaction completes. If these implicit waits happen often, application performance degrades.
Neither key storage method is recommended as a shortcut for tracking chronological order. For linked-duplicatable keys, the transactional database engine does maintain the chronological order of records inserted after the key is created, but that order is lost if you rebuild the key's index. For repeating-duplicatable keys, the transactional database engine maintains chronological order only if no records are deleted between the time the key is built and the time a new record is inserted. To track chronological order reliably, use an AUTOINCREMENT data type for the key, as sketched below.
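A key-specification image for such a key might look as follows; the 16-byte layout mirrors the Create sketch earlier in this section, and both the extended-type flag (0x0100) and the AUTOINCREMENT type code (15) are assumed conventional values, so confirm them in your release's btrconst.h:

```c
#include <string.h>

/* Minimal key-specification image for an AUTOINCREMENT key. */
#pragma pack(1)
struct keySpec {
    unsigned short position, length, flags;
    unsigned char  reserved[4];
    unsigned char  extType, nullValue;
    unsigned char  notUsed[2];
    unsigned char  keyNumber, acsNumber;
};
#pragma pack()

int main(void) {
    struct keySpec k;
    memset(&k, 0, sizeof(k));
    k.position = 1;       /* key occupies the first bytes of the record */
    k.length   = 4;       /* AUTOINCREMENT keys are 2 or 4 bytes long */
    k.flags    = 0x0100;  /* use the extended key type byte (assumed value) */
    k.extType  = 15;      /* AUTOINCREMENT type code (assumed value) */
    return 0;
}
```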
Page Preallocation
Preallocation guarantees that disk space is available when the transactional database engine needs it. The transactional database engine allows you to preallocate up to 65,535 pages to a file when you create a data file. Table 26 shows the maximum number of bytes the transactional database engine allocates for a file of each possible page size (assuming you allocate 65,535 full pages of disk space):

Table 26  Maximum Preallocated File Size by Page Size

Page Size (bytes)    Maximum Allocation (bytes)1
512                   33,553,920
1,024                 67,107,840
1,536                100,661,760
2,048                134,215,680
2,560                167,769,600
3,072                201,323,520
3,584                234,877,440
4,096                268,431,360

1Value is page size multiplied by 65,535.
If not enough space exists on the disk to preallocate the number of pages you specify, the transactional database engine returns Status Code 18 (Disk Full) and does not create the file.
The speed of file operations can be enhanced if a data file occupies a contiguous area on the disk. The increase in speed is most noticeable on very large files. To preallocate contiguous disk space for a file, the device on which you are creating the file must have the required number of bytes of contiguous free space available. The transactional database engine preallocates the number of pages you specify, whether or not the space on the disk is contiguous.
Use the formulas described earlier in this chapter to determine the number of data pages and index pages the file requires. Round any remainder from this calculation up to the next whole number.
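For example, the round-up step looks like this with hypothetical figures (derive the real records-per-page value from the formulas earlier in this chapter):

```c
#include <stdio.h>

/* Round the page estimate up to the next whole page, as the text
   requires. The records-per-page figure here is hypothetical. */
int main(void) {
    long records     = 100000;
    long recsPerPage = 36;                                      /* hypothetical */
    long dataPages   = (records + recsPerPage - 1) / recsPerPage; /* ceiling */

    printf("data pages to preallocate: %ld\n", dataPages); /* 2778 */
    return 0;
}
```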
When you preallocate pages for a file, that file actually occupies that area of the disk. No other data file can use the preallocated area of disk until you delete or replace that file.
As you insert records, the transactional database engine uses the preallocated space for data and indexes. When all the preallocated space for the file is in use, the transactional database engine expands the file as new records are inserted.
When you issue an API Stat operation (15), the transactional database engine returns the difference between the number of pages you allocated at the time you created the file and the number of pages that the transactional database engine currently has in use. This number is always less than the number of pages you specify for preallocation because the transactional database engine considers a certain number of pages to be in use when a file is created, even if you have not inserted any records.
Once a file page is in use, it remains in use even if you delete all the records stored on that page. The number of unused pages that the Stat operation returns never increases. When you delete a record, the transactional database engine maintains a list of free space in the file and reuses the available space when you insert new records.
Even if the number of unused pages that the Stat operation returns is zero, the file might still have free space available. The number of unused pages can be zero if one of the following is true:
- You did not preallocate any pages when you created the file.
- The transactional database engine has put all the preallocated pages into use.
Blank Truncation
If you choose to truncate blanks, the transactional database engine does not store any trailing blanks in the variable-length portion of the record when it writes the record to the file. Blank truncation has no effect on the fixed-length portion of a record. The transactional database engine does not remove blanks that are embedded in the data.
When you read a record that contains truncated trailing blanks, the transactional database engine expands the record to its original length. The value the transactional database engine returns in the Data Buffer Length parameter includes any expanded blanks. Blank truncation adds either 2 or 4 bytes of overhead to the physical size of the record (stored with the fixed-record portion): 2 if the file does not use variable-tail allocation tables (VATs), 4 if it does.
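Under these rules, the stored size of a blank-truncated record works out as in this sketch; the record dimensions are hypothetical, and per-record overhead unrelated to blank truncation is ignored:

```c
#include <stdio.h>

/* Physical size of a blank-truncated record: the fixed portion, the
   variable-length bytes actually stored, and 2 bytes of overhead
   (4 if the file uses VATs). */
int main(void) {
    unsigned long fixedLen       = 100;  /* fixed-length portion */
    unsigned long variableLen    = 400;  /* logical variable portion */
    unsigned long trailingBlanks = 350;  /* truncated on write */
    int usesVats = 0;

    unsigned long stored = fixedLen + (variableLen - trailingBlanks)
                         + (usesVats ? 4 : 2);
    printf("stored size: %lu bytes\n", stored); /* 152 */
    return 0;
}
```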
Record Compression
When you create a file, you can specify whether you want the transactional database engine to compress the data records when it stores them in the file. Record compression can result in a significant reduction of the space needed to store records that contain many repeating characters. The transactional database engine compresses five or more of the same contiguous characters into 5 bytes. For example, a field of 100 consecutive blanks is stored in just 5 bytes.
Consider using record compression in the following circumstances:
- Your records contain many contiguous repeating characters, such as fields that your task initializes to blanks when it inserts records.
- Conserving disk space is more important to your application than insertion and retrieval speed.
*Note: The database engine automatically uses record compression on files that use system data and have a record length that exceeds the maximum length allowed. See Table 12.
When you perform record I/O on compressed files, the transactional database engine uses a compression buffer to provide a block of memory for the record compression and expansion process. To ensure sufficient memory to compress or expand a record, the transactional database engine requires enough buffer space to store twice the length of the longest record your task inserts into the compressed file. This requirement can have an impact on the amount of free memory remaining in the computer after the transactional database engine loads. For example, if the longest record your task writes or retrieves is 64 KB long, the transactional database engine requires 128 KB of memory to compress and expand that record.
*Note: If your file uses VATs, the transactional database engine's requirement for buffer space is 16 times the file's page size. For example, for a file with a 4 KB page size, 64 KB of memory is required to compress and expand a record.
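Both sizing rules (twice the longest inserted record, or 16 times the page size when the file uses VATs) fit in one small helper; the figures below are hypothetical:

```c
#include <stdio.h>

/* Compression/expansion buffer requirement, per the rules above:
   twice the longest record the task inserts, or 16 times the page
   size when the file uses VATs. */
static unsigned long compressionBuffer(unsigned long longestRecord,
                                       unsigned long pageSize,
                                       int usesVats) {
    return usesVats ? 16UL * pageSize : 2UL * longestRecord;
}

int main(void) {
    printf("%lu\n", compressionBuffer(65536UL, 4096UL, 0)); /* 131072 (128 KB) */
    printf("%lu\n", compressionBuffer(65536UL, 4096UL, 1)); /* 65536  (64 KB)  */
    return 0;
}
```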
Because the final length of a compressed record cannot be determined until the record is written to the file, the transactional database engine always creates a compressed file as a variable-length record file. In the data page, the transactional database engine stores either 7 bytes (if the file does not use VATs) or 11 bytes (if it does), plus an additional 8 bytes for each duplicate key pointer. The transactional database engine then stores the record on the variable page. Because the compressed images of the records are stored as variable-length records, individual records may become fragmented across several file pages if your task performs frequent insertions, updates, and deletions. The fragmentation can result in slower access times because the transactional database engine may need to read multiple file pages to retrieve a single record.
The record compression option is most effective when each record has the potential for containing a large number of repeating characters. For example, a record may contain several fields, all of which may be initialized to blanks by your task when it inserts the record into the file. Compression is more efficient if these fields are grouped together in the record, rather than being separated by fields containing other values.
To use record compression, the file must have been created with the compression flag set. Key-only files do not allow compression.
Index Balancing
The transactional database engine allows you to further conserve disk space by employing index balancing. By default, the transactional database engine does not use index balancing, so each time an index page becomes full, it must create a new index page. When index balancing is active, the transactional database engine can often avoid creating a new index page when a current index page fills: it looks for available space on adjoining index pages and, if space is available on one of those pages, moves keys from the full index page to the page with free space.
The balancing process not only results in fewer index pages but also produces more densely populated indexes, better overall disk usage, and faster response on most read operations. If you add keys to the file in sorted order, index page usage increases from 50 percent to nearly 100 percent when you use index balancing. If you add keys randomly, the minimum index page usage increases from 50 percent to 66 percent.
On insert and update operations, the balancing logic requires the transactional database engine to examine more pages in the file and might require more disk I/O, which slows down file updates. Although the exact effects of balancing indexes vary by situation, index balancing typically degrades performance on write operations by about 5 to 10 percent.
The transactional database engine offers two ways to turn on index balancing, letting you fine-tune your environment: at the engine level or at the file level. If you specify the Index Balancing configuration option during setup, the transactional database engine applies index balancing to every file. For a description of how to specify the Index Balancing configuration option, refer to the Pervasive PSQL User's Guide.
You can also designate that only specific files are to be index balanced. To do so, set bit 5 (0x20) of the file's file flags at creation time. If the Index Balancing configuration option is off when the transactional database engine is started, the transactional database engine applies index balancing only to indexes on files that have bit 5 of the file flags set.
If the Index Balancing configuration option was on when the transactional database engine was started, the transactional database engine ignores bit 5 in all files' file flags and applies index balancing to every file.
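Requesting per-file balancing is just a matter of setting that bit in the file flags before issuing Create (14); the bit value is the one documented above, while the constant name here is made up for readability:

```c
#include <stdio.h>

#define FFLG_INDEX_BALANCING 0x0020  /* bit 5 of the file flags */

int main(void) {
    /* Combine with any other file flags the file needs, then place the
       result in the file specification's file-flags word on Create (14). */
    unsigned short fileFlags = 0;
    fileFlags |= FFLG_INDEX_BALANCING;
    printf("file flags: 0x%04X\n", fileFlags);
    return 0;
}
```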
Files remain compatible regardless of whether index balancing is active. Also, you do not have to specify index balancing to access files that contain balanced index pages. If you turn on the transactional database engine’s index balancing option, index pages in existing files are not affected until they become full. The transactional database engine does not re-balance indexes on existing files as a result of enabling this option. Similarly, turning off the index balancing option does not affect existing indexes. Whether this option is on or off determines how the transactional database engine handles full index pages.
Variable-tail Allocation Tables
Variable-tail allocation tables (VATs) give the transactional database engine faster access to data residing at large offsets in very large records and significantly reduce the buffer sizes the transactional database engine needs when processing records in files that use data compression. Using a record's VAT, the transactional database engine can divide the variable-length portion of a record into smaller portions and then store those portions in any number of variable tails. The transactional database engine stores an equal amount of the record's data in each of the record's variable tails (except the final variable tail). The transactional database engine calculates the amount to store in each variable tail by multiplying the file's page size by 8. The last variable tail contains any data that remains after the transactional database engine divides the data among the other variable tails.
*Note: The formula for finding the length of a variable tail (eight times the page size) is an implementation detail that may change in future versions of the transactional database engine.
In a file that employs VATs and has a 4,096-byte page size, the first variable tail stores the bytes from offset 0 through 32,767 of the record’s variable portion, the second tail stores offsets 32,768 through 65,535, and so on. The transactional database engine can use the VAT to accelerate a seek to a large offset in a record because the VAT allows it to skip the variable tails containing the record’s lower-offset bytes.
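Given the eight-times-page-size rule (subject to the caveat in the note above), finding the tail that holds a given offset is simple arithmetic:

```c
#include <stdio.h>

/* Locate the variable tail containing a byte offset of the record's
   variable portion, assuming tail length = 8 * page size. */
int main(void) {
    unsigned long pageSize = 4096;
    unsigned long tailLen  = 8 * pageSize;  /* 32,768 bytes */
    unsigned long offset   = 100000;

    /* Tail 0 covers offsets 0-32,767, tail 1 covers 32,768-65,535, ... */
    printf("offset %lu lies in tail %lu\n", offset, offset / tailLen); /* tail 3 */
    return 0;
}
```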
An application specifies whether a file will use VATs when it creates the file. If your application uses chunk operations on huge records (larger than eight times the physical page size) and accesses chunks in a random, nonsequential fashion, VATs may improve your application’s performance. If your application uses whole record operations, VATs do not improve performance because the transactional database engine reads or writes the record sequentially; the transactional database engine skips no bytes in the record.
If your application uses chunk operations but accesses records sequentially (for example, it reads the first 32 KB of a record, then reads the next 32 KB of the record, and so on until the end of the record), VATs do not improve performance, because the transactional database engine saves your position in a record between operations, thereby eliminating the need to seek at the beginning of a chunk operation.
VATs also provide another advantage. When the transactional database engine reads or writes a compressed record, it uses a buffer that must be up to twice as large as the record’s uncompressed size. If the file has VATs, that buffer needs to be only as large as two variable tails (16 times the physical page size).
Key-Only Files
In a key-only file, the entire record is stored with the key, so no data pages are required. Key-only files are useful when your records contain a single key, and that key takes up most of the record. Another common use for a key-only file is as an external index for a standard file.
The following restrictions apply to key-only files:
- Each key-only file can contain only a single key.
- Key-only files cannot use record compression (see Record Compression).
Key-only files contain only File Control Record pages followed by a number of PAT pages and index pages. If a key-only file has an ACS, it may also have an ACS page. If you use ODBC to define referential integrity constraints on the file, the file may contain one or more variable pages, as well.
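A key-only file is requested through the file flags at creation time. This sketch mirrors the Create (14) example earlier in the section; the 0x0010 key-only flag is an assumed conventional value, so confirm it in your release's btrconst.h:

```c
#include <stdio.h>
#include <string.h>

/* Specification image for a key-only file whose 20-byte record is
   entirely the key. Layouts mirror the earlier Create (14) sketch. */
#pragma pack(1)
struct fileSpec {
    unsigned short recLength, pageSize, keyCount;
    unsigned char  reserved[4];
    unsigned short fileFlags;
    unsigned char  dupPointers, notUsed;
    unsigned short preallocation;
};
struct keySpec {
    unsigned short position, length, flags;
    unsigned char  reserved[4];
    unsigned char  extType, nullValue, notUsed[2];
    unsigned char  keyNumber, acsNumber;
};
#pragma pack()

int main(void) {
    struct { struct fileSpec f; struct keySpec k; } buf;
    memset(&buf, 0, sizeof(buf));
    buf.f.recLength = 20;        /* the whole record is the key */
    buf.f.pageSize  = 4096;
    buf.f.keyCount  = 1;
    buf.f.fileFlags = 0x0010;    /* key-only file (assumed value) */
    buf.k.position  = 1;
    buf.k.length    = 20;        /* key spans the full record */
    /* Pass &buf as the Create (14) data buffer, as shown earlier. */
    printf("spec image: %u bytes\n", (unsigned)sizeof(buf));
    return 0;
}
```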