Options to the Modify Procedure
The modify procedure provides several options:
• Min Pages
• Max Pages
• Allocation
• Extend
• Fillfactor
• Leaffill
• Nonleaffill
• Unique
• Compression
The MinPages, MaxPages, Allocation, Fillfactor, Leaffill, and Nonleaffill options take effect during the modify procedure only, but are remembered in the system catalog description of the table. They will be applied again by a future modify-to-reconstruct, and will be output as part of the table description by copydb and unloaddb. The Extend, Unique, and Compression options are continuously active throughout the life of the table.
In VDBA, these options are in the Structure of Table and Structure of Index dialogs.
Number of Pages
Min Pages and Max Pages are valid options only when you are modifying the table to hash. These options allow you to control the hashing algorithm to some extent, extending the control offered by the Fillfactor option.
The Min Pages option is useful if the table will be growing rapidly or if you want few rows per page to increase concurrency so multiple people can update the same table.
You can achieve nearly the same effect by specifying a low value for the Fillfactor option, but the fill factor is based on the current size of the table, as described in
Alternate Fill Factors.
To force a specific number of main pages, use the Min Pages option to specify a minimum number of main pages. The number of main pages used is at least as many as specified, although the exact number you specify may not be used.
Example: Modify Structure and Force a Higher Number of Main Pages for a Table
For example, for the emp table in the previous chapter you can force a higher number of main pages by specifying the minimum number of main pages when you modify the table to hash. If you specify 30 main pages for the table, which has 31 rows, you have approximately one row per page.
To modify the storage structure of the emp table:
modify emp to hash on age with minpages = 30;
In VDBA, open the Structure of Table dialog for the table. Select Hash from the Structure drop-down list. Enter 30 in the Min Pages edit control. Enable the age column in the Columns list.
Example: Specify a Maximum Number of Main Pages for a Table
To specify a maximum number of main pages to use, rather than accepting the system's choice, use the Max Pages option. If the rows do not completely fit on the number of pages specified, overflow pages are allocated. If fewer pages are needed, the lesser number is used. Max Pages is useful mainly for shrinking compressed hash tables to fewer main pages than they otherwise occupy.
You can achieve nearly the same effect by specifying a high value for the Fillfactor option, but the fill factor is based on the current size of the table, as described in
Alternate Fill Factors.
The following example modifies the bigtable table, specifying a Max Pages value.
modify bigtable to hash on empno with maxpages = 100;
In VDBA, open the Structure of Table dialog for the table. Select Hash from the Structure drop-down list. Enter 100 in the Max Pages edit control. Enable the empno column in the Columns list.
Remember that Max Pages controls only the number of main pages; it does not affect overflow pages. For example, assume your data takes 100 pages in heap. If you modify the table to hash and limit the number of main pages to 50, the remainder of the data goes onto overflow pages.
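For instance, to cap the table described above at 50 main pages and let the remainder of the data spill to overflow pages, you can use a statement such as the following. This is a sketch only, assuming the bigtable table keyed on empno from the earlier example:
modify bigtable to hash on empno with maxpages = 50;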
Allocation of Space
Use the Allocation option to pre-allocate space. You can modify the table to an allocation greater than its current size to leave free space in the table. (The default is four pages if no allocation has been specified.)
Doing this allows you to avoid a failure due to lack of disk space, or to provide enough space for table expansion instead of having to perform a table extend operation. For more information, see
Extending a Table or Index.
The allocated size must be in the range 4 to 8,388,607 (the maximum number of pages in a table). The specified size is rounded up, if necessary, to make sure the allocation size for a multi-location table or index is always a multiple of sixteen.
Note: If the specified number of pages cannot be allocated, the modify procedure is aborted.
After an allocation is specified, it remains in effect and does not need to be specified again when the table or index is modified.
Example: Allocate 1000 Pages to a Table
The following example specifies that 1000 pages be allocated to table inventory:
modify inventory to btree with allocation = 1000;
In VDBA:
1. Open the Structure of Table dialog for the table.
2. Select B-tree from the Structure drop-down list.
3. Enter 1000 in the Allocation edit control.
The space allocated is 1008, due to rounding.
Extension of Space
The Extend option allows you to control the amount of space by which a table is extended when more space is required. (The default extension size is 16 pages.)
The size must be in the range 1 to max_size, where max_size is calculated as:
8,388,607 – allocation_size.
The specified Extend size is rounded up, if necessary, to make sure the size for a multi-location table or index is always a multiple of sixteen.
Note: If the specified number of pages cannot be allocated, the operation fails with an error.
After an extend size has been specified for the table or index, it remains in effect and does not need to be specified again when the table or index is modified.
Example: Extend a Table in Blocks of 1000 Pages
The following example specifies that the table inventory be extended in blocks of 1000 pages:
modify inventory to btree with extend = 1000;
In VDBA:
1. Open the Structure of Table dialog for the table.
2. Select B-tree from the Structure drop-down list.
3. Enter 1000 in the Extend edit control.
The extension space is 1008, due to rounding.
Guidelines for Choosing an Extend Size
When choosing an extend size, keep the following in mind:
• When a table is extended, not only must the physical extension be performed, but the extension must also be recorded. Therefore, avoid an excessively small extend size that requires many additional small extensions.
• In an environment that is short of disk space, a large extend size can cause an operation to fail, even when there is sufficient disk space for the particular operation.
• Windows and UNIX: On a file system that requires the underlying files to be written to when allocating disk space, a large extend size can be undesirable because it affects the performance of the operation that causes the extension.
• VMS: On file systems that provide calls for allocating disk space, a large extend size helps reduce the amount of table fragmentation.
Default Fill Factors
Each storage structure has a different default fill factor. The term fill factor refers to the number of rows that are actually put on a data page divided by the number of rows that fit on a data page for a particular structure.
The various fill factors enable you to add data to the table without running into overflow problems. Because the data pages have room to add data, you do not have to remodify.
For instance, a heap table fits as many rows as possible on a page; this is known as a 100% fill factor. However, ISAM and B-tree data pages are filled only to 80% capacity, leaving room to add 20% more data before a page is completely full.
The default data page fill factors are as follows:
| Storage Structure | Uncompressed | Compressed |
| Heap | 100% | 100% |
| Hash | 50% | 75% |
| ISAM | 80% | 100% |
| B-tree | 80% | 100% |
The default B-tree index page fill factors are as follows:
| Page Type | Default Fill Factor |
| Leaf pages (Leaffill) | 70% |
| Index pages (Nonleaffill) | 80% |
The first table shows that if a heap table is 100 pages and you modify that table to hash, the table now takes up 200 pages, because each page is only 50% full.
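As a rough estimate, assuming the 50% hash data page fill factor shown above:
main pages needed ≈ heap pages / fill factor = 100 / 0.50 = 200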
Note: Depending on the system allocation for tracking used and free pages, the number of pages can be approximate. For more information, see the chapter “Calculating Disk Space.”
Alternate Fill Factors
You can tailor the fill factor for various situations. For instance, if the table is not going to grow at all, use a 100% fill factor for the table. On the other hand, if you know you are going to be adding a lot of data, you can use a low fill factor, perhaps 25%. Also, if your environment is one where updates are occurring all the time and good concurrency is important, you can set the fill factor low.
Note: Fill factor is used only at modify time. As you add data, the pages fill up and the fill factor no longer applies.
When specifying a fill factor other than the default, you must keep the following points in mind:
• Use a high fill factor when the table is static and you are not going to be appending many rows.
• Use a low fill factor when the table is going to be growing rapidly. Also, use a low fill factor to reduce locking contention and improve concurrency. A low fill factor distributes fewer keys per page, so that page level locks lock fewer records.
Specifying a fill factor is useful for hash and ISAM tables. For B-tree tables, however, because only data pages are affected, the Fillfactor option must be used with the Leaffill or Nonleaffill options. See
Leaf Page Fill Factors and
Index Page Fill Factors.
For hash tables, a 50% fill factor is typically used for uncompressed tables. You can raise or lower this value, but raising it too high can cause more overflow pages than desirable. Always measure the overflow in a hash table when setting a high fill factor; fill factors higher than 90% are likely to cause overflow.
If you are using compressed ISAM tables and are adding data, make sure you set the fill factor to something lower than the default 100%, or you immediately add overflow pages.
Normally, uncompressed ISAM tables are built with an 80% fill factor. You can set the fill factor on ISAM tables to 100%; unless you have duplicate keys, you do not have overflow problems until you add data to the table.
In VDBA, you control the fill factor of the data pages using the Fillfactor option in the Structure of Table and Structure of Index dialogs.
Example: Set Fill Factor to 25% on a Hash Table
This example sets the fill factor on a hash table to 25%, rather than the default of 50%, by modifying the emp table:
modify emp to hash on empno with fillfactor = 25;
In VDBA:
1. Open the Structure of Table dialog for the table.
2. Select Hash from the Structure drop-down list.
3. Enter 25 in the Fillfactor edit control.
4. Enable the empno column in the Columns list.
Example: Set Fill Factor to 100% on an Uncompressed ISAM Table
This example sets the fill factor on an uncompressed ISAM table to 100%:
modify employee to isam on name with fillfactor = 100;
In VDBA:
1. Open the Structure of Table dialog for the table.
2. Select Isam from the Structure drop-down list.
3. Enter 100 in the Fillfactor edit control.
4. Enable the name column in the Columns list.
Leaf Page Fill Factors
It is possible to specify B-tree leaf page fill factors at modify time. This is the percentage of the leaf page that is used during the modify procedure. The remaining portion of the page is available for use later when new rows are added to the table.
The purpose of the fill factor is to leave extra room on the leaf pages to do inserts without causing leaf page splits. This is useful if you modify a table to B-tree and plan to add rows to it later.
In VDBA, you control this value using the Leaffill option in the Structure of Table dialog.
The Leaffill option specifies the percentage of each leaf page to be filled at the time the table is modified to B-tree or cB-tree. The Leaffill default is 70, which means that 70% of the leaf page is filled at modify time and 30% remains empty for future use.
For example, assume that the key-tid pair requires 400 bytes of storage. This means that five key-tid pairs fit on a single 2 KB B-tree leaf page. However, if the leaf page fill factor is specified at 60%, only three key-tid pairs are allocated on each B-tree leaf page at modify time. If subsequent updates to the table cause two new rows on this leaf page, they are placed in the empty space on the leaf page. The key-tid pairs are reordered on the leaf page from min to max. If more than two new rows need to be added to this leaf page, there is not enough space and the leaf page has to split.
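The scenario above corresponds to a statement such as the following sketch, which assumes the emp table keyed on empno and specifies the 60% leaf page fill factor from the example:
modify emp to btree on empno with leaffill = 60;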
Index Page Fill Factors
It is possible to specify B-tree index page fill factors at modify time. This is the percentage of the index page that is used during the modify procedure. The remaining portion of the page is available for use later when new rows are added to the table. The purpose of the fill factor is to leave extra room on the index pages to do inserts without causing index page splits. This is useful if you modify a table to B-tree and plan to add rows to it later.
In VDBA, you control this value using the Nonleaffill option in the Structure of Index dialog.
The Nonleaffill option specifies the percentage of each index page that is to be filled at the time the table is modified to B-tree. That is, it is like Leaffill, but for index pages instead of leaf pages. The Nonleaffill default is 80. This means that 80% of the index page is used at modify time and 20% remains empty for future use.
For example, assume that the key-tid pair requires 500 bytes of storage. This means that four key-tid pairs fit on a single B-tree index page. However, if the index page fill factor is specified at 75%, only three key-tid pairs are allocated on each 2 KB B-tree index page at modify time. If subsequent updates to the table cause another leaf page to be allocated, the empty space on the index page is used to hold a key-tid pair for that new leaf page. If there are enough new rows to cause two new leaf pages to be added to that index page, the index page must split. For more information, see
Tids.
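The scenario above corresponds to a statement such as the following sketch, which assumes the emp table keyed on empno and specifies the 75% index page fill factor from the example:
modify emp to btree on empno with nonleaffill = 75;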
Setting a fill factor lower than 60 on leaf pages can help reduce locking contention when B-tree leaf pages are splitting, because index splitting is reduced. Setting Leaffill low is advisable for small but quickly growing B-trees.
When you specify a high Leaffill value, index splitting is almost guaranteed to occur because leaf pages fill up immediately when data is added. Avoid a high fill factor unless the B-tree table is relatively static; even then, consider using an ISAM table instead.
Ensuring Key Values Are Unique
Unique keys can be enforced automatically for hash, ISAM, and B-tree tables using the modify procedure.
Benefits of Unique Keys
Benefits of unique keys are:
• A good database design that provides unique keys enhances performance.
• You are automatically ensured that all data added to the table has unique keys.
• The Ingres optimizer recognizes tables that have unique keys and uses this information to plan queries wisely.
In most cases unique keys are an advantage in your data organization.
Disadvantages of Unique Keys
The disadvantages of unique keys include a small performance impact in maintaining uniqueness. You must also plan your table use so that you do not add two rows with the same key value.
Specify Unique Keys
In VDBA, unique keys can be specified as Row or Statement in the Unique group box in the Structure of Table and Structure of Index dialogs:
• Row indicates that uniqueness is checked as each row is inserted.
• Statement indicates that uniqueness is checked after the update statement is executed.
If you do not want to create a unique key, select the No option.
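The Row and Statement choices correspond to the unique_scope option of the modify statement in SQL. The following is a sketch, assuming the emp table keyed on empno; it requests statement-level uniqueness checking:
modify emp to btree unique on empno with unique_scope = statement;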
Example: Prevent the Addition of Two Names with the Same Number
The following example prevents the addition of two employees in the emp table with the same empno:
modify emp to isam unique on empno;
In VDBA:
1. Open the Structure of Table dialog for the emp table.
2. Select Isam from the Structure drop-down list.
3. Enable Row in the Unique radio button group box.
4. Enable the empno column in the Columns list.
If a new employee is added with the same employee number as an existing record in the table, the row is not added, and you are returned a row count of zero.
Note: An error is not returned in this case; only the row count shows that the row was not added. Be aware of this if you are writing application programs using unique keys.
Example: Modify a Table to Hash and Prevent the Addition of Two Names with the Same Number
The following example modifies the emp table to hash and prevents the addition of two employees in the emp table with the same empno.
modify emp to hash unique on empno;
In VDBA:
1. Open the Structure of Table dialog for the emp table.
2. Select Hash from the Structure drop-down list.
3. Enable Row in the Unique radio button group box.
4. Enable the empno column in the Columns list.
The rows in the following example have unique keys. Although employee #17 and #18 have the same records except for their employee numbers, the employee numbers are unique, so these are valid rows after the modification:
| Empno | Name   | Age | Salary    |
| 17    | Shigio | 29  | 28000.000 |
| 18    | Shigio | 29  | 28000.000 |
| 1     | Aitken | 35  | 50000.000 |
The following two rows do not have unique keys. These two rows cannot both exist in the emp table after modification to hash unique on empno:
| Empno | Name   | Age | Salary    |
| 17    | Shigio | 29  | 28000.000 |
| 17    | Aitken | 35  | 50000.000 |
Table Compression
All storage structures, except R-tree secondary indexes and heapsort, permit tables and indexes (where present) to be compressed.
Compression is controlled using the Key and Data options in the Compression group box in the Structure of Table and Structure of Index dialogs. By default, there is no compression when creating or modifying.
Not all parts of all storage structures can be compressed, as summarized in the table below:
| Storage Structure | What Can Be Compressed |
| Heap | Data pages |
| Hash | Data pages |
| ISAM | Data pages only (index pages cannot be compressed) |
| B-tree | Data pages; keys on index and leaf pages |
| R-tree secondary index | Nothing |
| Heapsort | Nothing |
Note: In VDBA, selecting Data in the Compression group box in the Structure of Table dialog does not affect keys stored in ISAM or B-tree index and leaf pages; only the data on the data pages is compressed. To compress index entries on B-tree index pages, select Key instead.
ISAM index pages cannot be compressed.
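In SQL, compression is requested with the compression option of the modify statement. The following is a sketch, assuming the emp table keyed on empno; key compresses the index and leaf entries of the B-tree, and data compresses the data pages:
modify emp to btree on empno with compression = (key, data);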
Compression of tables compresses character and text columns. Integer, floating point, date, and money columns are not compressed, unless they are nullable and have a null value.
Trailing blanks and nulls are compressed in character and text columns. For instance, the emp table contains a comment column that is 478 bytes. However, most employees have comments that are only 20 to 30 bytes in length. This makes the emp table a good candidate for compression because 478 bytes can be compressed into 30 bytes or fewer, saving nearly 450 bytes per row.
Furthermore, as many rows as possible are placed on each page, so the entire emp table (31 rows), which normally takes eight 2 KB pages as a heap, takes just one page as a compressed heap. In the earlier heap example, pages were limited to four rows per page, but with compression many more rows can be held on each page.
There is no formula for estimating the number of rows per page in a compressed table, because it is entirely data dependent.
When to Compress a Table
Compressing a table can reduce the amount of disk I/O needed to bring a set of rows in from disk. This can increase performance if disk I/O is a query-processing bottleneck.
For instance, having compressed the emp table from eight pages down to one page, the following query performs only one disk I/O, whereas prior to compression as many as eight disk I/Os were required:
select * from emp;
In a large table, compression can dramatically reduce the number of disk I/Os performed to scan the table, and thus dramatically improve performance on scans of the entire table. Compression is also useful for conserving the amount of disk space it takes to store a table.
Compression Overhead
Compression must be used wisely, because the overhead associated with it can sometimes exceed the gains.
If a machine has a fast CPU, disk I/O can be the bottleneck for queries. Compression, however, incurs CPU overhead: data must be decompressed before it is returned to the user, which increases the CPU cost of a query. This cost must be weighed against the benefit of reduced disk I/O and against how heavily loaded the CPU already is, especially on machines with less powerful CPUs. High compression reduces disk I/O further, but uses even more CPU resources.
There is overhead when updating compressed tables. As rows are compressed to fit as many as possible per page, if you update a row so that it is now larger than it was before, it must be moved to a new spot on the page or even to a new page. If a row moves, its tid, or tuple identifier, also changes, requiring that every secondary index on the compressed table also be updated to reflect the new tid. For more information, see
Tids.
For example, if you change Shigio’s comment from “Good” to “Excellent,” Shigio’s record length grows from 4 bytes to 9 bytes and does not fit back in exactly the same place. His record needs to be moved to a new place (or page), with updates made to any secondary indexes of this table (if the emp table was B-tree, the appropriate B-tree leaf page is updated instead).
Avoid compressed tables when updates that increase the size of text or character columns occur frequently, especially if secondary indexes are involved, unless you are prepared to incur this overhead. If you do compress and plan to update, use a fill factor lower than the defaults for compressed tables, which are 75% for hash with data compression and 100% for the others. With free space on each page, moved rows are less likely to be placed on overflow pages; the sketch below shows one way to combine the two options. For more information, see
Options to the Modify Procedure.
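The following sketch, assuming the emp table keyed on empno, combines data compression with a 75% fill factor so that each data page retains free space for rows that grow when updated:
modify emp to hash on empno with compression = (data), fillfactor = 75;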
Page Size
The default page size is 8 KB. The corresponding buffer cache for the installation must also be configured with the page size you specify or you receive an error. For more information, see the “Configuring Ingres” chapter in the System Administrator Guide.
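For example, the following sketch rebuilds the inventory table with 8 KB pages; the value given must match a page size for which the buffer cache is configured:
modify inventory to btree with page_size = 8192;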
For more information on page size, see
Table Pages.