Rebuild Utility Concepts

The file format that results from using the command-line Rebuild depends on the -f parameter. If you omit the -f parameter, Rebuild uses the value set for the MicroKernel's Create File Version configuration option. For example, if the Create File Version value is 8.x, then running the Rebuild utility on version 7.x files converts them to 8.x format. See Create File Version and -f parameter.

•

Close all data files before running the backup utility.

•

Use continuous operations (only during the backup).

Note: You cannot run Rebuild on a file that is in continuous operation mode.

Temporary Files

On Windows, Rebuild creates temporary files in the directory specified by the TMP system environment variable. By default on Linux and OS X, Rebuild creates temporary files in the output directory (or in the source directory if the -b parameter is not used). Therefore, you need enough disk space in the temporary file directory (while the Rebuild utility is running) to potentially accommodate both the original file and the new file. You can specify a different directory for storing these files by using the Output Directory option in the Rebuild GUI version or by using the -b parameter with the CLI versions.

Normally, Rebuild deletes temporary files when the conversion is complete. However, if a power failure or other serious interruption occurs, Rebuild may not delete the temporary files. If this occurs, delete the following types of temporary files:


Platform	Temporary File Names
Linux, OS X	_rbldxxxxxx, where xxxxxx is six random letters. Caution: Be sure that you do not delete the Rebuild executable, rbldcli.
Windows	_rbldx, where x is a number.

Optimizing the Rebuild Process

Rebuild makes Btrieve calls to the database engine. Therefore, the database engine configuration settings and the amount of random access memory (RAM) in your computer affect the performance of the rebuild process. This is particularly evident in the amount of time required to rebuild large data files.

In general, building indexes requires much more time than building data pages. If you have a data file with many indexes, it requires more time to rebuild than would the same file with fewer indexes.

The following items can affect the rebuild processing time:

•

CPU Speed and Disk Speed

•

Amount of Memory

•

Sort Buffer Size

•

Max MicroKernel Memory Usage

•

Cache Allocation Size

•

Index Page Size

•

Number of Indexes

CPU Speed and Disk Speed

The speed of the central processing unit (CPU) and access speed of the physical storage disk can affect processing time during a rebuild. In general, the faster the speed for both of these, the faster the rebuild process. Disk speed is more critical for rebuilding files that are too large to fit entirely in memory.

Tip: Large files, such as 3 or 4 GB or more, may take several hours to convert. If you have more than one database engine available, you may wish to share the rebuild processing among a number of machine CPUs. For example, you could copy some of your files to each machine that has a database engine installed, then copy the files back after the rebuild process.

Amount of Memory

Rebuild is capable of rebuilding a file using two different methods, a default method and an alternative method. See -m<0 | 2> parameter. The method chosen depends on the amount of memory available. For the default method (-m2), Rebuild takes the following steps provided available memory exists.

Creates a new, empty data file with the same record structure and indexes as defined in the source file.

Drops all the indexes from the new file.

Copies all the data into the new file, without indexes.

Adds the indexes, using the following process.

For a particular key in the source file, reads as many key values as possible into a memory buffer using the Extended Step operation.

Sorts the values in the memory buffer and writes the sorted values to a temporary file.

Repeats steps a and b, processing the key value from every record.

The temporary file now contains several key value sets, each of which has been individually sorted.

Merges the sets into index pages, filling each page to capacity. Each index page is added to the data file at the end, extending the file length.

Repeats steps 4 and 5 for each remaining key.

If any failure occurs during this process, such as a failure to open or write the temporary file, Rebuild starts over and uses the alternative method to build the file.

Rebuild uses an alternative method (-m0) when insufficient memory exists to use the default method, or if the default method encounters processing errors.

Creates a new, empty data file with the same record structure and indexes as defined in the source file.

Drops all the indexes from the new file.

Copies all the data into the new file, without indexes.

Adds the indexes, using the following process.

For a particular key in the source file, reads one record at a time using the Step Next operation.

Extracts the key value from the record and inserts it into the appropriate place in the index. This necessitates splitting key pages when they get full.

Repeats steps a and b, processing the key value from every record.

Repeats step 4 for each remaining key.

The alternative method is typically much slower than the default method. If you have large data files with many indexes, the difference between the two methods can amount to many hours or even days. The only way to ensure that Rebuild uses the default method is to have enough available memory. Several Configuration settings affect the amount of available memory.

Formulas For Estimating Memory Requirements

The following formulas estimate the optimal and minimum amount of contiguous free memory required to rebuild file indexes using the fast method. The optimal memory amount is enough memory to store all merge blocks in RAM. The minimum amount of memory is enough to store one merge block in RAM.

Key Length = total size of all segments of largest key in the file.

Key Overhead = 8 if key type is not linked duplicate. 12 if key type is linked duplicate.

Record Count = number of records in the file.

Optimal Memory Bytes = (((Key Length + Key Overhead) * Record Count) + 65536) / 0.6

Minimum Memory Bytes = Optimal Memory Bytes / 30

For example, if your file has 8 million records, and the longest key is 20 bytes (not linked duplicate), the preferred amount of memory is 373.5 MB, or ((( 20 + 8 ) * 8,000,000 ) + 65536 ) / 0.6 = 373,442,560 bytes.

The optimal amount of contiguous free memory is 373.5 MB. If you have at least this much free memory available, the Rebuild process takes place entirely in RAM. Because of the 60% allocation limit, the optimal amount of memory is actually the amount required to be free when the rebuild process starts, not the amount that the rebuild process actually uses. Multiply this optimal amount by 0.6 to determine the maximum amount Rebuild actually uses.

The minimum amount of memory is 1/30th of the optimal amount, 12,448,086 bytes, or 12.45 MB.

The divisor 30 is used because the database engine keeps track of no more than 30 merge blocks at once, but only one merge block is required to be in memory at any time. The divisor 0.6 is used because the engine allocates no more than 60% of available physical memory for rebuild processing.

If you do not have the minimum amount of memory available, Rebuild uses the alternative method to rebuild your data file.

Finally, the memory block allocated must meet two additional criteria: blocks required and allocated block size.

Blocks required must be less than or equal to 30, where:

Blocks Required = Round Up (Optimal Memory Bytes / Allocated Block)

Allocated block size must be greater than or equal to:

((2 * Max Keys + 1) * (Key Length + Key Overhead)) * Blocks Required

Assuming a 512-byte page size, and a block of 12.45 MB successfully allocated, the value for blocks required is:

Blocks Required = 373,500,000 / 12,450,000 = 30

The first criteria is met.

The value for allocated block size is:

Max Keys = (512-12) / 28 = 18

(((2 * 18) + 1) * (20 + 8)) * 9 = 9324

Is Allocated Block (12.5 million bytes) larger than 9324 bytes? Yes, so the second criteria is met. The index keys will be written to a temporary file in 12.45 MB pieces, sorted in memory, and then written to the index.

Sort Buffer Size

This setting specifies the maximum amount of memory that the MicroKernel dynamically allocates and de-allocates for sorting purposes during run-time creation of indexes. See Sort Buffer Size.

If the setting is zero (the default), Rebuild calculates a value for optimal memory bytes and allocates memory based on that value. If the memory allocation succeeds, the size of the block allocated must be at least as large as the value defined for minimum memory bytes. See Formulas For Estimating Memory Requirements.

If the setting is a non-zero value, and the value is smaller than the calculated minimum memory bytes, Rebuild uses the value to allocate memory.

Finally, Rebuild compares the amount of memory that it should allocate with 60% of the amount that is actually available. It then attempts to allocate the smaller of the two. If the memory allocation fails, Rebuild keeps attempting to allocate 80% of the last attempted amount. If the memory allocation fails completely (which means the amount of memory is less than the minimum memory bytes), Rebuild uses the alternative method to rebuild the file.

Max MicroKernel Memory Usage

This setting specifies the maximum proportion of total physical memory that the MicroKernel is allowed to consume. L1, L2, and all miscellaneous memory usage by the MicroKernel are included (Relational Engine is not included). See Max MicroKernel Memory Usage.

If you have large files to rebuild, temporarily set Max MicroKernel Memory Usage to a lower percentage than its default setting. Reset it to your preferred percentage after you complete your rebuilding.

Cache Allocation Size

This setting specifies the size of the Level 1 cache that the MicroKernel allocates; the MicroKernel uses this cache when accessing any data files. See Cache Allocation Size.

This setting determines how much memory is available to the database engine for accessing data files, not for use when indexes are built.

Increasing Cache Allocation to a high value does not help indexes build faster. In fact, it may slow the process by taking up crucial memory that is now unavailable to Rebuild. When rebuilding large files, decrease the cache value to a low value, such as 20% of your current value but not less than 5 MB. This leaves as much memory as possible available for index rebuilding.

Index Page Size

The page size in your file also affects the speed of index building. If Rebuild uses the alternative method, smaller key pages dramatically increase the time required to build indexes. Key page size has a lesser effect on building indexes if Rebuild uses the default method.

Rebuild can optimize page size for application performance or for disk storage.

To optimize for performance (your application accessing its data), Rebuild uses a default page size of 4096 bytes. This results in larger page sizes on physical storage and slower rebuilding times.

For a discussion of optimizing page size for disk storage, see Choosing a Page Size in PSQL Programmer's Guide in the Developer Reference.

Assume that your application has 8 million records, a 20-byte key, and uses a page size of 512 bytes. The MicroKernel places between 8 and 18 key values in each index page. This lessens the amount of physical storage required for each page. However, indexing 8 million records creates a B-tree about seven levels deep, with most of the key pages at the seventh level. Performance will be slower.

If you use a page size of 4096 bytes, the database engine places between 72 and 145 key values in each index page. This B-tree is only about four levels deep and requires many fewer pages to be examined when Rebuild inserts each new key value. Performance is increased but so is the requirement for the amount of physical storage.

Number of Indexes

The number of indexes also affects the speed of index building. Generally, the larger the number of indexes, the longer the rebuild process takes. The time required to build the indexes increases exponentially with increasing depth of the B-tree.

Log File

Information from a rebuild process is appended to a text log file. By default, the log file is placed in the current working directory.

For the CLI Rebuild, the default file name is rbldcli.log on Windows, Linux, and OS X. You may specify a location and name for the log file instead of using the defaults. See -lfile parameter.

You may examine the log file using a text editor. The information written to the log file includes the following:

•

Start time of the rebuild process

•

Parameters specified on the command line

•

Status code and error description (if an error occurs)

•

File being processed

•

Information about the processing (such as page size changes)

•

Total records processed

•

Total indexes rebuilt (if the -m2 processing method is used)

•

End time of the rebuild process

•

Status of the process (for example, if the file rebuilt successfully)