User Guide : 15. Using the Query Optimizer : Database Statistics : When to Rerun Optimization
 
Share this page                  
When to Rerun Optimization
Optimization does not necessarily need to be run whenever data is changed or added to the database. Optimization collects statistics that represent percentages of data in ranges and repetition factors. For instance, the statistics collected on employee gender show that 49% of the employees are female and 51% are male. Unless this percentage shifts dramatically, there is no need to rerun optimization on this column, even if the total number of employees changes.
You must rerun optimization if there are modifications to the database that alter the following:
Repetition factor
Percentage of rows returned from a range qualification (that is, your histogram information is incorrect)
For example, if you had run complete statistics on the empno column early in your company’s history, your repetition factor is correct because all employees still have unique employee numbers. If you used ranges of employee numbers in any way, as you added new employees your histogram information is less accurate.
If your company originally had 100 employees, 10% of the employees have employee numbers greater than 90. If the company hired an additional 100 employees, 55% of the employees have employee numbers greater than 90, but the original histogram information does not reflect this.
Columns that show this type of “receding end” growth and are used in range queries can periodically need to have optimization run on them (exact match on employee number is not affected, because the information that says all employee numbers are unique is still correct).
Even if the statistics are not up-to-date, the query results are still correct.
Note:  For Vector tables, the automatic histogram generation feature is a viable alternative to the frequent re-optimization of such columns. By not creating statistics at all on such columns, the query optimizer will automatically generate a histogram from current live data when the first use of the column in a WHERE or GROUP BY clause is detected in a server cycle. This will greatly simplify the management of statistics for such columns.