4. Stage 2--Migrate the Schema, Data, and Queries : Generating Statistics
Share this page                  
Generating Statistics
VectorH relies on reasonably up to date statistics to correctly optimize queries.
Statistics can be generated by:
The optimizedb command
The --stats option on the vwload utility
Automatically at server startup
The simplest approach is to generate statistics for all columns and all tables that the current user owns, which can be done as follows:
optimizedb <database name>
This will take some time for a large schema, so you can apply fine tuning. To speed up the operation and reduce the amount of statistics data generated, generate statistics only for columns involved in joins, restrictions, or grouping, using the ‑r -a optimizedb flags or table/column listing with CREATE STATISTICS.
On server startup, histograms are automatically generated on all columns that appear in WHERE clauses and do not already have a histogram stored in the catalog. A typical use strategy is to create histograms (with optimizedb or CREATE STATISTICS) for columns whose distribution does not change, and then let Vector generate the new histograms on the other, more dynamic, columns for every server cycle.
Statistics can be generated on a table-by-table (or even individual column) basis. In addition the CREATE STATISTICS command can create column correlation statistics using the WITH COLUMN GROUP clause.
Statistics should be generated after every significant change to the distribution of data in a database, including on first loading the database, or on subsequent data loads that change the distribution of key values. Out of date statistics is a common reason for queries not performing as expected, or for worsening performance as data is loaded over time.
For example, with the DBT-3 data set used in the performance test kit, query performance without statistics is much worse that performance after statistics have been created. Statistics can be deleted using:
statdump -zdl pocdb