Was this helpful?
Composite Histograms
The optimizer usually calculates costs from statistics on individual columns. However, it is possible for Vector to create and use histograms created from the concatenation of several columns. Such histograms are called composite histograms.
Composite histograms are useful in ad hoc query applications in which there are WHERE clause restrictions on varying combinations of columns. Such applications can have a variety of secondary indexes constructed on different permutations of the same columns with the goal of allowing the query optimizer to pick an index tailored to the specific combination of restrictions used in any one query.
For example, consider a table X with columns A, B, C, D, E, etc. and secondary indexes defined on (A, B, C), (B, C, D), (B, A, E). Consider a query with a WHERE clause such as “A = 25 and B = 30 and E = 99”. With histograms on the individual columns, the query optimizer finds it difficult to differentiate the cost of solving the query using the (A, B, C) index and the (B, A, E) index. This is because of the technique used to determine the combined effect of several restrictions on the same table. However, with composite histograms defined on each index, the optimizer combines the three restrictions into a single restriction on the concatenated key values, and the (B, A, E) index clearly produces the best looking query plan.
Composite histograms can be created on any collection of two or more columns. The columns can be, but need not be, storage structure or secondary index key columns. Any one column can participate in multiple composite histograms.
When optimizing a query, the optimizer looks for the longest (that is, the most columns) composite histogram whose leading columns are equality predicates in the query. All of the columns in that histogram are then marked "statistics found," and the process repeats with remaining columns in the query.
Finally, all remaining columns not already covered by a composite histogram are estimated using standard single-column statistics.
 
Last modified date: 11/09/2022