Statistics and Global Temporary Tables
Because global temporary tables only exist for the duration of a session, Optimize Database cannot be used to gather statistical information about them. Without histograms, the query optimizer has no knowledge about the value distributions of the columns in a global temporary table. Ingres maintains a reasonably accurate row count for global temporary tables, and this row count can be used by the query optimizer to compile a query which accesses a global temporary table.
The row counts alone are usually enough to permit the compilation of efficient query plans from queries that reference global temporary tables, in particular because they often contain relatively small data volumes. The lack of histograms on global temporary tables, however, can cause poor estimates of the number of rows that result from the application of restriction or join predicates. These poor estimates can in turn cause the generation of inefficient query plans. Inefficient query plans typically occur with large global temporary tables or tables with columns having skewed value distributions, which are not handled well by the default estimation algorithms of the query optimizer.
To help deal with such situations, there is a mechanism available to associate “model” histograms with global temporary tables.
How to Associate “Model” Histograms with Global Temporary Tables
Associating “model” histograms with global temporary tables can help alleviate the generation of inefficient query plans that can typically occur with large global temporary tables or tables with columns having skewed value distributions.
1. To associate “model” histograms with global temporary tables, follow these steps:
2. Create a persistent table with the same name as the global temporary table being modeled. The schema qualifier for the table must be either the user ID of the executing user of the application creating and accessing the global temporary table, or the special user ID “_gtt_model”. Its column definitions must include at least those from the global temporary table for which histograms are to be built. The column names and types must exactly match those of the global temporary table.
3. Populate the persistent table with a set of rows, which is representative of a typical instance of the global temporary table.
4. Run optimizedb on those columns of the persistent table for which histograms are desired (typically, the columns contained in WHERE clauses in any referencing queries).
5. After the histograms have been built, the persistent table can be emptied of rows, to release the space it occupies. This must be done with a DELETE FROM xxx statement, to delete the rows but leave the catalog definition (and histograms).
When the query optimizer analyzes WHERE clause predicates with columns from a global temporary table, it looks for the catalog definition of a similarly named persistent table with a schema qualifier matching the ID of the executing user or _gtt_model. If one is found, it looks for histograms on similarly named columns whose type and length exactly match those of the global temporary table columns. If these conditions are satisfied, it uses the model histograms.
Not all faulty query plans involving global temporary tables can be improved this way. The modeling technique depends on the fact that all or most instances of the global temporary table have similar value distributions in the histogrammed columns. If this is not true, a single instance of the table (as with the model persistent table) will not be representative of them all, and can improve the query plans in some executions of the application, but degrade other executions.
Last modified date: 08/29/2024