Column Statistics

14. Using the Query Optimizer : Database Statistics : Column Statistics

Share this page

Column Statistics

Collecting statistics is generally a time-consuming process because a large amount of data must be scanned. The techniques described so far—except for Key Column Statistics—collect statistics on all columns of the indicated tables.

It is not necessary, however, to choose all columns in all tables in your database when optimizing. The query optimizer uses statistics on a column only if the column is needed to restrict data, if it is specified in a join, or if it is in a GROUP BY clause (to estimate the number of groups). Therefore, it is a good idea to limit creation of statistics only to those columns used in a WHERE or GROUP BY clause.

The DBA or table owner usually understands the table structure and content and is able to predict how the various columns are used in queries. Thus, someone familiar with the table can identify columns that are used in the WHERE or GROUP BY clause.

Given these queries:

select name, age from emp

where dept = 'Tech Support';

select e.name, e.salary, b.address

from emp e, bldg b, dept d

where e.dept = d.dname

and d.bldg = b.bldg;

Candidate columns for optimization are:

emp table:  dept
dept table:  dname, bldg
bldg table:  bldg

Based on their use in these sample queries, there is no reason to obtain statistics on employee name, age, salary, or building address. These columns are listed in the target list only, not the WHERE clause of the query.

Columns used in the WHERE clause are often indexed to speed up joins and execution of constraints. If this is the case, specify the Gen Statistics on Keys/Index option to create statistics on key (that is, indexed) columns. However, it is often equally as important to create statistics on non‑indexed columns referenced in WHERE clauses.