Using the Query Optimizer
Data and Query Optimization
Vector uses a query optimizer to develop sophisticated query execution strategies. The query optimizer makes use of basic information such as row size, number of rows, primary key fields and indexes defined, and more specific data-related information such as the amount of data duplication in a column.
The data-related information is available for use by the query optimizer only after statistics (see Database Statistics) have been generated for the database. Without knowing exactly what data you have stored in your table, the query optimizer can only guess what your data looks like.
Consider the following examples:
select * from emp where empno = 13;
select * from emp where sex = 'M';
In each query, the optimizer guesses that few rows qualify. In the first query, this guess is probably correct because employee numbers are usually unique. In the second query, however, the guess is probably incorrect because a company typically has roughly as many males as females, so about half the rows qualify.
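To see how far such a guess can be from the actual data, you can check the duplication in each column yourself. The following is a minimal sketch in standard SQL that assumes only the emp table and columns from the examples above. It inspects the data by hand and is not how the query optimizer gathers its statistics.

```sql
-- Rows matching the first restriction: 0 or 1 if empno is unique
select count(*) from emp where empno = 13;

-- Rows per distinct value of sex: each group is likely a large share of the table
select sex, count(*) from emp group by sex;
```

A column whose groups are each a large fraction of the table is a poor candidate for the "few rows qualify" assumption.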
Why do restrictive assumptions about your data make a performance difference? For a single-table, keyed retrieval where you specify the key, there is probably no difference at all: the key is used to retrieve your data. However, in a multi-table query with several restrictions, knowing what your data looks like can help determine the best way to execute your query. The following example shows why:
select e.name, e.dept, b.address
from emp e, dept d, bldg b
where e.dept = d.dname
and d.bldg = b.bldg
and b.state = 'CA'
and e.salary = 50000;
There are many ways of executing this query. If appropriate keys exist, the probable choice is to execute the query in one of these two ways:
• Retrieve all the employees with a salary of 50000. Join those employees with the department table, and join the employees and their valid departments to the valid buildings. The tables are processed in the following order:
emp --> dept --> bldg
• Retrieve all the buildings with a state of CA. Join the valid buildings with the department table, and join the qualifying departments to the valid employees. The tables are processed in the following order:
bldg --> dept --> emp
The difference between these two possibilities is the order in which the tables are joined. Which method is preferable? You can pick the best strategy only if you know how many employees make $50,000, how many buildings are in California, and how many departments are in each building.
The best (that is, the fastest) query execution strategy can be determined only by having an idea of what your data looks like: how many rows qualify under each restriction, and how many rows join from table to table.
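The quantities that decide between the two join orders can be measured directly. The following is a sketch in standard SQL, assuming the emp, dept, and bldg tables and columns used in the example query. It only illustrates the kind of information the query optimizer needs; it does not reproduce how the optimizer obtains or uses it.

```sql
-- Rows that survive each single-table restriction
select count(*) from emp where salary = 50000;
select count(*) from bldg where state = 'CA';

-- Departments per building: how bldg rows fan out when joined to dept
select bldg, count(*) from dept group by bldg;

-- Employees per department: how dept rows fan out when joined to emp
select dept, count(*) from emp group by dept;
```

If far fewer rows qualify from emp than from bldg, starting with emp is likely to keep the intermediate results small, and the reverse holds if only a handful of buildings are in California.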
Query Execution Plans (QEPs) (see Query Execution Plans), generated by the query optimizer each time you perform a query, illustrate how a query is executed. By optimizing your database, you can optimize the QEPs that are generated, thereby making your queries more efficient.