Join Nodes in a QEP
There is an inner and an outer tree beneath every join node, which function similarly to an inner and outer program loop. By convention, the left-hand tree is called the outer tree, and the right-hand tree is called the inner tree.
There are various types of join nodes, described individually below, but the joining method is the same: for each row from the outer tree, there is a join to all of the rows that can possibly qualify from the inner tree. The next row from the outer tree is processed, and so on.
Any join node can have outer join information if an outer join is present.
Cartesian Product Node
The Cartesian product, or cart-prod, strictly follows the un-optimized join model, with each row in the outer node compared to all rows from the inner node. This does not mean that all rows are actually read, only that all rows that satisfy the conditions of the query are compared.
A typical abbreviated example of a QEP diagram involving a cart-prod is shown below:
Cart-Prod
/ \
proj-rest table
/
table
This node is displayed with the following information on a QEP diagram:
• A label identifying it as a cart-prod join node, along with the column(s) on which processing is done
• If an outer join has been requested, one of the following labels indicating the type of join:
[LEFT JOIN]
[FULL JOIN]
[RIGHT JOIN]
• Storage structure (which is usually heap)
• Total number of pages and tuples
• Query cost information
• Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)
The cart-prod is most frequently observed in disjoint queries (that is, when use of correlation variables and table names are mixed in the query). However, the following cases can also generate cart-prods, without adversely affecting performance:
• Queries involving ORs that can usually be decomposed into a sequence of smaller queries
• No join specification (a disjoint table or no WHERE clause, so that there is no relationship between the two tables)
• Small tables or amounts of data
• Non_equijoins, such as a query with the following WHERE clause:
WHERE r1.col1 > r2.col2
Cart-prods are sometimes associated with substantial estimates for disk I/O usage and affect performance adversely.
This example shows a QEP diagram with a cart-prod join node resulting from the following simple disjoint query:
SELECT arel.col1 FROM arel, arel a
WHERE a.col1 = 99;
QUERY PLAN 7,1, no timeout, of main query
Cart-Prod
Heap
Pages 1 Tups 243
D9 C4
/ \
Proj-rest Proj-rest
Sorted(NU) Heap
Pages 1 Tups 2 Pages 1 Tups 156
D1 C0 D8 C1
/ /
arel arel
Hashed(col1) Hashed(NU)
Pages 70 Tups 156 Pages 70 Tups 156
Full Sort Merge Node
The full sort merge (FSM) is a more optimal join: it typically joins the inner and outer subtrees with many fewer comparisons than the cart-prod requires. This is done by assuring that both subtrees are sorted in the order of the join columns. If one or the other is not already sorted (for example, by being read from a B-tree index constructed on the join columns), the query plan can include a sort to put the rows in the correct order. This allows the rows of the outer subtree to be joined to the matching rows of the inner subtree with one pass over each. The inner subtree is not scanned multiple times, as with the cart-prod join.
A typical abbreviated example of a QEP diagram involving an FSM is shown below:
join
/ \
sort sort
/ /
proj-rest proj-rest
/ /
table table
This node is displayed with the following information on a QEP diagram:
• A label identifying it as a FSM join node, along with the column(s) on which join processing is done
• If an outer join has been requested, one of the following labels indicating the type of join:
[LEFT JOIN]
[FULL JOIN]
[RIGHT JOIN]
• Storage structure (which is usually heap) or a list of sort columns if the data is being sorted
• Total number of pages and tuples
• Query cost information
• Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)
The FSM is most common when a “bulk” join takes place with no row restrictions on either table involved in the join, as with a SELECT statement of the following form:
SELECT * FROM r1, r2 WHERE r1.joincol = r2.joincol;
This example shows a QEP diagram with an FSM join node resulting from such a bulk join:
SELECT a.col2, b.col2 FROM arel a, brel b
WHERE a.col1 = b.col1;
QUERY PLAN 5,1, no timeout, of main query
FSM Join(col1)
Heap
Pages 1 Tups 156
D9 C40
/ \
Proj-rest Proj-rest
Sorted(eno) Sorted(eno)
Pages 1 Tups 156 Pages 1 Tups 156
D8 C1 D1 C1
/ /
arel brel
Hashed (NU) Isam (NU)
Pages 70 Tups 156 Pages 3 Tups 156
Partial Sort Merge Node
The partial sort merge (PSM) is a cross between a full sort merge and a cart-prod. The inner tree must be in sorted order. The outer tree can be sorted or partially sorted. The outer tree in PSM scenarios can always be derived from an ISAM table. Comparisons proceed as for the full sort merge until an outer value is found to be out of order. At that point the inner loop is restarted. Because ISAM tables are reasonably well ordered (depending on how many rows have been added because the last reorganization), the number of inner loop restarts is typically small.
A typical abbreviated example of a QEP diagram involving a PSM is shown below:
Join
/ \
proj-rest sort
/ /
table proj-rest
/
table
This node is displayed with the following information on a QEP diagram:
• A label identifying it as a PSM join node, along with the columns on which processing is done
• If an outer join has been requested, one of the following labels indicating the type of join:
[LEFT JOIN]
[FULL JOIN]
[RIGHT JOIN]
• Storage structure (which is usually heap)
• Total number of pages and tuples
• Query cost information
• Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)
The following example shows a QEP diagram with a PSM join:
SELECT a.col2, b.col2 FROM arel a, brel b
WHERE a.col1 = b.col2;
QUERY PLAN 6,1, no timeout, of main query
PSM Join(col1)
Heap
Pages 1 Tups 156
D9 C26
/ \
Proj-rest Proj-rest
Heap Sort on(col1)
Pages 1 Tups 156 Pages 1 Tups 156
D1 C1 D8 C1
/ /
brel arel
Isam(col2) Hashed(NU)
Pages 3 Tups 156 Pages 70 Tups 156
Hash Join Node
The hash is an optimized join that replaces the FSM join when one or both of the subtrees must be sorted. It functions by loading the rows of one subtree into a memory resident hash table, keyed on the values of the join columns. The rows of the other subtree are hashed on their join key values into the hash table, allowing very efficient matching of joined rows. By avoiding the sort(s) of the FSM join, the hash join can be much more efficient.
A typical abbreviated example of a QEP diagram involving a hash join is shown below:
Join
/ \
proj-rest proj-rest
/ /
table table
This node is displayed with the following information on a QEP diagram:
• A label identifying it as a hash join node, along with the columns on which join processing is done
• If an outer join has been requested, one of the following labels indicating the type of join:
[LEFT JOIN]
[FULL JOIN]
[RIGHT JOIN]
• Storage structure (which is usually heap) or a list of sort columns if the data is being sorted
• Total number of pages and tuples
• Query cost information
• Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)
Like the FSM join, the hash join is most common when a bulk join takes place with no row restrictions on either table involved in the join, as with a SELECT statement of the following form:
SELECT FROM r1,r2 WHERE r1.joincol= r2.joincol;
This example shows a QEP diagram with hash join node resulting from such a bulk join:
SELECT a.col2, b.col2 FROM arel a, brel b
WHERE a.col1 = b.col2;
QUERY PLAN 1,1, no timeout, of main query
HASH Join(col1)
Heap
Pages 1 Tups 156
D9 C40
/ \
Proj-rest Proj-rest
Heap Heap
Pages 1 Tups 156 Pages 1 Tups 156
D8 C1 D1 C1
/ /
arel(a) brel (b)
Hashed (NU) Isam (NU)
Pages 70 Tups 156 Pages 3 Tups 156
Key and Tid Lookup Join Node
In key and tid lookup joins, the outer and inner data set is not static. For each outer row, the join selects values and forms a key to search in the inner join. A key lookup join uses keyed access through the structure of the inner table or index, and a tid lookup join uses the tuple identifier (tid) value.
A typical abbreviated example of a QEP diagram involving this type of join is shown below:
Join
/ \
sort btree (or hash or isam)
\
proj-rest
\
table
This node is displayed with the following information on a QEP diagram:
• A label identifying it as a key (K) or tid (T) lookup join, along with the column(s) on which processing is done
• If an outer join has been requested, the following label indicating the type of join:
[LEFT JOIN]
• Storage structure (which is usually heap)
• Total number of pages and tuples
• Query cost information
• Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)
This case is seen most frequently where the outer subtree proj-rest returns few rows, so it is faster to do a keyed lookup (on an ISAM, hash, or B-tree table) than any of the sort merge operations.
The following example shows a QEP diagram with a key lookup join:
SELECT b.col1, b.col2, a.col2
FROM arel a, brel b
WHERE a.col3 = 'x' AND a.col1 = b.col1;
K Join(col1)
Heap
Pages 1 Tups 2
D3 C0
/ \
Proj-rest brel
Heap Isam(col1)
Pages 1 Tups 2 Pages 3 Tups 156
D1 C0
/
arel
Hashed(NU)
Pages 70 Tups 156
In the next example of a tid lookup join, access to the base table is through the secondary index, and proj-rest collects tids for sorting. The tids are used for a direct tid lookup in the base table. Therefore, the storage structure of the base table is NU:
SELECT a.col1, a.col2 FROM arel a
WHERE a.col2 = 99
ORDER BY a.col2;
Sort(col2)
Pages 1 Tups 1
D4 C1
/
T Join(tidp)
Heap
Pages 1 Tups 1
D4 C0
/ \
Proj-rest arel
Sort on(tidp) Hashed(NU)
Pages 1 Tups 1 Pages 70 Tups 156
D2 C0
/
aindex
Isam(col2)
Pages 2 Tups 156
Subquery Join Node
The subquery join is specific to SQL because SQL allows subselects as part of a query. These nodes join rows from a query to matching rows of a contained subselect, thus allowing the subselect restrictions on the query to be evaluated.
A typical abbreviated example of a QEP diagram involving a subquery join is shown below:
SE join
/ \
proj-rest Tn
/
table
In this diagram, Tn identifies the QEP constructed to evaluate the subselect.
This node is displayed with the following information on a QEP diagram:
• A label identifying it as a subquery (SE) join, along with the column(s) on which processing is done
• Storage structure (which is usually heap)
• Total number of pages and tuples
• Query cost information
• Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)
The following example shows a QEP diagram with a subquery join:
SELECT * FROM arel a
WHERE a.col2 = (
SELECT col2 FROM brel b
WHERE a.col1 = b.col1)
AND col1 = 5;
QUERY PLAN 3,1, no timeout, of T1
Proj-rest
Heap
Pages 1 Tups 1
D1 C0
/
brel
Hashed(col1)
Pages 3 Tups 156
QUERY PLAN 4,2, no timeout, of main query
SE Join(col1)
Heap
Pages 1 Tups 1
D2 C0
/ \
Proj-rest T1
Heap Heap
Pages 1 Tups 1 Pages 1 Tups 1
D1 C0
/
arel
Hashed(col1)
Pages 70 Tups 156
In the QEP pane of the SQL Scratchpad window in VDBA, these two QEPs are shown in separate tabs.
Subquery joins are reasonably expensive because the subselect must be executed once for each row of the outer subtree. The query optimizer contains logic to flatten various types of subselects into normal joins with the containing query. This allows more optimal joins to be used to evaluate the subselect.
As an example of the subselect flattening enhancement features of the query optimizer, consider the following subselect:
SELECT r.a FROM r WHERE r.c =
(SELECT AVG(s.a) FROM s
WHERE s.b = r.b AND r.d > s.d);
Instead of two scans of table r, the query optimizer has eliminated a scan of table r by evaluating the aggregate at an interior QEP node. The QEP appears similar to the previous example:
QUERY PLAN 7,1, no timeout, of main query
Hash Join(b)
avg(s.a)
Heap
Pages 2 Tups 359
D133 C171
/ \
Proj-rest Proj-rest
Sorted(b) Heap
Pages 1 Tups 359 Pages 40 Tups 359
D65 C112 D32 C3
/ /
r s
Btree (b,c) Hashed(NU)
Pages 44 Tups 359 Pages 44 Tups 359