Join Nodes in a QEP

14. Using the Query Optimizer : Types of Nodes in a QEP : Join Nodes in a QEP

Share this page

Join Nodes in a QEP

There is an inner and an outer tree beneath every join node, which function similarly to an inner and outer program loop. By convention, the left-hand tree is called the outer tree, and the right-hand tree is called the inner tree.

There are various types of join nodes, described individually below, but the joining method is the same: for each row from the outer tree, there is a join to all of the rows that can possibly qualify from the inner tree. The next row from the outer tree is processed, and so on.

Any join node can have outer join information if an outer join is present.

Cartesian Product Node

The Cartesian product, or cart-prod, strictly follows the un-optimized join model, with each row in the outer node compared to all rows from the inner node. This does not mean that all rows are actually read, only that all rows that satisfy the conditions of the query are compared.

A typical abbreviated example of a QEP diagram involving a cart-prod is shown below:

Cart-Prod

/ \

proj-rest table

table

This node is displayed with the following information on a QEP diagram:

• A label identifying it as a cart-prod join node, along with the column(s) on which processing is done

• If an outer join has been requested, one of the following labels indicating the type of join:

[LEFT JOIN]
[FULL JOIN]
[RIGHT JOIN]

• Storage structure (which is usually heap)

• Total number of pages and tuples

• Query cost information

• Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)

The cart-prod is most frequently observed in disjoint queries (that is, when use of correlation variables and table names are mixed in the query). However, the following cases can also generate cart-prods, without adversely affecting performance:

• Queries involving ORs that can usually be decomposed into a sequence of smaller queries

• No join specification (a disjoint table or no WHERE clause, so that there is no relationship between the two tables)

• Small tables or amounts of data

• Non_equijoins, such as a query with the following WHERE clause:

where r1.col1 > r2.col2

Cart-prods are sometimes associated with substantial estimates for disk I/O usage and affect performance adversely.

This example shows a QEP diagram with a cart-prod join node resulting from the following simple disjoint query:

select arel.col1 from arel, arel a

where a.col1 = 99;

QUERY PLAN 7,1, no timeout, of main query

Cart-Prod

Heap

Pages 1 Tups 243

D9 C4

/ \

Proj-rest Proj-rest

Sorted(NU) Heap

Pages 1 Tups 2 Pages 1 Tups 156

D1 C0 D8 C1

/ /

arel arel

Hashed(col1) Hashed(NU)

Pages 70 Tups 156 Pages 70 Tups 156

Full Sort Merge Node

The full sort merge (FSM) is a more optimal join: it typically joins the inner and outer subtrees with many fewer comparisons than the cart-prod requires. This is done by assuring that both subtrees are sorted in the order of the join columns. If one or the other is not already sorted (for example, by being read from a B-tree index constructed on the join columns), the query plan can include a sort to put the rows in the correct order. This allows the rows of the outer subtree to be joined to the matching rows of the inner subtree with one pass over each. The inner subtree is not scanned multiple times, as with the cart-prod join.

A typical abbreviated example of a QEP diagram involving an FSM is shown below:

join

/ \

sort sort

/ /

proj-rest proj-rest

/ /

table table

This node is displayed with the following information on a QEP diagram:

• A label identifying it as a FSM join node, along with the column(s) on which join processing is done

• If an outer join has been requested, one of the following labels indicating the type of join:

[LEFT JOIN]
[FULL JOIN]
[RIGHT JOIN]

• Storage structure (which is usually heap) or a list of sort columns if the data is being sorted

• Total number of pages and tuples

• Query cost information

• Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)

The FSM is most common when a “bulk” join takes place with no row restrictions on either table involved in the join, as with a SELECT statement of the following form:

select * from r1, r2 where r1.joincol = r2.joincol;

This example shows a QEP diagram with an FSM join node resulting from such a bulk join:

select a.col2, b.col2 from arel a, brel b

where a.col1 = b.col1;

QUERY PLAN 5,1, no timeout, of main query

            FSM Join(col1)
            Heap
            Pages 1 Tups 156
            D9 C40
    /               \
    Proj-rest          Proj-rest
    Sorted(eno)      Sorted(eno)
    Pages 1 Tups 156   Pages 1 Tups 156
    D8 C1              D1 C1
   /                     /
arel                    brel
Hashed (NU)             Isam (NU)
Pages 70 Tups 156       Pages 3 Tups 156

Partial Sort Merge Node

The partial sort merge (PSM) is a cross between a full sort merge and a cart-prod. The inner tree must be in sorted order. The outer tree can be sorted or partially sorted. The outer tree in PSM scenarios can always be derived from an ISAM table. Comparisons proceed as for the full sort merge until an outer value is found to be out of order. At that point the inner loop is restarted. Because ISAM tables are reasonably well ordered (depending on how many rows have been added because the last reorganization), the number of inner loop restarts is typically small.

A typical abbreviated example of a QEP diagram involving a PSM is shown below:

Join

/ \

proj-rest sort

/ /

table proj-rest

table

This node is displayed with the following information on a QEP diagram:

• A label identifying it as a PSM join node, along with the columns on which processing is done

• If an outer join has been requested, one of the following labels indicating the type of join:

[LEFT JOIN]
[FULL JOIN]
[RIGHT JOIN]

• Storage structure (which is usually heap)

• Total number of pages and tuples

• Query cost information

• Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)

The following example shows a QEP diagram with a PSM join:

select a.col2, b.col2 from arel a, brel b

where a.col1 = b.col2;

QUERY PLAN 6,1, no timeout, of main query

PSM Join(col1)

Heap

Pages 1 Tups 156

D9 C26

/ \

Proj-rest Proj-rest

Heap Sort on(col1)

Pages 1 Tups 156 Pages 1 Tups 156

D1 C1 D8 C1

/ /

brel arel

Isam(col2) Hashed(NU)

Pages 3 Tups 156 Pages 70 Tups 156

Hash Join Node

The hash is an optimized join that replaces the FSM join when one or both of the subtrees must be sorted. It functions by loading the rows of one subtree into a memory resident hash table, keyed on the values of the join columns. The rows of the other subtree are hashed on their join key values into the hash table, allowing very efficient matching of joined rows. By avoiding the sort(s) of the FSM join, the hash join can be much more efficient.

A typical abbreviated example of a QEP diagram involving a hash join is shown below:

Join

/ \

proj-rest proj-rest

/ /

table table

This node is displayed with the following information on a QEP diagram:

• A label identifying it as a hash join node, along with the columns on which join processing is done

• If an outer join has been requested, one of the following labels indicating the type of join:

[LEFT JOIN]
[FULL JOIN]
[RIGHT JOIN]

• Storage structure (which is usually heap) or a list of sort columns if the data is being sorted

• Total number of pages and tuples

• Query cost information

• Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)

Like the FSM join, the hash join is most common when a bulk join takes place with no row restrictions on either table involved in the join, as with a SELECT statement of the following form:

select from r1,r2 where r1.joincol= r2.joincol;

This example shows a QEP diagram with hash join node resulting from such a bulk join:

select a.col2, b.col2 from arel a, brel b

where a.col1 = b.col2;

QUERY PLAN 1,1, no timeout, of main query

HASH Join(col1)

Heap

Pages 1 Tups 156

D9 C40

/ \

Proj-rest Proj-rest

Heap Heap

Pages 1 Tups 156 Pages 1 Tups 156

D8 C1 D1 C1

/ /

arel(a) brel (b)

Hashed (NU) Isam (NU)

Pages 70 Tups 156 Pages 3 Tups 156

Key and Tid Lookup Join Node

In key and tid lookup joins, the outer and inner data set is not static. For each outer row, the join selects values and forms a key to search in the inner join. A key lookup join uses keyed access through the structure of the inner table or index, and a tid lookup join uses the tuple identifier (tid) value.

A typical abbreviated example of a QEP diagram involving this type of join is shown below:

Join

/ \

sort btree (or hash or isam)

proj-rest

table

This node is displayed with the following information on a QEP diagram:

• A label identifying it as a key (K) or tid (T) lookup join, along with the column(s) on which processing is done

• If an outer join has been requested, the following label indicating the type of join:

[LEFT JOIN]

• Storage structure (which is usually heap)

• Total number of pages and tuples

• Query cost information

• Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)

This case is seen most frequently where the outer subtree proj-rest returns few rows, so it is faster to do a keyed lookup (on an ISAM, hash, or B-tree table) than any of the sort merge operations.

The following example shows a QEP diagram with a key lookup join:

select b.col1, b.col2, a.col2

from arel a, brel b

where a.col3 = 'x' and a.col1 = b.col1;

K Join(col1)

Heap

Pages 1 Tups 2

D3 C0

/ \

Proj-rest brel

Heap Isam(col1)

Pages 1 Tups 2 Pages 3 Tups 156

D1 C0

arel

Hashed(NU)

Pages 70 Tups 156

In the next example of a tid lookup join, access to the base table is through the secondary index, and proj-rest collects tids for sorting. The tids are used for a direct tid lookup in the base table. Therefore, the storage structure of the base table is NU:

select a.col1, a.col2 from arel a

where a.col2 = 99

order by a.col2;

Sort(col2)

Pages 1 Tups 1

D4 C1

T Join(tidp)

Heap

Pages 1 Tups 1

D4 C0

/ \

Proj-rest arel

Sort on(tidp) Hashed(NU)

Pages 1 Tups 1 Pages 70 Tups 156

D2 C0

aindex

Isam(col2)

Pages 2 Tups 156

Subquery Join Node

The subquery join is specific to SQL because SQL allows subselects as part of a query. These nodes join rows from a query to matching rows of a contained subselect, thus allowing the subselect restrictions on the query to be evaluated.

A typical abbreviated example of a QEP diagram involving a subquery join is shown below:

SE join

/ \

proj-rest Tn

table

In this diagram, Tn identifies the QEP constructed to evaluate the subselect.

This node is displayed with the following information on a QEP diagram:

• A label identifying it as a subquery (SE) join, along with the column(s) on which processing is done

• Storage structure (which is usually heap)

• Total number of pages and tuples

• Query cost information

• Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)

The following example shows a QEP diagram with a subquery join:

select * from arel a

where a.col2 = (

select col2 from brel b

where a.col1 = b.col1)

and col1 = 5;

QUERY PLAN 3,1, no timeout, of T1

Proj-rest

Heap

Pages 1 Tups 1

D1 C0

brel

Hashed(col1)

Pages 3 Tups 156

QUERY PLAN 4,2, no timeout, of main query

SE Join(col1)

Heap

Pages 1 Tups 1

D2 C0

/ \

Proj-rest T1

Heap Heap

Pages 1 Tups 1 Pages 1 Tups 1

D1 C0

arel

Hashed(col1)

Pages 70 Tups 156

In the QEP pane of the SQL Scratchpad window in VDBA, these two QEPs are shown in separate tabs.

Subquery joins are reasonably expensive because the subselect must be executed once for each row of the outer subtree. The query optimizer contains logic to flatten various types of subselects into normal joins with the containing query. This allows more optimal joins to be used to evaluate the subselect.

As an example of the subselect flattening enhancement features of the query optimizer, consider the following subselect:

select r.a from r where r.c =

(select avg(s.a) from s

where s.b = r.b and r.d > s.d);

Instead of two scans of table r, the query optimizer has eliminated a scan of table r by evaluating the aggregate at an interior QEP node. The QEP appears similar to the previous example:

QUERY PLAN 7,1, no timeout, of main query

Hash Join(b)

avg(s.a)

Heap

Pages 2 Tups 359

D133 C171

/ \

Proj-rest Proj-rest

Sorted(b) Heap

Pages 1 Tups 359 Pages 40 Tups 359

D65 C112 D32 C3

/ /

r s

Btree (b,c) Hashed(NU)

Pages 44 Tups 359 Pages 44 Tups 359