14. Using the Query Optimizer : Types of Nodes in a QEP : Join Nodes in a QEP
 
Share this page                  
Join Nodes in a QEP
There is an inner and an outer tree beneath every join node, which function similarly to an inner and outer program loop. By convention, the left-hand tree is called the outer tree, and the right-hand tree is called the inner tree.
There are various types of join nodes, described individually below, but the joining method is the same: for each row from the outer tree, there is a join to all of the rows that can possibly qualify from the inner tree. The next row from the outer tree is processed, and so on.
Any join node can have outer join information if an outer join is present.
Cartesian Product Node
The Cartesian product, or cart-prod, strictly follows the un-optimized join model, with each row in the outer node compared to all rows from the inner node. This does not mean that all rows are actually read, only that all rows that satisfy the conditions of the query are compared.
A typical abbreviated example of a QEP diagram involving a cart-prod is shown below:
       Cart-Prod
      /     \
proj-rest table
   /
table
This node is displayed with the following information on a QEP diagram:
A label identifying it as a cart-prod join node, along with the column(s) on which processing is done
If an outer join has been requested, one of the following labels indicating the type of join:
[LEFT JOIN]
[FULL JOIN]
[RIGHT JOIN]
Storage structure (which is usually heap)
Total number of pages and tuples
Query cost information
Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)
The cart-prod is most frequently observed in disjoint queries (that is, when use of correlation variables and table names are mixed in the query). However, the following cases can also generate cart-prods, without adversely affecting performance:
Queries involving ORs that can usually be decomposed into a sequence of smaller queries
No join specification (a disjoint table or no WHERE clause, so that there is no relationship between the two tables)
Small tables or amounts of data
Non_equijoins, such as a query with the following WHERE clause:
where r1.col1 > r2.col2
Cart-prods are sometimes associated with substantial estimates for disk I/O usage and affect performance adversely.
This example shows a QEP diagram with a cart-prod join node resulting from the following simple disjoint query:
select arel.col1 from arel, arel a
  where a.col1 = 99;
QUERY PLAN 7,1, no timeout, of main query
                    Cart-Prod
                    Heap
                    Pages 1 Tups 243
                    D9 C4
                   /                \
          Proj-rest               Proj-rest
          Sorted(NU)              Heap
          Pages 1 Tups 2        Pages 1 Tups 156
          D1 C0                   D8 C1
          /                        /
     arel                      arel
     Hashed(col1)              Hashed(NU)
     Pages 70 Tups 156        Pages 70 Tups 156
Full Sort Merge Node
The full sort merge (FSM) is a more optimal join: it typically joins the inner and outer subtrees with many fewer comparisons than the cart-prod requires. This is done by assuring that both subtrees are sorted in the order of the join columns. If one or the other is not already sorted (for example, by being read from a B-tree index constructed on the join columns), the query plan can include a sort to put the rows in the correct order. This allows the rows of the outer subtree to be joined to the matching rows of the inner subtree with one pass over each. The inner subtree is not scanned multiple times, as with the cart-prod join.
A typical abbreviated example of a QEP diagram involving an FSM is shown below:
               join
              /    \
           sort    sort
            /       /
   proj-rest proj-rest
  /            /
table       table
This node is displayed with the following information on a QEP diagram:
A label identifying it as a FSM join node, along with the column(s) on which join processing is done
If an outer join has been requested, one of the following labels indicating the type of join:
[LEFT JOIN]
[FULL JOIN]
[RIGHT JOIN]
Storage structure (which is usually heap) or a list of sort columns if the data is being sorted
Total number of pages and tuples
Query cost information
Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)
The FSM is most common when a “bulk” join takes place with no row restrictions on either table involved in the join, as with a SELECT statement of the following form:
select * from r1, r2 where r1.joincol = r2.joincol;
This example shows a QEP diagram with an FSM join node resulting from such a bulk join:
select a.col2, b.col2 from arel a, brel b
  where a.col1 = b.col1;
QUERY PLAN 5,1, no timeout, of main query
            FSM Join(col1)
            Heap
            Pages 1 Tups 156
            D9 C40
    /               \
    Proj-rest          Proj-rest
    Sorted(eno)      Sorted(eno)
    Pages 1 Tups 156   Pages 1 Tups 156
    D8 C1              D1 C1
   /                     /
arel                    brel
Hashed (NU)             Isam (NU)
Pages 70 Tups 156       Pages 3 Tups 156
Partial Sort Merge Node
The partial sort merge (PSM) is a cross between a full sort merge and a cart-prod. The inner tree must be in sorted order. The outer tree can be sorted or partially sorted. The outer tree in PSM scenarios can always be derived from an ISAM table. Comparisons proceed as for the full sort merge until an outer value is found to be out of order. At that point the inner loop is restarted. Because ISAM tables are reasonably well ordered (depending on how many rows have been added because the last reorganization), the number of inner loop restarts is typically small.
A typical abbreviated example of a QEP diagram involving a PSM is shown below:
           Join
          /      \
      proj-rest  sort
      /           /
    table     proj-rest
                  /
                table
This node is displayed with the following information on a QEP diagram:
A label identifying it as a PSM join node, along with the columns on which processing is done
If an outer join has been requested, one of the following labels indicating the type of join:
[LEFT JOIN]
[FULL JOIN]
[RIGHT JOIN]
Storage structure (which is usually heap)
Total number of pages and tuples
Query cost information
Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)
The following example shows a QEP diagram with a PSM join:
select a.col2, b.col2 from arel a, brel b
  where a.col1 = b.col2;
QUERY PLAN 6,1, no timeout, of main query
                     PSM Join(col1)
                     Heap
                     Pages 1 Tups 156
                     D9 C26
                    /          \
           Proj-rest            Proj-rest
           Heap                 Sort on(col1)
           Pages 1 Tups 156     Pages 1 Tups 156
           D1 C1                D8 C1
          /                     /
       brel                  arel
       Isam(col2)            Hashed(NU)
       Pages 3 Tups 156      Pages 70 Tups 156
Hash Join Node
The hash is an optimized join that replaces the FSM join when one or both of the subtrees must be sorted. It functions by loading the rows of one subtree into a memory resident hash table, keyed on the values of the join columns. The rows of the other subtree are hashed on their join key values into the hash table, allowing very efficient matching of joined rows. By avoiding the sort(s) of the FSM join, the hash join can be much more efficient.
A typical abbreviated example of a QEP diagram involving a hash join is shown below:
           Join
          /      \
      proj-rest  proj-rest
      /           /
    table     table
This node is displayed with the following information on a QEP diagram:
A label identifying it as a hash join node, along with the columns on which join processing is done
If an outer join has been requested, one of the following labels indicating the type of join:
[LEFT JOIN]
[FULL JOIN]
[RIGHT JOIN]
Storage structure (which is usually heap) or a list of sort columns if the data is being sorted
Total number of pages and tuples
Query cost information
Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)
Like the FSM join, the hash join is most common when a bulk join takes place with no row restrictions on either table involved in the join, as with a SELECT statement of the following form:
select from r1,r2 where r1.joincol= r2.joincol;
This example shows a QEP diagram with hash join node resulting from such a bulk join:
select a.col2, b.col2 from arel a, brel b
where a.col1 = b.col2;
 
QUERY PLAN 1,1, no timeout, of main query
                     HASH Join(col1)
                     Heap
                     Pages 1 Tups 156
                     D9 C40
                    /             \
           Proj-rest            Proj-rest
  Heap         Heap 
           Pages 1 Tups 156     Pages 1 Tups 156
           D8 C1                D1 C1
          /                     /
       arel(a)                brel (b)
       Hashed (NU)           Isam (NU)
       Pages 70 Tups 156     Pages 3 Tups 156
Key and Tid Lookup Join Node
In key and tid lookup joins, the outer and inner data set is not static. For each outer row, the join selects values and forms a key to search in the inner join. A key lookup join uses keyed access through the structure of the inner table or index, and a tid lookup join uses the tuple identifier (tid) value.
A typical abbreviated example of a QEP diagram involving this type of join is shown below:
         Join
       /     \
    sort      btree (or hash or isam)
       \      
      proj-rest
         \
       table
This node is displayed with the following information on a QEP diagram:
A label identifying it as a key (K) or tid (T) lookup join, along with the column(s) on which processing is done
If an outer join has been requested, the following label indicating the type of join:
[LEFT JOIN]
Storage structure (which is usually heap)
Total number of pages and tuples
Query cost information
Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)
This case is seen most frequently where the outer subtree proj-rest returns few rows, so it is faster to do a keyed lookup (on an ISAM, hash, or B-tree table) than any of the sort merge operations.
The following example shows a QEP diagram with a key lookup join:
select b.col1, b.col2, a.col2
  from arel a, brel b
  where a.col3 = 'x' and a.col1 = b.col1;
                  K Join(col1)
                  Heap
                  Pages 1 Tups 2
                  D3 C0
                   /           \
           Proj-rest            brel
           Heap                Isam(col1)
           Pages 1 Tups 2       Pages 3 Tups 156
           D1 C0
            /
       arel
       Hashed(NU)
       Pages 70 Tups 156
In the next example of a tid lookup join, access to the base table is through the secondary index, and proj-rest collects tids for sorting. The tids are used for a direct tid lookup in the base table. Therefore, the storage structure of the base table is NU:
select a.col1, a.col2 from arel a
  where a.col2 = 99
  order by a.col2;
                   Sort(col2)
                   Pages 1 Tups 1
                   D4 C1
                   /
             T Join(tidp)
             Heap
             Pages 1 Tups 1
             D4 C0
            /               \
         Proj-rest          arel
         Sort on(tidp)      Hashed(NU)
         Pages 1 Tups 1     Pages 70 Tups 156
         D2 C0
        /
   aindex
   Isam(col2)
   Pages 2 Tups 156
Subquery Join Node
The subquery join is specific to SQL because SQL allows subselects as part of a query. These nodes join rows from a query to matching rows of a contained subselect, thus allowing the subselect restrictions on the query to be evaluated.
A typical abbreviated example of a QEP diagram involving a subquery join is shown below:
              SE join
              /     \
       proj-rest     Tn
         /
    table
In this diagram, Tn identifies the QEP constructed to evaluate the subselect.
This node is displayed with the following information on a QEP diagram:
A label identifying it as a subquery (SE) join, along with the column(s) on which processing is done
Storage structure (which is usually heap)
Total number of pages and tuples
Query cost information
Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)
The following example shows a QEP diagram with a subquery join:
select * from arel a
  where a.col2 = (
    select col2 from brel b
      where a.col1 = b.col1)
  and col1 = 5;
QUERY PLAN 3,1, no timeout, of T1
               Proj-rest
               Heap
               Pages 1 Tups 1
               D1 C0
               /
          brel
          Hashed(col1)
          Pages 3 Tups 156
QUERY PLAN 4,2, no timeout, of main query
               SE Join(col1)
               Heap
               Pages 1 Tups 1
               D2 C0
              /              \
          Proj-rest          T1
          Heap               Heap
          Pages 1 Tups 1     Pages 1 Tups 1
          D1 C0
          /
      arel
      Hashed(col1)
      Pages 70 Tups 156
In the QEP pane of the SQL Scratchpad window in VDBA, these two QEPs are shown in separate tabs.
Subquery joins are reasonably expensive because the subselect must be executed once for each row of the outer subtree. The query optimizer contains logic to flatten various types of subselects into normal joins with the containing query. This allows more optimal joins to be used to evaluate the subselect.
As an example of the subselect flattening enhancement features of the query optimizer, consider the following subselect:
select r.a from r where r.c =
 (select avg(s.a) from s
where s.b = r.b and r.d > s.d);
Instead of two scans of table r, the query optimizer has eliminated a scan of table r by evaluating the aggregate at an interior QEP node. The QEP appears similar to the previous example:
QUERY PLAN 7,1, no timeout, of main query
                     Hash Join(b)
                     avg(s.a)
                     Heap
                     Pages 2 Tups 359
                     D133 C171
                    /          \
             Proj-rest         Proj-rest
             Sorted(b)         Heap
             Pages 1 Tups 359  Pages 40 Tups 359
             D65 C112          D32 C3
            /                  /
          r                   s
          Btree (b,c)         Hashed(NU)
          Pages 44 Tups 359   Pages 44 Tups 359