Hash Join Node
The hash is an optimized join that replaces the FSM join when one or both of the subtrees must be sorted. It functions by loading the rows of one subtree into a memory resident hash table, keyed on the values of the join columns. The rows of the other subtree are hashed on their join key values into the hash table, allowing very efficient matching of joined rows. By avoiding the sort(s) of the FSM join, the hash join can be much more efficient.
A typical abbreviated example of a QEP diagram involving a hash join is shown below:
Join
/ \
proj-rest proj-rest
/ /
table table
This node is displayed with the following information on a QEP diagram:
• A label identifying it as a hash join node, along with the columns on which join processing is done
• If an outer join has been requested, one of the following labels indicating the type of join:
[LEFT JOIN]
[FULL JOIN]
[RIGHT JOIN]
• Storage structure (which is usually heap) or a list of sort columns if the data is being sorted
• Total number of pages and tuples
• Query cost information
• Optionally, sort information (whether the data produced by this node is already sorted or requires sorting)
Like the FSM join, the hash join is most common when a bulk join takes place with no row restrictions on either table involved in the join, as with a SELECT statement of the following form:
SELECT FROM r1,r2 WHERE r1.joincol= r2.joincol;
This example shows a QEP diagram with hash join node resulting from such a bulk join:
SELECT a.col2, b.col2 FROM arel a, brel b
WHERE a.col1 = b.col2;
QUERY PLAN 1,1, no timeout, of main query
HASH Join(col1)
Heap
Pages 1 Tups 156
D9 C40
/ \
Proj-rest Proj-rest
Heap Heap
Pages 1 Tups 156 Pages 1 Tups 156
D8 C1 D1 C1
/ /
arel(a) brel (b)
Hashed (NU) Isam (NU)
Pages 70 Tups 156 Pages 3 Tups 156