10. Choosing Storage Structures and Secondary Indexes : Hash Storage Structure : Structure of a Hash Table
 
Share this page                  
Structure of a Hash Table
An example illustrates how a hash table is structured and what hashing means:
The example uses an employee table that has 31 rows, 500 bytes each.
The table is modified to hash on the age field, using the Structure of Table dialog. For a full description of modify procedures, see the chapter “Maintaining Storage Structures.” You can also use this MODIFY statement:
modify employee to hash on age;
The number of main pages needed is calculated. The number chosen is always at least seven, no matter how small the table is. The number of main pages chosen is approximately twice the number of pages required if the table were a heap. Normally hash uses a 50% fill factor, although if the row width is greater than 1000 bytes, it uses a 100% fill factor.
The calculation used is this:
Main pages = (Rows_in_Table / Rows_per_page) * 2
31 rows_in_table / 4 rows_per_page = 8 (round up)
   8 * 2 = 16;
Main pages for employee table = 16
The main pages calculation is checked against the Min Pages and Max Pages values. If these were specified, the result must fall in this range.
When a table is modified to hash, a skeletal table is set up with an appropriate number of main pages. Although 16 pages can actually be used, as shown in the calculation above, for illustration purposes assume 10 main pages are chosen. The table is built by placing each row on the page where its key hashes.
The chart in the following example illustrates what a table looks like after modifying to hash on age. Remember that the actual hashing function is a complex algorithm that supports all of the data types. For simplicity, however, the following examples use the module function as a hashing algorithm.
Here is an example of a hashing function:
Main Page = Key MOD Main_Pages

Ross, Age 50 50 mod 10 = 0; hashes to page 0
McShane, Age 22 22 mod 10= 2; hashes to page 2
.
.
.
After this hashing process is completed for all 31 rows, the table looks like this:
        +----------------------------+
Page 0  |    50|Ross      | 55000.000|
        |    20|Smith     | 10000.000|
        |    30|Curan     | 30000.000|
        |    20|Sabel     | 21000.000|
        |----------------------------|
Page 1  |                            |
        |                            |
        |                            |
        |                            |
        |----------------------------|
Page 2  |    22|McShane   | 22000.000|
        |    32|Gregori   | 31000.000|
        |    42|Brodie    | 40000.000|
        |                            |
        |----------------------------|   Overflow Chain for Page 3
Page 3  |    33|Blumberg  | 32000.000|   |----------------------------|
        |    43|Clark     | 40000.000|-->|    23|Ramos     | 30000.000|
        |    23|Ming      | 22000.000|   |    53|McTigue   | 41000.000|
        |    43|Kay       | 38000.000|   |----------------------------|
        |----------------------------|
Page 4  |    24|Saxena    | 22000.000|
        |    34|Curry     | 32000.000|
        |    44|Stein     | 40000.000|
        |    64|Robinson  | 80000.000|
        |----------------------------|
Page 5  |    55|Verducci  | 55000.000|
        |    35|Huber     | 32000.000|
        |    25|Kreseski  | 24000.000|
        |                            |
        |----------------------------|
Page 6  |    26|Zimmerman | 25000.000|
        |    46|Mandic    | 43000.000|
        |    36|Stannich  | 33000.000|
        |                            |
        |----------------------------|
Page 7  |    37|Cameron   | 35000.000|
        |    47|Giller    | 46000.000|
        |    27|Green     | 26000.000|
        |                            |
        |----------------------------|
Page 8  |    38|Stover    | 35000.000|
        |    38|Sullivan  | 35000.000|
        |    28|Gordon    | 27000.000|
        |                            |
        |----------------------------|
Page 9  |    49|Aitken    | 50000.000|
        |    29|Shigio    | 28000.000|
        |                            |
        |                            |
        +----------------------------+
To retrieve the employee data about employees who are 49 years old:
Select * from employee
  Where employee.age = 49;
The table is hashed on age, and the qualification has specified an age. The hashing algorithm is used to determine the main page on which rows with ages of 49 are located:
49 mod 10 = 9
The lookup goes directly to Page 9, instead of looking through the entire table, and returns the row requested.
To find all employees who are age 53, the calculation is:
53 mod 10 = 3
These rows are found on main Page 3. However, a search through the page, looking for 53, shows it is not there. There is an overflow page, though, and searching this page finds the row. Overflow pages are associated with a particular main page. They can slow down processing time because searches are required not only on the main page but the overflow chain connected to the main page as well.
Inserts, updates, and deletes work the same way retrievals do. If you want to append a new row, the row is placed on the page the new employee’s age hashes to. Therefore, if you add an employee with age 22, 22 mod 10 is 2, so this employee is placed on main Page 2.
To find all employees who are older than 50, there is no way of directly locating these rows using the hashing algorithm; you must hash on every possible value greater than 50. Instead, the table is treated as a heap table and every page is scanned, starting from Page 0 through to Page 9 including all overflow pages, looking for qualifying rows.
To retrieve the row where the employee’s name is Shigio, does the age key help? Because the table is not hashed on name, and Shigio’s age is unknown, the entire table must be scanned, from Page 0 through Page 9, looking for rows where the employee name is Shigio. This retrieval treats the table like a heap table, scanning every main and overflow page. To retrieve a row you need without scanning the entire table, specify the value of the key for the row.