16. Using the Query Optimizer : Database Statistics : Histogram (Optimization Output)
 
Share this page                  
Histogram (Optimization Output)
When you optimize a database, output is generated to show the statistics.
For example, if the Print Histogram option was enabled when optimizing the database, and you chose to optimize the name and sex columns of the emp table, the following output is typical:
*** statistics for database demodb version: 00850
*** table emp1 rows:1536 pages:50 overflow pages:49
*** column name of type varchar (length:30, scale:0, nullable)
date:2000_02_24 15:40:38 GMT unique values:16.000
repetition factor:96.000 unique flag:N complete flag:0
domain:0 histogram cells:32 null count:0.0000       value length:8
cell:   0    count:0.0000    repf:0.0000    value:Abbot \037
cell:   1    count:0.0625    repf:96.0000    value:Abbot
cell:   2    count:0.0000    repf:0.0000    value:Beirne \037
cell:   3    count:0.0625    repf:96.0000    value:Beirne
cell:   4    count:0.0000    repf:0.0000    value:Buchanam
cell:   5    count:0.0625    repf:96.0000    value:Buchanan
cell:   6    count:0.0000    repf:0.0000    value:Cooper \037
cell:   7    count:0.0625    repf:96.0000    value:Cooper
cell:   8    count:0.0000    repf:0.0000    value:Dunham \037
cell:   9    count:0.0625    repf:96.0000    value:Dunham
cell:  10    count:0.0000    repf:0.0000    value:Ganley \037
cell:  11    count:0.0625    repf:96.0000    value:Ganley
cell:  12    count:0.0000    repf:0.0000    value:Hegner \037
cell:  13    count:0.0625    repf:96.0000    value:Hegner
cell:  14    count:0.0000    repf:0.0000    value:Jackson\037
cell:  15    count:0.0625    repf:96.0000    value:Jackson
cell:  16    count:0.0000    repf:0.0000    value:Klietz \037
cell:  17    count:0.0625    repf:96.0000    value:Klietz
cell:  18    count:0.0000    repf:0.0000    value:Myers \037
cell:  19    count:0.0625    repf:96.0000    value:Myers
cell:  20    count:0.0000    repf:0.0000    value:Petersom
cell:  21    count:0.0625    repf:96.0000    value:Peterson
cell:  22    count:0.0000    repf:0.0000    value:Rumpel \037
cell:  23    count:0.0625    repf:96.0000    value:Rumpel
cell:  24    count:0.0000    repf:0.0000    value:Singer \037
cell:  25    count:0.0625    repf:96.0000    value:Singer
cell:  26    count:0.0000    repf:0.0000    value:Stec \037
cell:  27    count:0.0625    repf:96.0000    value:Stec
cell:  28    count:0.0000    repf:0.0000    value:Washings
cell:  29    count:0.0625    repf:96.0000    value:Washingt
cell:  30    count:0.0000    repf:0.0000    value:Zywicki\037
cell:  31    count:0.0625    repf:96.0000    value:Zywicki
unique chars: 14 9 11 11 9 11 6 3
char set densities: 0.5200 0.3333 0.4762 0.6667 0.0952 0.1111 0.0633 0.0238
*** statistics for database demodb version: 00850
*** table emp rows:1536 pages:50 overflow pages:49
*** column sex of type char (length:1, scale:0, nullable)
date:23-feb-2000 10:12:00     unique values:2.000
repetition factor:768.000 unique flag:N complete flag:0
domain:0 histogram cells:4 null count:0.0000000 value length:1
cell:   0    count:0.0000000          repf:0.0000000          value:E
cell:   1    count:0.0006510          repf:1.0000000          value:F
cell:   2    count:0.0000000          repf:0.0000000          value:L
cell:   3    count:0.9993489          repf:1535.0000000       value:M
unique chars: 2
char set densities: 0.1428571
The items in the histogram are as follows:
database
Database name
version
Version of the catalog from which statistics were derived. Shown only if version is 00605 or later.
table
Table currently processing
rows
Current number of rows in table as stored in the iitables catalog
pages
Number of pages (from the iitables catalog)
overflow pages
Number of overflow pages (from the iitables catalog)
column
Column currently processing
type
Column data type. The length, scale, and nullable indicators are obtained from the iicolumns catalog.
date
Time and date when statistics were created
unique values
Number of unique values found in the table
repetition factor
Average number of rows per unique value. The repetition factor times the number of unique values must produce the row count.
unique flag
“Y” if unique or nearly unique, “N” if not unique
complete flag
All possible values for the column exist in this table. When this column is used in a join predicate with some other column, it tells the query optimizer that every value in the other column must be a value of this column as well. This knowledge enables the query optimizer to build more accurate query plans for the join.
domain
Not used
histogram cells
Number of histogram cells used (0 to 500 maximum)
null count
Proportion of column values that are NULL, expressed as a real number between 0.0 and 1.0
value length
Length of cell values
cell
For each cell, a cell number, count (proportion of rows whose values fall into this cell: between 0.0 and 1.0), average number of duplicates per unique value in the cell, and the upper bound value for the cell
unique chars
Number of unique characters per character position. Shown only for character columns.
char set densities
Relative density of the character set for each character position. Shown only for character columns.
The number of unique values the column has is calculated. The count listed for each cell is the fraction of all the values falling between the lower and upper boundaries of the cell. Statistics for the sex column show that there are no rows with values less than or equal to ‘E,’ 0.06510% of rows with values equal to ‘F,’ no rows with values in the ‘G’ to ‘L’ range, and 99.93% of the rows with values equal to ‘M.’ The cell count includes those rows whose column values are greater than the lower cell bound but less than or equal to the upper cell bound. All cell counts must add to 1.0, representing 100% of the table rows.
Looking at the cells for the name column, you see that between the lower bound cell 0, “Abbot \037”, and cell 1, “Abbot”, 6.25% of the employee’s names are located:
cell:   0   count:0.0000000   repf:0.0000000  value:Abbot   \037
cell:   1   count:0.0625000   repf:96.000000  value:Abbot
A restriction such as the following brings back about 6.25% of the rows in the table:
where emp.name = 'Abbot'
The character cell value \037 at the end of the string is octal for the ASCII character that is one less than the blank. Therefore, cell 0 in the name example represents the value immediately preceding ‘Abbot’ in cell 1. This indicates that the count for cell 1 includes all rows whose name column is exactly ‘Abbot.’
In addition to the count and value, each cell of a histogram also contains a repetition factor (labeled “repf” in the statistics output). This is the average number of rows per unique value for each cell, or the “per-cell” repetition factor. The query optimizer uses these values to compute more accurate estimates of the number of rows that result from a join. This is distinct from the repetition factor for the whole column displayed in the header portion of the statistics output.