Histogram (Optimization Output)

14. Using the Query Optimizer : Database Statistics : Histogram (Optimization Output)

Share this page

When you optimize a database, output is generated to show the statistics.

For example, if the Print Histogram option was enabled when optimizing the database, and you chose to optimize the name and sex columns of the emp table, the following output is typical:

*** statistics for database demodb version: 00850

*** table emp1 rows:1536 pages:50 overflow pages:49

*** column name of type varchar (length:30, scale:0, nullable)

date:2000_02_24 15:40:38 GMT unique values:16.000

repetition factor:96.000 unique flag:N complete flag:0

domain:0 histogram cells:32 null count:0.0000 value length:8

cell: 0 count:0.0000 repf:0.0000 value:Abbot \037

cell: 1 count:0.0625 repf:96.0000 value:Abbot

cell: 2 count:0.0000 repf:0.0000 value:Beirne \037

cell: 3 count:0.0625 repf:96.0000 value:Beirne

cell: 4 count:0.0000 repf:0.0000 value:Buchanam

cell: 5 count:0.0625 repf:96.0000 value:Buchanan

cell: 6 count:0.0000 repf:0.0000 value:Cooper \037

cell: 7 count:0.0625 repf:96.0000 value:Cooper

cell: 8 count:0.0000 repf:0.0000 value:Dunham \037

cell: 9 count:0.0625 repf:96.0000 value:Dunham

cell: 10 count:0.0000 repf:0.0000 value:Ganley \037

cell: 11 count:0.0625 repf:96.0000 value:Ganley

cell: 12 count:0.0000 repf:0.0000 value:Hegner \037

cell: 13 count:0.0625 repf:96.0000 value:Hegner

cell: 14 count:0.0000 repf:0.0000 value:Jackson\037

cell: 15 count:0.0625 repf:96.0000 value:Jackson

cell: 16 count:0.0000 repf:0.0000 value:Klietz \037

cell: 17 count:0.0625 repf:96.0000 value:Klietz

cell: 18 count:0.0000 repf:0.0000 value:Myers \037

cell: 19 count:0.0625 repf:96.0000 value:Myers

cell: 20 count:0.0000 repf:0.0000 value:Petersom

cell: 21 count:0.0625 repf:96.0000 value:Peterson

cell: 22 count:0.0000 repf:0.0000 value:Rumpel \037

cell: 23 count:0.0625 repf:96.0000 value:Rumpel

cell: 24 count:0.0000 repf:0.0000 value:Singer \037

cell: 25 count:0.0625 repf:96.0000 value:Singer

cell: 26 count:0.0000 repf:0.0000 value:Stec \037

cell: 27 count:0.0625 repf:96.0000 value:Stec

cell: 28 count:0.0000 repf:0.0000 value:Washings

cell: 29 count:0.0625 repf:96.0000 value:Washingt

cell: 30 count:0.0000 repf:0.0000 value:Zywicki\037

cell: 31 count:0.0625 repf:96.0000 value:Zywicki

unique chars: 14 9 11 11 9 11 6 3

char set densities: 0.5200 0.3333 0.4762 0.6667 0.0952 0.1111 0.0633 0.0238

*** statistics for database demodb version: 00850

*** table emp rows:1536 pages:50 overflow pages:49

*** column sex of type char (length:1, scale:0, nullable)

date:23-feb-2000 10:12:00 unique values:2.000

repetition factor:768.000 unique flag:N complete flag:0

domain:0 histogram cells:4 null count:0.0000000 value length:1

cell: 0 count:0.0000000 repf:0.0000000 value:E

cell: 1 count:0.0006510 repf:1.0000000 value:F

cell: 2 count:0.0000000 repf:0.0000000 value:L

cell: 3 count:0.9993489 repf:1535.0000000 value:M

unique chars: 2

char set densities: 0.1428571

The items in the histogram are as follows:

database

Database name

version

Version of the catalog from which statistics were derived. Shown only if version is 00605 or later.

table

Table currently processing

rows

Current number of rows in table as stored in the iitables catalog

pages

Number of pages (from the iitables catalog)

overflow pages

Number of overflow pages (from the iitables catalog)

column

Column currently processing

type

Column data type. The length, scale, and nullable indicators are obtained from the iicolumns catalog.

date

Time and date when statistics were created

unique values

Number of unique values found in the table

repetition factor

Average number of rows per unique value. The repetition factor times the number of unique values must produce the row count.

unique flag

“Y” if unique or nearly unique, “N” if not unique

complete flag

All possible values for the column exist in this table. When this column is used in a join predicate with some other column, it tells the query optimizer that every value in the other column must be a value of this column as well. This knowledge enables the query optimizer to build more accurate query plans for the join.

domain

Not used

histogram cells

Number of histogram cells used (0 to 500 maximum)

null count

Proportion of column values that are NULL, expressed as a real number between 0.0 and 1.0

value length

Length of cell values

cell

For each cell, a cell number, count (proportion of rows whose values fall into this cell: between 0.0 and 1.0), average number of duplicates per unique value in the cell, and the upper bound value for the cell

unique chars

Number of unique characters per character position. Shown only for character columns.

char set densities

Relative density of the character set for each character position. Shown only for character columns.

The number of unique values the column has is calculated. The count listed for each cell is the fraction of all the values falling between the lower and upper boundaries of the cell. Statistics for the sex column show that there are no rows with values less than or equal to ‘E,’ 0.06510% of rows with values equal to ‘F,’ no rows with values in the ‘G’ to ‘L’ range, and 99.93% of the rows with values equal to ‘M.’ The cell count includes those rows whose column values are greater than the lower cell bound but less than or equal to the upper cell bound. All cell counts must add to 1.0, representing 100% of the table rows.

Looking at the cells for the name column, you see that between the lower bound cell 0, “Abbot \037”, and cell 1, “Abbot”, 6.25% of the employee’s names are located:

cell: 0 count:0.0000000 repf:0.0000000 value:Abbot \037

cell: 1 count:0.0625000 repf:96.000000 value:Abbot

A restriction such as the following brings back about 6.25% of the rows in the table:

where emp.name = 'Abbot'

The character cell value \037 at the end of the string is octal for the ASCII character that is one less than the blank. Therefore, cell 0 in the name example represents the value immediately preceding ‘Abbot’ in cell 1. This indicates that the count for cell 1 includes all rows whose name column is exactly ‘Abbot.’

In addition to the count and value, each cell of a histogram also contains a repetition factor (labeled “repf” in the statistics output). This is the average number of rows per unique value for each cell, or the “per-cell” repetition factor. The query optimizer uses these values to compute more accurate estimates of the number of rows that result from a join. This is distinct from the repetition factor for the whole column displayed in the header portion of the statistics output.