iivwprof View
The iivwprof view contains the main information about the execution stages of recently executed queries. It contains the following attributes:
query_id
Internal query ID for a given query.
Note: Multiple records from the same query presented in the iivwprof table will have the same query_id.
op_id
Operator ID within a given query. Operators are individual processing stages of a query.
name
The name of a given operator. The indentation in the name is used to present the parent/child relationship.
ticks
The time spent in a given operator, in CPU clocks. Can be used to identify the most time-consuming operators.
tuples
The number of tuples produced by a given operator. Can be used to detect "exploding" data plans, as in the case of an unintended Cartesian product.
himem
The peak memory usage of a given operator in bytes. Can be used to identify the most memory-consuming operators.
data_blocks
The number of data blocks a given operator consumed. Data blocks are discussed in [cbm] block_size parameter (see block_size) and I/O Settings (see I/O Settings).
Note: This information is provided for MScan (table scanning) operators only, not for disk-spilling operators.
async_io
sync_io
Interpreting I/O Information in IIVWPROF View
The data_blocks, async_io, and sync_io attributes in the iivwprof view are related and correspond to how I/O is performed in Vector.
Data is divided into I/O blocks (as configured on the [cbm] block_size parameter (see block_size). Whenever the query processing layer needs to access a given data block, a data_blocks counter is increased. If this block is not in memory, this causes a synchronous (blocking) load request, increasing the sync_io counter. At the same time, if the query knows it will require additional blocks in the future, an asynchronous (non-blocking) load (sometimes called look-ahead load or prefetching) for some future blocks is issued (this allows overlapping processing and I/O). If these blocks are not in memory, the async_io parameter is increased. Note that if the "synchronous" request for the same block is issued before the "asynchronous" look-ahead request is finalized, the sync_io counter for that block will also be increased.
This logic translates to the following "rules of thumb:"
• If async_io is zero (or low), the entire (or most) of the processed data is in memory and no I/O was performed. This is often a desired situation from the performance perspective.
• If async_io is high, close to data_blocks, the buffer pool might not be able to keep the used data in memory, causing extra I/O. Increasing [cbm] bufferpool_size (see bufferpool_size) may help.
• If sync_io is low, the "asynchronous" look-ahead loads are fulfilled fast, before particular data blocks are actually needed for processing. This means the query is CPU-bound.
• If sync_io is high, the I/O subsystem is not fast enough to fulfill the "asynchronous" look-ahead requests, making the query I/O-bound. This can often be addressed by using a storage system providing a higher I/O bandwidth.