Data Allocation

By default, Vector uses a pure column-based data store in most situations, so data is allocated per column. In a pure column-based data store, the minimum size that a table occupies on disk, irrespective of the amount of data in the table, is:

[number of columns] * [block_size] * [group_size]

Take this minimum into account if your database contains a large total number of columns, whether because it has many tables, many columns per table, or both.

You can reduce the minimum allocation by lowering the group_size and/or block_size parameters, at the cost of slower scans of large tables.
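The effect of these parameters can be estimated with a short calculation. The sketch below assumes the minimum allocation is the product of the column count, block_size, and group_size; the function name and the sample values are hypothetical and are not taken from any Vector configuration, so substitute the settings of your own installation.

```python
def min_table_allocation_bytes(num_columns: int,
                               block_size_bytes: int,
                               group_size: int) -> int:
    """Estimate the minimum on-disk size of a column-based table.

    Assumption: minimum size = columns * block_size * group_size.
    """
    if min(num_columns, block_size_bytes, group_size) < 1:
        raise ValueError("all arguments must be positive")
    return num_columns * block_size_bytes * group_size


KIB = 1024
MIB = 1024 * KIB

# Illustrative values only (not necessarily your installation's settings).
wide_table = min_table_allocation_bytes(
    num_columns=100, block_size_bytes=512 * KIB, group_size=8
)
print(f"{wide_table // MIB} MiB")  # prints "400 MiB"

# Halving block_size halves the estimated minimum allocation.
smaller_blocks = min_table_allocation_bytes(
    num_columns=100, block_size_bytes=256 * KIB, group_size=8
)
print(f"{smaller_blocks // MIB} MiB")  # prints "200 MiB"
```

Multiplying the result by the number of tables of similar width gives a rough lower bound for the whole database under the same assumption.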

Vector also supports a storage allocation mechanism that stores entire rows of data in a single data block. Within the block, data is still stored column by column to optimize data compression. Consider using row-based storage for extremely wide tables with relatively few rows, in order to limit storage allocation, or for tables with relatively few columns where queries always retrieve the majority of the columns. To use row-based storage, specify WITH STRUCTURE = VECTORWISE_ROW at the end of the CREATE TABLE statement.

Vector always performs I/O on a block-by-block basis, and blocks are mirrored in the in-memory column buffer. I/O and memory buffer utilization is suboptimal if a block is not full (for example, when there is not enough data to fill a block after compression) or if data is retrieved that is not required to satisfy the query (for example, when row-based storage is used but the query does not need all columns). Choose your storage allocation to optimize I/O for maximum performance.
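To get a feel for the second case, the following sketch estimates how much of the data read from a row-based table a query actually uses. It is a deliberately simplified model and not a description of Vector internals: it assumes every block holds all columns in proportion to their average stored width and it ignores compression. The function name, column names, and widths are hypothetical.

```python
def useful_fraction(column_widths: dict[str, int], needed: list[str]) -> float:
    """Fraction of fetched bytes that belong to columns the query needs.

    Assumption: with row-based storage, each block read brings in every
    column, in proportion to its average stored width in bytes.
    """
    total = sum(column_widths.values())
    used = sum(column_widths[name] for name in needed)
    return used / total


# Hypothetical table: average stored bytes per column.
widths = {"id": 4, "name": 32, "notes": 220}

# A query that needs only "id" uses a small share of what is read.
print(useful_fraction(widths, ["id"]))

# A query that needs every column uses everything that is read.
print(useful_fraction(widths, list(widths)))
```

A low fraction suggests that column-based storage is the better fit for that query pattern, while a fraction close to 1 is consistent with the guidance that row-based storage suits tables whose queries retrieve most columns.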