Deployment Guide : 3. Database Configuration : Access to Data
 
Access to Data
When accessed data is not already in Column Buffer Manager (CBM) memory, it is read from disk in compressed form into memory (RAM) and then decompressed, using the CPU cache as the processing memory. Vector performs smart buffer management and asynchronous pre-fetching to maximize throughput in these scenarios.
Large Block I/O
Vector always performs large block I/Os to read data from disk. Large I/Os are most efficient when the data for a single I/O is stored together. The block_size parameter in the vectorwise.conf configuration file sets the block (page) size. The default value is 512 KB.
Typical data warehouse queries scan very large volumes of data. When subsequent data blocks must be accessed, retrieval is most efficient if those blocks are stored contiguously on disk. This is particularly important for spinning disks, where moving the disk head is expensive and sequential I/O delivers at least twice the throughput of random I/O. Vector uses the group_size parameter to control the number of blocks (pages) that are stored together. The default value is 8.
The block_size and group_size parameters cannot be changed after the database has been created. To change them afterward, you must unload the data, destroy and recreate the database, and then reload the data.
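As a sketch, both parameters are set in vectorwise.conf. The section name below is an assumption and may differ in your installation, so treat this fragment as illustrative only:

```ini
# vectorwise.conf (illustrative fragment)
# NOTE: the [cbm] section name is an assumption; check your installation's
# file for the correct placement of these parameters.
[cbm]
block_size=512K   # block (page) size; default 512 KB
group_size=8      # number of blocks stored together; default 8
```

Remember that edits to these two parameters take effect only for a newly created database.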
Data Allocation
By default, Vector uses a pure column-based data store for most situations, so data is allocated per column. In a pure column-based data store the minimum size that a table will occupy on disk, irrespective of the amount of data in the table, is:
number of columns * block_size * group_size
Take this into account if your database contains many columns in total, whether because of many tables or many columns per table.
You can reduce the minimum allocation by reducing the parameters group_size and/or block_size, at the cost of slower performance for large table scans.
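As a rough illustration of the formula above, the following sketch computes the minimum on-disk footprint of a table; the function name and the 100-column table are invented for this example:

```python
def min_table_allocation_bytes(num_columns, block_size_kb=512, group_size=8):
    """Minimum on-disk allocation for a pure column-based table:
    number of columns * block_size * group_size (defaults from the text)."""
    return num_columns * block_size_kb * 1024 * group_size

# A 100-column table occupies at least 100 * 512 KB * 8 = 400 MB on disk,
# even if it holds only a handful of rows.
print(min_table_allocation_bytes(100) // (1024 * 1024))  # -> 400
```

Reducing block_size to 128 KB and group_size to 4, for instance, shrinks the same table's minimum footprint to 50 MB, at the cost of slower large table scans.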
Vector also supports a storage allocation mechanism that stores entire rows of data in a single data block. Within the block, data is still stored column by column to optimize data compression. Consider row-based storage for extremely wide tables with relatively few rows, to limit storage allocation, or for tables with relatively few columns whose queries always retrieve most of them. To use row-based storage, append WITH STRUCTURE = VECTORWISE_ROW to the CREATE TABLE statement.
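For example (the table and column names are invented for illustration):

```sql
-- Illustrative only: a wide table with few rows, stored row-wise.
-- Within each block the data is still laid out column by column.
CREATE TABLE sensor_profile (
    sensor_id   INTEGER NOT NULL,
    vendor      VARCHAR(100),
    model       VARCHAR(100),
    location    VARCHAR(200)
    -- ...typically many more columns...
) WITH STRUCTURE = VECTORWISE_ROW;
```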
Vector always performs I/O block by block, and blocks are mirrored in the in-memory column buffer. Suboptimal I/O and memory buffer utilization occur if a block is not full (for example, there is not enough data to fill it after compression) or if data is retrieved that is not required to satisfy the query (for example, when row-based storage is used but the query does not reference all columns). Choose your storage allocation to optimize I/O for maximum performance.
Column Buffer Manager (CBM) Memory
The Column Buffer Manager (CBM) memory is a portion of the server’s memory (RAM) that is used to store data that was pre-fetched from disk. A large CBM memory size can improve query performance because only data not yet in CBM memory will have to be read from disk to satisfy the query. Vector performs smart buffer management and asynchronous pre-fetching to optimize throughput if not all data fits in memory.
Use the bufferpool_size parameter in the vectorwise.conf configuration file to set the size of CBM memory. By default, 25% of the server's memory is allocated for CBM memory (per database).
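As a sketch, the parameter might be set as follows; the section name and the 16 GB value are illustrative assumptions, not defaults:

```ini
# vectorwise.conf (illustrative fragment)
# NOTE: the [memory] section name is an assumption; check your
# installation's file for the correct placement.
[memory]
bufferpool_size=16G   # CBM memory size; default is 25% of server RAM
```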
Data in CBM memory mirrors the data on disk; that is, data that resides compressed on disk is also kept compressed in CBM memory to make optimum use of the memory buffer. Data is decompressed only when it is taken out of CBM memory for processing. If you want to avoid the small cost of decompression, or you are dealing with very small tables with very few blocks per column, you can store data uncompressed by issuing the command set trace point qe82 before you create the table.
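For example (the table is invented; qe82 is the trace point named above):

```sql
-- Illustrative session: disable compression for the table created next.
SET TRACE POINT qe82;

CREATE TABLE tiny_lookup (
    code   INTEGER NOT NULL,
    label  VARCHAR(30)
);
```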
Note: An uncompressed table may take up more disk space and hence take longer to read from disk, so disable compression only for small or medium-sized tables.