DML and Memory Utilization
Any DML that is not a data append will be stored per table in an optimized in-memory structure called the Positional Delta Tree (PDT). This includes all DML executed against a table with a clustered index that already has data in it, except for MODIFY...TO COMBINE, which is never stored in PDTs.
Vector supports multi-version read consistency. Committed data in PDTs is efficiently merged with data read from disk to answer queries. Any data that is committed to PDTs is also written to the transaction log and persisted on disk for recovery purposes before the commit completes.
In Vector a session’s transaction starts with the first operation, even if that operation is a query, and read consistency is maintained throughout the transaction. If one session is performing read-only queries, and another session is performing DML then for the read-only session to see the changes made by the DML session:
• First commit the changes in the DML session (which completes the transaction).
• End the ongoing transaction in the read-only session by issuing a commit or rollback.
Maintaining read-consistency for a read-only session can use a lot of memory if other sessions perform DML and complete transactions. For this reason on a system that performs lots of DML it is important to commit or roll back a session periodically, or to terminate a session and restart it. Also consider the use of the AUTOCOMMIT setting to automatically commit each statement. For more information, see the Vector SQL Language Guide.
Every Vector database reserves a portion of the server’s memory to store PDTs. The amount of memory reserved is controlled by the parameter max_global_update_memory as target maximum amount of query execution memory that can be used for PDTs. The default is 0.25.
When the amount of memory used by PDTs exceeds the maximum target amount then Vector triggers automatic data propagation against all tables with outstanding transactions. Transactions will continue to be executed during the process but more memory may be consumed to hold updates than the target amount configured in the vectorwise.conf file.
Best practices for incremental data loads into Vector:
• Append data whenever possible.
• Use the MODIFY...TO COMBINE statement to delete or update large portions of data. As a guideline use MODIFY...TO COMBINE if 10% or more of the data in a table changes.
• Make sure as DML is applied to commit or roll back read-only sessions regularly to (1) see committed updated data, and (2) free up memory and data blocks required to maintain a relatively old transaction consistency view.
• Consider using an explicit call to the MODIFY...TO COMBINE statement against a table with outstanding DML at a time when the system has surplus resources available to perform the operation and free up memory in PDTs.