Best Practices for Updates
The most efficient way to load data into Vector is to use bulk append:
• vwload utility or COPY VWLOAD
• COPY INTO
• Batch INSERT through an ODBC, JDBC, or .NET based application
• Spark SQL through the Spark-Vector Connector
Bulk append into a non-empty table is only available for tables in RAW format (that is, without a clustered index). The performance cost of this method is roughly proportional to the volume of data appended.
When planning an append strategy, consider the granularity of appends. To maximize efficiency, each append should use multiple disk blocks. Smaller appends work fine and Vector will first fill up blocks at the end of the table, but larger appends will further optimize on-disk storage.
For small-cardinality data modifications, you can use the standard INSERT/DELETE/UPDATE commands, which work on all table types.
For large-cardinality deletions and updates, and for appends to a table with a clustered index, you should use:
• The explicit
MODIFY...TO COMBINE method (see
Combining Tables), where you first create staging tables that contain deletions and updates, and then merge these into your tables in bulk fashion
• Batch INSERT through an ODBC, JDBC, or .NET based application
The performance cost of the MODIFY...TO COMBINE method can be significant because it is roughly proportional to the total volume of data in the table, so it should be used only when modifying a significant percentage of a table. The benefit of this approach is that it allows large modifications, results in lower memory consumption, and provides higher processing performance after the update.
Last modified date: 11/09/2022