11. Updating Data

Data can be inserted, updated, and deleted using these methods:

Updating data in Vector tables can be done as either a batch or bulk operation.

Batch operations go to the PDTs (Positional Delta Trees, held in memory) by default. Bulk operations are written directly to the table files on disk by default.

Bulk operations perform better and use less memory because the data goes directly to disk, but concurrent transactions cannot both run bulk operations on the same table: the transaction that commits later conflicts and must be rolled back. When concurrency is needed, use SET INSERTMODE ROW to make operations that run in bulk mode by default run in batch mode instead. See Allowing Concurrent Inserts.

By default, SQL statements and the vwload command map to batch and bulk DML operations as follows:

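In scenarios where both batch and bulk updates are valid, you can choose the method of operation by specifying SET INSERTMODE ROW (batch) or SET INSERTMODE BULK (bulk) for SQL statements, or the -R option (batch) for vwload.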
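As a minimal sketch of choosing the mode explicitly, the statements below run the same INSERT...SELECT once in batch mode and once in bulk mode. The table names sales and sales_staging are hypothetical, and the sketch assumes the target is a raw table or an empty table with a clustered index, where both modes are valid.

```sql
-- Hypothetical tables: sales (target) and sales_staging (source).

-- Force batch mode: the rows go to the in-memory PDTs.
SET INSERTMODE ROW;
INSERT INTO sales SELECT * FROM sales_staging;
COMMIT;

-- Force bulk mode: the rows are written directly to the table files on disk.
SET INSERTMODE BULK;
INSERT INTO sales SELECT * FROM sales_staging;
COMMIT;
```

Each SET INSERTMODE statement is issued outside an open transaction here, which keeps the sketch independent of whether the setting can be changed mid-transaction.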

COPY FROM, INSERT...SELECT, and vwload are batch operations if executed on a non-empty table with a clustered index. For these operations to be bulk, the target table with the clustered index must meet all these conditions:

The following table summarizes the various update methods and their behavior depending on the target table:

Operation	Raw Table	Table with Clustered Index when Empty	Table with Clustered Index when Not Empty
COPY FROM	Bulk by default; Batch with INSERTMODE ROW; Bulk with INSERTMODE BULK	Bulk by default; Batch with INSERTMODE ROW; Bulk with INSERTMODE BULK	Batch
CREATE TABLE AS SELECT	Bulk by default; Batch with INSERTMODE ROW; Bulk with INSERTMODE BULK	Not applicable	Not applicable
DELETE	Batch	Batch	Batch
INSERT	Batch by default; Batch with INSERTMODE ROW; Bulk with INSERTMODE BULK	Batch by default; Batch with INSERTMODE ROW; Bulk with INSERTMODE BULK	Batch
INSERT...SELECT	Bulk by default; Batch with INSERTMODE ROW; Bulk with INSERTMODE BULK	Bulk by default; Batch with INSERTMODE ROW; Bulk with INSERTMODE BULK	Batch
Batch INSERT through an ODBC, JDBC, or .NET interface	Bulk by default; Batch with INSERTMODE ROW; Bulk with INSERTMODE BULK	Bulk by default; Batch with INSERTMODE ROW; Bulk with INSERTMODE BULK	Batch
MERGE...WHEN NOT MATCHED INSERT	Bulk by default; Batch with INSERTMODE ROW; Bulk with INSERTMODE BULK	Bulk by default; Batch with INSERTMODE ROW; Bulk with INSERTMODE BULK	Batch
Spark SQL through the Spark-Vector Connector	Bulk by default; Batch with INSERTMODE ROW; Bulk with INSERTMODE BULK	Bulk by default; Batch with INSERTMODE ROW; Bulk with INSERTMODE BULK	Batch
UPDATE	Batch	Batch	Batch
vwload	Bulk (Batch with -R)	Bulk (Batch with -R)	Batch

An alternative method of updating data is combining tables (see Combining Tables).

Data can be transferred in either direction between traditional Ingres tables and Vector tables using either INSERT...SELECT or CREATE TABLE AS SELECT.

Use a statement like the following:

If the default setting is to create a Vector table, use a statement like the following:
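The original statements are not reproduced in this text, so the following is only a sketch of what such a transfer could look like. It assumes that the WITH STRUCTURE clause selects the storage type, with HEAP for a traditional Ingres table and X100 for a Vector table. All table names are hypothetical.

```sql
-- Copy a Vector table into a new traditional Ingres (heap) table.
-- Assumption: WITH STRUCTURE = HEAP overrides a Vector default.
CREATE TABLE orders_ingres AS
    SELECT * FROM orders_vector
    WITH STRUCTURE = HEAP;

-- Copy an Ingres table into a new Vector table.
-- Assumption: WITH STRUCTURE = X100 requests Vector storage explicitly.
CREATE TABLE orders_vector_copy AS
    SELECT * FROM orders_ingres
    WITH STRUCTURE = X100;

-- Transfer rows into an existing table with INSERT...SELECT.
INSERT INTO orders_vector SELECT * FROM orders_ingres;
```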

Because DML operations can consume a large amount of memory in batch update mode, an alternative way to apply data updates is the MODIFY...TO COMBINE statement. It merges the updates buffered in memory and also provides a way to perform bulk DML operations on any type of table.

Examples:

Notes:

You can use similar solutions for other DML operations.

For details on this command, see MODIFY...TO COMBINE Statement--Merge and Update Data.

During batch operations Vector automatically propagates the changes buffered in memory to the disk-resident table. Because such propagation can be costly in terms of time and resources, it is best to avoid frequent propagation to large tables by using one of the following approaches:

For large updates, we recommend using a bulk insert (for example, COPY or vwload) to initially load the data into staging tables, and then using explicit MODIFY...TO COMBINE statements.

The syntax of the MODIFY...TO COMBINE statement is as follows:

This statement first deletes from the base table all tuples found in the EXCEPT tables, and then adds all tuples from the UNION tables to it. The statement generates a new copy of the base table.
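The behavior described above can be sketched with one clause per statement. The table names base, newdata, and deletions are hypothetical staging tables assumed to have the same column layout as the base table. The exact clause syntax should be checked against the Vector SQL Language Guide.

```sql
-- Hypothetical tables with identical column layouts:
--   base       the table being modified
--   newdata    staging table holding tuples to add
--   deletions  staging table holding tuples to remove

-- Append: add every tuple in newdata to base.
MODIFY base UNION newdata TO COMBINE;

-- Delete: remove from base every tuple listed in deletions.
MODIFY base EXCEPT deletions TO COMBINE;
```

An update can be expressed as a deletion of the old tuples followed by an addition of the new ones. Because each statement generates a new copy of the base table, supplying both clauses in a single statement avoids rewriting the table twice.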

For detailed usage notes on MODIFY...TO COMBINE, see the Vector SQL Language Guide.

During batch operations Vector automatically propagates the changes buffered in memory to the disk-resident table.

To trigger propagation manually, use the MODIFY...TO COMBINE statement. Doing so frees the memory that buffers the updates, which is reported as memory.update_allocated in the vwinfo output.

Use the following SQL statement:
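The original statement is missing from this text. A plausible form, assuming the statement accepts a base table with no UNION or EXCEPT clause, is shown below with a hypothetical table name.

```sql
-- Hypothetical table name. With no UNION or EXCEPT clause, the statement
-- is assumed only to merge the updates buffered in memory into the
-- on-disk table, which releases the memory they occupied.
MODIFY sales TO COMBINE;
```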

The most efficient way to load data into Vector is to use bulk append:

Bulk append into a non-empty table is only available for tables in RAW format (that is, without a clustered index). The performance cost of this method is roughly proportional to the volume of data appended.

When planning an append strategy, consider the granularity of appends (see Granularity of DML Operations). To maximize efficiency, each append should span multiple disk blocks. Smaller appends also work, because Vector first fills the partially used blocks at the end of the table, but larger appends make better use of on-disk storage.

For small-cardinality data modifications, you can use the standard INSERT/DELETE/UPDATE commands, which work on all table types.

For large-cardinality deletions and updates, and for appends to a table with a clustered index, you should use the MODIFY...TO COMBINE statement.

The performance cost of the MODIFY...TO COMBINE method can be significant because it is roughly proportional to the total volume of data in the table, so it should be used only when modifying a significant percentage of a table. The benefit of this approach is that it allows large modifications, results in lower memory consumption, and provides higher processing performance after the update.

Data is “inserted” into a Vector table by either an insert or an append. An insert goes to the PDTs, which reside in memory. An append is written directly to the table files on disk. (PDTs eventually are written to disk through update propagation.)

By default, single-row inserts go through the PDTs (insert), whereas INSERT...SELECT writes directly to disk (append).

If you need to run concurrent INSERT...SELECT statements against the same table, you must use the SET INSERTMODE ROW statement in each session. If you do not set this option, the second commit fails by default with the following error:

When you SET INSERTMODE to ROW in both sessions, however, the second commit will succeed.

INSERTMODE can also be set to BULK, in which case data is appended directly to disk and concurrent inserts are not allowed. If two sessions insert into the same table and at least one of them has INSERTMODE set to BULK, the transaction that commits later fails with an error.
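The two-session scenario can be sketched as follows. The statements are shown interleaved in the order they would run. The table names sales, staging_a, and staging_b are hypothetical.

```sql
-- Session 1
SET INSERTMODE ROW;
INSERT INTO sales SELECT * FROM staging_a;

-- Session 2 (running concurrently against the same target table)
SET INSERTMODE ROW;
INSERT INTO sales SELECT * FROM staging_b;

-- Session 1
COMMIT;

-- Session 2
COMMIT;  -- expected to succeed because both sessions use ROW mode
```

If either session omitted SET INSERTMODE ROW or used SET INSERTMODE BULK, the later of the two commits would be expected to fail, as described above.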