X100 Concepts
X100 Table Structures
In broad terms, X100 uses columnar storage. While data is stored and retrieved in familiar relational rows, the internal storage is different. Instead of storing all column values for a single row next to each other, all rows for a single column are stored together. This storage format has benefits for data warehouse applications, such as reduced I/O and improved efficiency.
A relational table may have dozens or hundreds of columns. In traditional row-oriented relational storage, the entire row must be read even if a query requests only a few columns. With column-oriented storage, a query that needs only a few columns will not read the remaining columns from disk and will not waste memory storing unnecessary values. A side benefit of this columnar storage is that
compression (see Data Compression) can be more effective.
While X100 is fundamentally a column store, it is not strictly so. For certain types of tables, X100 stores data from more than one column together in a data block on disk. Within a data block the values for each column are kept together so that data for a column can be processed efficiently in vectors. X100 also provides a storage type that lets you indicate you want to store all columns for a table in a single block.
Data Storage Format
A database consists of multiple files and can reside in multiple locations. Tables can also be spread across multiple locations.
Updates, inserts, and deletes to the data and the layout of the data in the data files are stored as log information in VWROOT/ingres/data/vectorwise/dbname/wal. The data files and the wal directory must be considered together as files that represent the database.
The data files consist of blocks, which can be seen as the equivalent of pages. Each block contains possibly compressed data from one or more attributes. The size of the block can be configured with the [cbm] block_size parameter before creating a database.
Also, for better sequential access performance, multiple blocks from the same column can be stored contiguously in a block group. The size of this group can be configured with the [cbm] group_size parameter.
Database size is unlimited because data can be spread across multiple disks. If a table or a column is dropped, its corresponding file or files are deleted and the disk space is reclaimed by the operating system.
Note: Due to concurrent transactions, the system can internally still contain references to a table or a column even when it is dropped, so the effect of dropping a table or column may not be seen immediately but only after all concurrent transactions have ended.
Raw Table Storage Format
The default storage format for an X100 table (X100) is also known as a RAW table. This format stores the data on disk in columns in the order in which it is inserted and the data is compressed.
In addition, for each column the system automatically maintains simple statistics about the data distribution by storing the minimum and maximum value for ranges of tuples. If a query includes a filter on the column data value, the execution engine uses these min-max data values to decide whether to examine the data in a given block. Since min-max information is maintained in memory outside of the data block, this simple form of automatic indexing can dramatically reduce disk I/O and memory usage. You can configure the granularity of the min-max index with the [cbm] minmax_maxsize parameter.
Data Compression
Columnar storage inherently makes compression (and decompression) more efficient than does row-oriented storage.
For row-oriented data, choosing a compression method that works well for the variety of data types in a row can be challenging, because compression for text and numeric data work best with different algorithms.
Column storage allows the algorithm to be chosen according to the data type and the data domain and range, even where the domain and range are not declared explicitly. For example, an alphabetic column GENDER defined as CHAR(1) will have only two actual values (M and F), and rather than storing an eight-bit byte, the value can be compressed to a single bit, and then the bit string can be further compressed.
X100 uses different types of algorithms from those found in most other products. Because X100 processes data so efficiently, compression—and in particular decompression—are designed to use little CPU and to reduce disk I/O. While on-disk compression ratios may be slightly lower than other products, overall performance is improved.
Compression in X100 is automatic, requiring no user intervention. X100 chooses a compression method for each column, per data block, according to its data type and distribution.
Data Type Storage Format and Compression Type
Data types are stored internally in a specific format. The type of compression used depends on the data type and distribution.
X100 can use any of the following compression methods:
• RLE (Run Length Encoding) – This method is efficient if many duplicate adjacent tuple values are present (such as in ordered columns with few unique values).
• PFOR (Patched Frame Of Reference) – This method encodes values as a small difference from a page-wide base value. The term "Patched" indicates that FOR is enhanced with a highly efficient way to handle values that fall outside the common frame of reference. PFOR is effective on any data distribution with some value distribution locality.
• PFOR-DELTA (delta encoding on top of PFOR) – In this method, the integers are made smaller by considering the differences between subsequent values. PFOR-DELTA is highly effective on ordered data.
• PDICT dictionary encoding (offsets into a dictionary of unique values) – This method is efficient if the value distribution is dominated by a limited amount of frequent values.
• LZ4 – This algorithm detects and encodes common fragments of different string values. It is particularly efficient for medium and long strings.
Most INTEGER, DECIMAL, and DATE and TIME types internally are compressed using any of the first four compression methods.
FLOAT and FLOAT4 types are stored without compression in X100 tables.
Character types (CHAR, VARCHAR, NCHAR, NVARCHAR) of lengths larger than one are stored internally as variable width strings. This data can be automatically compressed using either a per-block dictionary or LZ4 algorithm.
NULL values are stored internally as a single byte column and are compressed using the RLE method. The null indicator, if needed, is represented internally as a separate column. Loading and processing of nullable columns can be slower than non-nullable columns.
Query Optimization
Analytical queries are the focus of X100. It is essential that such queries have good information about the characteristics of the underlying data because the amount of data accessed can be great, and the sizes of intermediate processing stages must be properly estimated to produce optimal query plans.
To allow the system to generate optimal query plans, we highly recommend running optimizedb if 10% or more of the data in a table has changed since the last optimization. X100 will automatically construct histograms from live data for those columns that do not already have histograms stored in the catalog.
Data Manipulation
Despite its focus on analytical rather than transactional processing, X100 provides a rich set of data updating capabilities.
Granularity of DML Operations
X100 distinguishes between two granularities of DML operations:
• Batch – Operations that involve a relatively small number (for example, thousands or tens of thousands) of records. Such updates are buffered in the system memory and merged with the data from disk on the fly.
INSERT, DELETE, and UPDATE are always executed as batch operations. COPY FROM and INSERT...SELECT are executed as batch operations if the destination table has a clustered index declared and is not empty.
• Bulk – Operations that involve many records (for example, one million or more), in which the data changes do not necessarily fit in the system memory.
CREATE TABLE AS SELECT is always executed as a bulk operation. Data load through vwload, COPY FROM and INSERT...SELECT are executed as bulk operations if the destination table has no clustered index declared or if it is empty. A data load using DataFlow Direct Load is a special case of a high-speed bulk operation.
Snapshot Isolation
X100 provides update capabilities based on the concept of snapshot isolation. This means that the state of the system the user sees when starting a query stays the same even under concurrent updates. For details, see
Transaction Isolation Model.
Persistence and Recovery
A COMMIT statement makes all changes from a transaction become persistent (including batch operations). In addition, X100 provides full automatic recovery. If a system crashes during a transaction, any changes made by that transaction will not be visible after the system restart, but all successfully committed transactions will be restored.
Management of In-memory Updates
Batch updates are stored in a special highly-optimized differential data structure, called the Positional Delta Tree (PDT), in main memory. Data is also written to disk immediately to ensure recoverability.
Batch updates that reside in memory can potentially consume a significant percentage of useful system memory. Thus, if the user is performing batch operations, X100 automatically propagates the changes buffered in memory to the disk-resident table. Propagation is triggered by factors such as table sizes and the number of DML operations buffered.
The volume of batch updates in such a propagation cycle should fit in main memory.
Update propagation can be a relatively expensive operation if there are many modified rows (especially in case of even distribution of updates across the table). X100 tries to rewrite only the affected blocks unless it would result in excessive table fragmentation.
The best practice is to avoid propagation to large tables and use appends where possible.
The amount of memory reserved for Positional Delta Trees and the amount of differences after which propagation is triggered can be configured with system configuration parameters. In cases with high update rates, the memory required for PDTs can become too high, in which case you must work with raw tables or use bulk operations, such as the MODIFY...TO COMBINE statement.
Transaction Isolation Model
X100 uses a snapshot isolation transaction isolation model with optimistic concurrency control.
Snapshot Isolation
Snapshot isolation is a transaction isolation model in which a given transaction always sees the same, consistent state of the system representing the moment when a transaction has started. As a result, phenomena seen in certain other isolation levels (dirty, non-repeatable, and phantom reads) are avoided. Snapshot isolation is closely related to multiversion concurrency control.
Snapshot isolation is slightly weaker than the serializable isolation level available in some systems.
The main benefit of snapshot isolation is that, while providing a high transaction isolation level, it avoids locking for read operations. Such locks can be detrimental to database performance. By combining it with the optimistic concurrency control model, it allows X100 to achieve high performance of update operations without affecting the performance of read-only queries.
For more information, see:
Optimistic Concurrency Control
Unlike some systems, X100 uses the optimistic concurrency control model. In this model, transactions can perform changes to the data without checking for conflicts possibly caused by other transactions that have committed in the meantime. Conflicts are checked only when a transaction commits. This refers not only to conflicts caused by changing the same data, but also to constraint violations that occur only as a result of two transactions committing (for example, two concurrent transactions inserting a record with the same UNIQUE field).
Compared to other approaches, optimistic concurrency control avoids locking records for concurrently working transactions. This results in good performance in cases where transactions typically do not conflict with each other.
The most visible aspect of optimistic concurrency control that sometimes causes problems for users accustomed to other solutions is that the conflicts are detected only at commit time. The application logic must take this into account.
Transaction Examples
Non-dirty Read Example
This example demonstrates how committing transactions are not visible to concurrently running transactions. In this case, Transaction A will insert a value, but Transaction B will not see it until a new transaction is started.
Conflict Resolution Example
In this example, two transactions concurrently insert a record with the same value in a Primary Key attribute. This causes a conflict for the second committing transaction due to constraint violation.
Note: Concurrent updates on a table with a clustered index on foreign key columns will conflict; only one of the transactions will be allowed to commit. Others will return an error and be aborted.
Transactions and DDL Operations
Snapshot isolation and optimistic concurrency control are applied to operations on data (DML).
For operations that change the database schema (DDL), a different isolation model is used. In this model, transactions only see DDL changes performed by other transactions after they committed. Still, if there are conflicting DDL operations (for example, two transactions creating a table with the same name), the system locks the second transaction until the first transaction either commits or aborts.
X100 Storage Types
By default, X100 uses the X100 storage structure. Queries accessing X100 tables use an alternative execution module that is optimized for analytical processing. Some operations are not available when using the X100 storage type.
An X100 table can be created with one of two storage types:
X100
(Default) Stores data in columns. Puts as few columns as possible in one disk block (in most cases only one). An alias for X100 is VECTORWISE.
X100_ROW
Stores data in rows. Puts as many columns as possible in one disk block.
In one block, values are still kept in compressed vectors. Potentially, all columns in a row can be stored in a disk block. This type is useful for small tables, especially those with a large number of columns; it saves disk space and disk cache memory. An alias for X100_ROW is VECTORWISE_ROW.
The result_structure parameter in config.dat sets the default storage structure for the instance. When a table is created, the default storage structure is assumed.
To override the default, specify SET RESULT_STRUCTURE for the session or use WITH STRUCTURE=structure in the CREATE TABLE statement.
Transliteration Between UTF8 and National Character Set
X100 stores character data in UTF-8, using UCS_BASIC collation for all comparisons and ordering. Because the X100 table structure only supports UTF-8 data, Actian X transparently converts data selected from X100 tables to match the installation character set. Similarly, Actian X transparently converts data entered for inserting or updating X100 tables into the installation character set before being persisted in the X100 data store. So your installation can continue to use its existing national character set.
Because UTF8 is a multi-byte character set, slightly larger column widths may be needed depending on the mix of characters present. Actian X column specifications include a column length for character data; note that a wider column may be needed if multi-byte characters are to be stored. The size increase is not excessive and you can use the convtohyb utility to help determine the size necessary.
X100 Transaction Management
By default, autocommit is turned off, which means each transaction consists of multiple statements ended by either a COMMIT or ROLLBACK statement.
If an error occurs during the execution of an SQL statement, the default is to roll back the statement. You can change this behavior and roll back the entire transaction if an error occurs by issuing SET SESSION WITH ON_ERROR = ROLLBACK TRANSACTION.
To determine the current status of transaction error handling, issue the SELECT DBMSINFO('ON_ERROR_STATE') statement.