Indexing
Vector does not rely on explicitly created indexes to achieve good performance. In fact, most reporting and data analysis workloads do not benefit from explicitly indexed tables so, for these scenarios, you should not create indexes.
Some workloads do benefit from indexed data. For example, highly selective filter queries on large tables or single row update/delete statements may benefit from an index.
Vector supports two types of indexes:
• Clustered index: The CREATE INDEX statement by default creates a clustered index on a table forcing the sort order of the data in the table. Only one clustered index can be created per table. You may know this type of index as an index-organized table. Thanks to the min-max indexes that are always created on every column, a clustered index will likely result in faster data scans for queries that filter on the indexed column(s).
Clustered indexes also affect query execution plans, causing the selection of a sort merge join over a hash join when both tables in the join are indexed on the join key. The sort merge join uses less memory than the hash join and is more efficient because data is already sorted.
Creating an index on a table will introduce considerations on update operations and can increase the memory utilization to load and update data. For details, refer to
Data Loading.
• Secondary index: Use the CREATE INDEX statement WITH STRUCTURE=VECTORWISE_SI to create one or more secondary indexes on a column or set of columns. Secondary indexes are independent of clustered indexes and may be used to speed up filter queries on any table column. A secondary index will impact the data load on the table because the index must be updated when data is added or deleted or when column data is modified.
Note: At present secondary indexes are stored in-memory only, and upon restart of the database it may take some time to rebuild the secondary indexes. Also, every unique entry in the secondary index is stored in uncompressed format and there is an additional overhead of 21 bytes for the unique entry plus 8 bytes for every subsequent repeating entry. Secondary indexes are currently restricted to tables up to 4 billion rows.