Performance Tips
Following these recommendations will help to increase Vector performance:
• Always use optimizedb.
Without optimizedb, the SQL optimizer may order joins incorrectly.
• Use a large buffer pool.
If possible, set
bufferpool_size (see
bufferpool_size) to fit in memory the data you need for most queries. A large buffer pool is not as important if you have a high performance disk subsystem.
• Do not use compression on tables residing in memory.
If your data fits in main memory uncompressed, and you want to maximize performance, you should consider disabling automatic compression, because decompression causes some delays. (Use set trace point qe82 before creating tables.)
• Prefer NOT NULL fields over nullable fields for lower disk consumption and to achieve faster processing.
Processing nulls always introduces overhead. Make sure you define a column as NOT NULL when it always contains a value.
• Prefer numerical over character data types for higher data compression ratios.
Processing string data is more expensive than numerical data (integer, floating point, decimal, datetime). In cases where a limited number of strings are used as codes, consider using integer codes (or, if possible, single character strings) for these.
• Prefer DECIMAL and INTEGER data types over FLOAT data types for higher data compression ratios.
• Try using indexes.
Note: Vector supports clustered indexes (a table organized as an index) and secondary indexes. For a clustered index, specify columns that are used often in filters (for example, a date column in a typical data warehouse). Vector in Hadoop does not support secondary indexes.
• When designing a database schema for large and growing data warehouse tables, you should carefully take into account the query workload and update workload:
– Indexed tables may speed up query access, but slow down bulk load and updates.
– Batch updates should be avoided in extremely intensive (“firehose”) update workloads. High volume and high velocity data should be appended to the database.
– When working with thousands of small tables or having frequent small appends, reduce the [cbm]
group_size parameter (see
group_size) for lower disk consumption.
• Take advantage of parallel query execution.
Parallel query execution (using multiple cores to execute a single query) works in most scenarios. For example, it improves the performance of the query described in Run a Reporting Query in the
User Guide. Parallel query execution can be enabled by setting the [engine] max_parallelism_level parameter in the configuration file. See also
Per-query Parallelism Level.
Note: The terms parallel query execution or parallelism in this guide mean using multiple cores to execute one query (as compared to running multiple queries at the same time, which is always possible).
• Consider enabling data compression over the network when connecting through JDBC or Ingres Net. Performance can improve when compression is enabled on remote connections for large result sets. Do not use compression for small data transfers (less than 256 bytes) or when the client application, Data Access Server (DAS), and DBMS Server are on the same machine.
Compression is enabled with a connection attribute "compress=on". For JDBC, add the attribute to the URL after the database name: jdbc:ingres://host:VW7/db;compress=on. For Net, add the attribute to the connection target string (if using dynamic vnode): sql "@host,VW;compress=on[usr,pwd]::db". Or you can add the attribute to the vnode definition in netutil. For more information, see the Connectivity Guide.
• Limit the creation of min-max indexes. By default, Vector creates min-max indexes on all columns. You can limit this to specific columns by using WITH [NO]MINMAX on CREATE TABLE.
Caution! Limiting min-max indexing with NOMINMAX can severely impact performance. Use NOMINMAX only if necessary to alleviate memory usage problems with very wide tables (many columns), and then use it only for columns not involved in restrictions or joining, if possible.
Last modified date: 06/28/2024