Configuring and Managing the X100 Engine
X100 Configuration File (vectorwise.conf)
The X100 Engine uses its own configuration file called vectorwise.conf.
A configuration file is not required. If no configuration file exists, default values are used.
Per-installation configuration file
II_DATABASE/ingres/data/vectorwise/vectorwise.conf
This file is installed by default. It contains examples of available configuration parameters.
Per-database configuration file
II_DATABASE/ingres/data/vectorwise/vectorwise.dbname.conf
or
$DATABASE_ROOT_AREA/ingres/data/vectorwise/dbname/vectorwise.conf
where dbname is the name of the database, and
$DATABASE_ROOT_AREA is the area associated with the data location specified with the createdb -v option, or -d if no -v was given, or II_DATABASE if neither was given.
No per-database configuration file is installed automatically, but a template in $II_SYSTEM/ingres/files/vectorwise.db.conf_TMPL can be used as a guide if a per-database configuration is needed.
IMPORTANT! If you want to configure settings for a single database that influence the database creation process (for example: [cbm] block_size), you must create the vectorwise.dbname.conf file with the desired settings before creating the database.
The order of precedence of the various configuration files is as follows (#1 overrides #2, which overrides #3):
1. vectorwise.conf in the vectorwise/dbname directory
2. vectorwise.dbname.conf in the vectorwise directory
3. vectorwise.conf (the installation wide configuration)
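For example, to set a non-default block size for a new database named sales (the database name and value here are illustrative only), create the per-database file before creating the database. Create II_DATABASE/ingres/data/vectorwise/vectorwise.sales.conf containing:
[cbm]
block_size=1M
Then create the database:
createdb sales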
X100 Per-node Configuration File
Per-node configuration file
The following directory exists on the master node for each of the slave nodes:
$II_SYSTEM/ingres/files/datanodes/hostname
This directory contains the files that are synced to each node, including the vectorwise.conf file for each node.
X100 Configuration File Format
Vectorwise.conf is a text file that consists of sections. The section name is in brackets. Each section contains key/value pairs.
For example:
[memory]
max_memory_size=2G
min_mem_per_transaction=20M
[cbm]
bufferpool_size=1G
When setting byte values, use a suffix of K, M, G, or T, representing kilobytes, megabytes, gigabytes, and terabytes, respectively.
X100 Configuration Parameters
Each configuration parameter belongs in one of these sections: memory, system, server, cbm, engine, or dbagent.
[memory] Settings
Memory settings in vectorwise.conf affect the memory used for query processing (not for data caching in the buffer pool).
max_memory_size
Specifies the amount (in bytes) of the total memory used for query execution.
Notes:
• Max_memory_size applies to all queries executing across the system, but a single query can consume almost all available memory.
• Some queries will fail if they consume too much memory; increasing available memory can help.
• If max_memory_size is not specified or is not larger than zero, a default setting of 50% of the physical system memory is used.
• Memory size defined with this option does not include bufferpool_size.
Limits: 32 MB to 256 TB
Default: 0 (use 50% of the physical system memory)
max_overalloc
Specifies the maximum amount of memory that will be overallocated (reserved as virtual memory) for zero-cost growing of certain memory areas. The setting needs to be changed only in rare cases.
A value of 0 means no overallocation. A value of -1 means overallocate with the amount of total system memory ("unlimited").
Note: Overallocation should be disabled on operating systems where /proc/sys/vm/overcommit_memory is not set to 1 or where "ulimit -v" is not unlimited.
Default: 2G or set to 0 upon installation if the system does not allow overallocation.
huge_tlb_lim
Enables the huge pages feature of the CPU only for queries using a number of cores less than or equal to the specified value. Set this option to 1 if parallel queries do not benefit from the huge pages feature. This may be the case on some operating systems, such as Red Hat Enterprise Linux 6, which provides transparent huge pages for anonymous memory automatically.
Special value 0 means there is no limit.
Default: 0 (no limit)
use_huge_tlb
Specifies whether to use the huge pages feature of the CPU (see Using Large Pages). Valid values are true and false.
Default: false
use_numa
Specifies whether to use the NUMA optimizations on NUMA systems (see Using NUMA Optimization). Valid values are true and false.
Default: true
[system] Settings
System settings in vectorwise.conf affect the X100 system.
full_rewrite_on_combine
Specifies whether all blocks are to be rewritten by the COMBINE command. Setting this option to true can be used to avoid or undo block fragmentation introduced by partial update propagation.
Default: false
full_rewrite_on_update_propagation
Specifies whether all blocks are to be rewritten by the system-triggered propagation process. Setting this option to true can be used to avoid block fragmentation introduced by partial update propagation.
Default: false
max_old_log_size
Specifies the maximum size for the main_wal_backups directory, which holds old main.wal files.
Default: 2 GB
Note: The system will exceed the default value if it is necessary to keep a single old main.wal file that exceeds 2 GB.
num_cores
Specifies the number of processing units in the system. This value is used to calculate the amount of memory available for each transaction.
For VectorH, num_cores is derived from the number of cores on the master node only, but the setting is applied on each node. An equal number of cores on each node is assumed.
Num_cores is used to calculate the maximum parallelism level for a newly issued query if there are many queries already being executed. The goal is to avoid degrading the total throughput of the system. Generally, the higher the num_cores value, the higher the parallelism levels granted to queries running concurrently (where each parallelism level is not higher than max_parallelism_level).
Num_cores is used to derive a database-wide target on the number of concurrent threads used for parallel execution. Queries will run at a lower degree of parallelism if the default or requested degree would lead to an uneven allocation of resources to the new query based on the number of concurrently executing queries. On a system running many concurrent queries, new queries may get only one core and will more likely run out of memory since the num_cores value is used to reserve memory for possible other threads.
Default: Number of cores on the master node.
resolve_concurrent_updates
Tells X100 whether to attempt to merge concurrent updates on the same table. If set to false, the transaction that tries to commit last will fail and will have to abort. Disabling merging of concurrent updates can reduce memory consumption, especially if there are any long-running transactions in the system.
Default: true
use_sse42
Uses SSE4.2 for accelerated string processing on CPU architectures that support it.
Default: YES
PDT Parameters
The Positional Delta Tree (PDT) is a data structure used for batch updates residing in memory. For more information, see Management of In-memory Updates in the User Guide.
max_global_update_memory
Specifies an upper limit on total update memory consumption as a percentage of query memory limit (max_memory_size). When this limit is reached, the system propagates in-memory updates to disk.
Update memory is a component of query memory.
Limits: 0.01 to 1.0
Default: 0.25 (one quarter of max_memory_size)
max_number_propagate_retries
Specifies the maximum number of retries for failed automatic update propagation. When propagation for a certain table fails, X100 will retry propagating this table if no alternative tables are available.
Default: 3
max_table_update_ratio
Specifies a per-table maximum percentage of in-memory updates (that is, the number of updates divided by the number of stable tuples). When this limit is reached, the system propagates to disk in-memory updates for this table.
Limits: 0.01 to 1.0
Default: 0.05
max_update_memory_per_transaction
Specifies an upper limit on the total update memory in the snapshot layer of a transaction, as a percentage of query memory limit (max_memory_size). When this limit is reached, the system will not allow further update operations in this transaction.
Limits: 0.01 to 1.0
Default: 0.25 (one quarter of max_memory_size)
min_propagate_table_count
Specifies the minimum number of tuples in a table above which max_table_update_ratio is checked. This is to avoid frequent update propagation to disk on relatively small tables.
Default: 500K
update_propagation
Enables or disables system-triggered propagation of in-memory updates. Valid values are true and false.
Note: Disabling automatic propagation overrides the max_global_update_memory setting and can lead to unwanted consequences if query memory becomes full with PDTs.
Default: true
[server] Settings
Server settings in vectorwise.conf affect the X100 server.
gather_profiles_on_master
Specifies whether the profiling data should be gathered and printed to file on the master node.
Note: Using this option can result in additional network traffic, which might lead to lower performance, especially when combined with [server] profile_per_query = true.
Default: true
max_profiles_per_database
Specifies the number of profiles to keep in memory for a database. If exceeded, oldest profiles are dropped.
Default: 1
max_profiles_per_session
Specifies the number of profiles to keep in memory for a session.
Default: 1
port
Specifies the TCP/IP port on which the X100 server will listen for connections.
If zero or omitted, the server will dynamically assign a port when it first starts up. A nonzero value should only be defined in a database specific vectorwise.dbname.conf; it must not be defined in the installation-wide vectorwise.conf.
A fixed nonzero port number can be useful in situations where remote direct connections are needed, such as from the vwload utility, and there is a firewall between the client and server. In such cases, the fixed port can be made known to the firewall to allow vwload to connect to the X100 server. The system administrator should choose a port that does not conflict with other listen ports on the network.
Default: 0
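For example, to fix the listen port for a database named mydb (the database name and port number are illustrative only), the file vectorwise.mydb.conf could contain:
[server]
port=27832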
profile_per_query
Generates a profile file for each query. The profiles will be saved in the directory dbname/CBM/profiles in timestamp-named files. Valid values: true and false.
Default: false
profile_per_query_dir
Specifies the directory to which profile files are written when using profile_per_query = true.
The installation owner must have write permission to the directory.
Default: empty
Note: Profile files can be written to an HDFS directory.
profile_maxfiles
Specifies the maximum number of profile files retained when using profile_per_query = true. When this threshold is exceeded, the profile files are deleted (starting from the oldest ones) until the limit is met.
Default: 1000
profile_maxsize
Specifies the maximum total size (in bytes) of profile files retained when using profile_per_query = true. When this threshold is exceeded, the profile files are deleted (starting from the oldest ones) until the limit is met.
Default: 100M
profiling
Collects and prints profiling information. Setting this parameter to false may give the highest performance on short-running queries.
Default: true
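As an illustration, the following [server] fragment (the values and directory are examples, not recommendations) enables per-query profiles and bounds the disk space they can occupy:
[server]
profile_per_query=true
profile_per_query_dir=/var/log/vectorwise/profiles
profile_maxfiles=500
profile_maxsize=50M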
[cbm] Settings
CBM (Column Buffer Manager) settings in vectorwise.conf affect the buffer manager for X100.
CBM parameters determine the format of the database files when the database is created. If you want the per-database configuration file to be applied during database creation, the file must be created before using the createdb command. For more information, see X100 Configuration File (vectorwise.conf).
Note especially these CBM parameters:
block_size
Specifies the minimum I/O granularity.
This option is the most important for I/O performance.
Note: This setting cannot be changed after database creation!
Note: The system rounds up this setting to the closest power of 2.
Default: 512K
bufferpool_size
Specifies the buffer pool size in bytes (that is, disk cache). The setting can be changed without reloading data.
For VectorH, bufferpool_size is set per node.
Notes:
• Increasing bufferpool_size makes more data reside in memory and may reduce I/O.
• If bufferpool_size is not specified or is not larger than zero, a default setting of 25% of the physical system memory is used.
• Memory size defined with this option does not include max_memory_size.
Default: 0 (use 25% of the physical system memory on the master node)
compression_lz4_enabled
Enables or disables compression of string data using the LZ4 method. Valid values are true and false.
Setting this parameter to true may give better performance on I/O-constrained systems.
Default: false
gather_io_enabled
Tells the X100 server whether to check if the operating system supports gather I/O and to use asynchronous gather I/O if it does. Valid values are true and false.
Default: true
group_size
Specifies the number of blocks that are grouped together on disk to improve data locality for sequential scans.
Note: The system rounds up this setting to the closest power of 2.
Default: 8
minmax_maxsize
Specifies the granularity of the (automatically created) min-max indexes on all columns. Min-max indexes are used automatically to derive scan ranges based on selection predicates on columns that correlate to tuple order.
The value of this parameter influences the precision of the range restrictions applied when performing a table scan. With the default setting (1024), up to approximately 0.1% of a table might be scanned in addition for each scan range found. Increasing this value can improve precision, but may slightly decrease update performance and increase memory usage. We recommend increasing this parameter only if you issue queries that scan small fractions of a table.
This parameter should be set before database creation.
Default: 1024
pbm_enabled
Enables or disables the Predictive Buffer Manager, which improves performance in workloads that are I/O bound. Valid values are true and false. Setting the parameter to false switches X100 back to using the Least Recently Used (LRU) algorithm.
PBM improves performance in most cases. On rare occasions, however, the LRU algorithm can be faster because it is simpler and incurs less overhead, especially in scenarios where data is fully memory-resident (no blocks are loaded or swapped out during query execution).
Default: true
[engine] Settings
Engine settings in vectorwise.conf affect the query execution engine.
enable_aggregation_disk_spilling
Enables or disables spilling to disk for hash aggregations. Enabling this option prevents out-of-memory conditions from occurring when aggregate results are too large to fit into memory. Valid values are true and false.
Default: false
enable_hashjoin_disk_spilling
Enables or disables spilling to disk for hash joins. Enabling this option prevents out-of-memory conditions from occurring when the hash table built from one of the join inputs is too large to fit into memory. Valid values are true and false.
Default: false
enable_reuse_disk_spilling
Enables or disables spilling to disk for reused repeating query plan parts. Enabling this option prevents out-of-memory conditions from occurring when the buffered output of a reused query part is too large to fit into memory. Valid values are true and false.
Default: false
listagg_group_maxlen
Specifies the maximum result length of the LISTAGG() aggregate function in ASCII bytes.
Limits: 1 to 32000
Default: 4000
max_parallelism_level
Defines the maximum number of threads used by a single query. This is a hard limit.
A value of 1 disables parallel query execution. A value larger than 1 enables parallel query execution.
Max_parallelism_level is related to the num_cores parameter (see num_cores). The execution engine tries not to use more than max_parallelism_level cores at a time for any query.
Limits: 0 to 256
For VectorH, max_parallelism_level is set on the master node only, but governs the total level of parallelism throughout the cluster. A good value is the total number of cores across all nodes.
Default: Total number of cores in the cluster.
max_update_parallelism_level
Defines the maximum number of threads used by update (DML) queries. This setting can be used to improve performance of update queries that can benefit from parallelization. Special values:
-1
Treats update queries as normal queries and load balances the number of cores granted to them using the usual algorithms, based on max_parallelism_level.
level
Overrides max_parallelism_level with the specified level value. Protects the cores granted to update queries by excluding them from the core pool used for load balancing.
If the value is 0, uses num_cores as the level (like in max_parallelism_level).
A value of 1 disables parallel query execution for updates. A value larger than 1 enables parallel query execution.
update_parallelism_level
Defines the forced level of parallelism used by update (DML) queries. This setting can be used to improve performance of update queries that can benefit from parallelization. Values:
-1
(Default) Treats update queries as normal queries and load balances the number of cores granted to them using the usual algorithms, based on max_parallelism_level (or max_update_parallelism_level).
0
Uses max_parallelism_level (or max_update_parallelism_level, if set) and lets the load balancing algorithm establish the target number of cores for update queries in the usual way. Protects the cores granted to update queries by excluding them from the core pool used for load balancing. Same as max_update_parallelism_level >= 0.
level
Uses the specified hard-coded parallelism level (must be greater than 0) for update queries, which is granted irrespective of load balancing. Protects the cores granted to update queries by excluding them from the core pool used for load balancing. Takes precedence over max_parallelism_level or max_update_parallelism_level.
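As a sketch, a vectorwise.conf fragment that caps any single query at 16 threads while forcing update queries to run with 4 dedicated threads (the numbers are illustrative, not recommendations) would be:
[engine]
max_parallelism_level=16
update_parallelism_level=4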
[dbagent] Settings
Dbagent settings in vectorwise.conf determine how VectorH interacts with YARN.
required_max_memory_size
Specifies the required amount of total memory available for query execution. The difference between this value and the [memory] max_memory_size value defines the target level of performance, which is requested in a container in YARN. The required_num_cores parameter must also be specified.
For more information, see Preemption Support in the User Guide.
required_num_cores
Specifies the number of cores required at startup. The difference between this value and the [system] num_cores value defines the target level of performance, which is requested in a container in YARN. The required_max_memory_size parameter must also be specified.
For more information, see Preemption Support in the User Guide.
When to Change the Default Configuration Values
The X100 default configuration is suitable for a machine designated to run a database system and is optimized to exploit most of the available system resources.
The default configuration assumes a single Vector instance per server and a single active database with multiple concurrent queries. It also assumes all nodes used by VectorH have the same number of CPU cores and physical memory available to them.
If you need to use multiple instances and databases simultaneously on one server, manual configuration is needed. (For more information, see Using Multiple Databases.) Reconfiguration can also be useful on machines with little memory or if you want to achieve maximum performance for single-stream queries.
Memory Settings
The main resource used by X100 is system memory. The system uses two memory pools:
Disk page buffer pool
Configured by setting the [cbm] bufferpool_size parameter (see bufferpool_size). The value must be an exact number with a suffix K, M, or G representing kilobytes, megabytes, and gigabytes, respectively.
Default: 0 (use 25% of physical memory)
Query execution memory
Configured by setting the [memory] max_memory_size parameter (see max_memory_size). The number provided is the available physical memory X100 can claim.
Default: 0 (use 50% of available memory)
These two memory settings are cumulative.
For example: On an 8 GB machine, the default configuration will consume up to 6 GB of memory (2 GB buffer pool + 4 GB query memory).
For example: On a 16 GB machine, if you want to configure the system to use 4 GB on the buffer pool and the remaining 10 GB on query memory (assuming 2 GB is reserved for other programs), the configuration file should have these entries:
[cbm]
bufferpool_size = 4G
[memory]
max_memory_size = 10G
Memory Configuration Guidelines
How do you determine how much memory to set aside for the buffer pool versus query execution?
If you use rather simple queries that do not require a lot of memory (for example, large scans that produce relatively few aggregates), you do not need a lot of query execution memory.
If you have complex queries (for example, joins between large tables, or aggregations that produce many results), you need more query memory.
If your disk system is fast or if your set of frequently accessed data is rather small, a large disk buffer pool is not that crucial. Conversely, if you have a large disk-resident dataset and a slow disk, increasing the buffer pool size can help.
A rule of thumb is to start with query memory at about 50% and the buffer pool at about 25% of your memory (the default), and adjust as needed.
I/O Settings
The effective disk block access unit is:
group_size * block_size
This value needs to be sufficiently large; even a single magnetic disk now needs a value of 2 MB to get reasonably good sequential throughput. If you have a multi-disk (RAID) system, you need to multiply this efficiency target by the number of disks, because the requests will be spread across all disks in equal chunks.
Finally, current SSDs can be efficient with somewhat smaller sizes. Group_size*block_size should be at least 512 KB for SSDs.
For example, in a disk configuration consisting of 8 SAS drives in RAID5, you could opt for 16 MB disk transfers, using a group_size of 8 and a block_size of 2 MB:
[cbm]
bufferpool_size = 4G
group_size = 8
block_size = 2M
[memory]
max_memory_size = 12G
In a RAID 0 or RAID 5 system, group_size*block_size should ideally be an exact multiple of the RAID stripe size times the number of disks. In a RAID 1, 10, or 0+1 system, group_size*block_size should be the stripe size times half the number of disks.
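As a hypothetical example: on a RAID 10 array of 8 disks with a 512 KB stripe size, the target is the stripe size times half the number of disks, that is, 512 KB * 4 = 2 MB, which can be met with group_size = 4 and block_size = 512K (or group_size = 8 and block_size = 256K).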
OS Settings
Certain features of your operating system may need to be configured.
Virtual Address Space Allocation
For optimal performance, X100 by default allocates a large amount of virtual address space to be able to use it most efficiently when needed. In some circumstances, this amount can be larger than the amount of physical memory.
Some Linux distributions by default disallow reserving unlimited virtual address space. We recommend that you configure your system to allow unlimited allocation of virtual address space unless there are compelling reasons (unrelated to X100) not to.
To check if your system allows unlimited allocation of virtual address space:
# cat /proc/sys/vm/overcommit_memory
should return 1.
# ulimit -v
should return unlimited.
To ensure the system does not limit allocation of virtual address space
Issue these commands after every system restart:
# echo 1 > /proc/sys/vm/overcommit_memory
# ulimit -v unlimited
To learn how to make this setting persistent, refer to your Linux documentation.
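One common approach on many Linux distributions (verify against your own distribution's documentation, because file names and syntax differ) is to set the overcommit value in /etc/sysctl.conf and the address space limit in /etc/security/limits.conf:
vm.overcommit_memory = 1
in /etc/sysctl.conf (applied with sysctl -p), and
* soft as unlimited
* hard as unlimited
in /etc/security/limits.conf, where "as" is the address space limit that corresponds to ulimit -v.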
Alternatively, X100 can be configured to not reserve virtual address space by setting the [memory] max_overalloc configuration parameter (see max_overalloc) to 0. This may limit the maximum performance your system can deliver.
Note: If you reconfigure your system after Vector is installed to allow unlimited allocation of virtual address space, then you must also set the max_overalloc parameter correctly (to 2G by default).
To use the vwload command in parallel mode, /proc/sys/vm/overcommit_memory must be set to 1 or [memory] max_overalloc must be 0.
Increase max_map_count Kernel Parameter
You may need to increase the Linux kernel parameter max_map_count to avoid running out of map areas for the X100 server process.
To increase the max_map_count parameter
1. Add the following line to /etc/sysctl.conf:
vm.max_map_count=map_count
where map_count should be around 1 per 128 KB of system memory. For example:
vm.max_map_count=2097152
on a 256 GB system.
2. Reload the config as root:
sysctl -p
3. Check the new value:
cat /proc/sys/vm/max_map_count
4. Restart Vector.
Note: The changed setting affects new processes only and should not adversely affect other processes or the OS. The memory is allocated only when a process needs the map areas.
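As a convenience (a sketch only; verify the result before applying it), a suitable value can be derived from the installed memory reported by /proc/meminfo, which lists MemTotal in KB:
awk '/MemTotal/ {print "vm.max_map_count=" int($2/128)}' /proc/meminfo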
Using Large Pages
Note: This feature requires a good understanding of memory management.
To reduce TLB (translation lookaside buffer) misses, modern CPUs offer a feature called "large pages" (or "huge pages"). Large pages allow a page size of 2 MB (on some CPUs even 1 GB) instead of 4 KB. Using a larger page size is especially beneficial when accessing large amounts of memory with a random pattern. X100 supports the use of large pages for certain data structures that would benefit the most.
Consult the documentation for your operating system for details on how to enable large page support.
Some modern Linux systems (such as RHEL 6) have transparent huge pages functionality. Huge pages are used automatically and do not have to be configured manually. In such cases, do not use the X100 options for huge pages. Consult the documentation for your operating system to see if the operating system supports transparent huge pages and if it needs to be enabled.
Configuration
You must designate an amount of memory for large pages before starting Vector. This memory is used for large page allocations only, not for normal allocations. Because this division of memory is static, choose the amount of memory made available as large pages carefully.
Note: The large pages feature applies to query memory only, not to buffer memory. Do not assign a significantly higher amount of memory to large pages than the X100 [memory] max_memory_size parameter. Doing so may not leave enough "normal" memory for the buffer pool and other processes in the system.
If you encounter problems with large pages, you can switch the feature off in the vectorwise.conf file by setting the [memory] use_huge_tlb parameter (see use_huge_tlb) to FALSE.
Requirements for Huge Pages on Linux
To enable the use of huge pages in Vector, the following is needed:
• Kernel support. (Most distributions enable this in the standard kernel.)
• The libhugetlbfs library must be installed.
For easier administration of this feature, extra tools are recommended, often found in a package called libhugetlbfs-utils or similar.
Designate Memory for Huge Pages on Linux
The commands and amounts provided here are an example of designating memory for huge pages on Linux. Before issuing these commands, understand what they do, and adapt the examples as needed. Reserving pages for use in huge page allocations is system-wide, so make sure you are not interfering with other users.
To make 2 GB available for 2 MB huge pages, issue the following commands as root on the command line before starting Vector:
hugeadm --create-global-mounts
hugeadm --pool-pages-min 2M:1024
To switch off, enter the following command as root on the command line:
hugeadm --pool-pages-min 2M:0
To check if memory is allocated for huge pages and how much of it is in use, type at the command line:
cat /proc/meminfo
The information about huge pages is shown in the following example lines:
HugePages_Total: 1024
HugePages_Free: 1024
HugePages_Rsvd: 0
HugePages_Surp: 0
Making memory available for huge pages requires defragmenting the specified amount of memory, so it can take a while. Typically, it is fastest to do this immediately after system startup, when memory is not as fragmented.
For more in-depth information, see the man page of hugeadm (https://linux.die.net/man/8/hugeadm) and vm/hugetlbpage.txt (https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt) in the Linux kernel documentation.
Using NUMA Optimization
Some modern many-core systems trade uniform memory access times to RAM for better scalability. This is known as NUMA (non-uniform memory access).
Consult the documentation for your system and for your operating system to find out if they support NUMA.
X100 provides a set of NUMA optimizations which, under Linux, require libnuma version 2.0.2 or newer. These optimizations are enabled by default when libnuma is detected and the system running X100 is a NUMA system (number of NUMA nodes > 1).
Note: When the NUMA optimizations are enabled, X100 will use all NUMA nodes in your system, even if you restricted the number of NUMA nodes by using numactl.
If you encounter problems with the NUMA optimizations, they can be disabled in vectorwise.conf by setting the [memory] use_numa parameter (see use_numa) to FALSE.
User Processes Limit in RHEL
The limit of user processes in Red Hat Enterprise Linux should be raised from the default value of 1024. To do so, add a line to the file /etc/security/limits.conf, as follows:
* soft nproc 8192
The limit can be checked with the ulimit -u command.
RLIMIT_MEMLOCK (Linux)
To prevent encryption keys stored in memory from being swapped to disk (a security risk), the Linux configuration parameter RLIMIT_MEMLOCK must be properly configured. If set too low, an error will occur when working with encrypted tables. RLIMIT_MEMLOCK defines how much memory an unprivileged process can lock. Locking a memory area protects it from swapping.
To configure this setting properly, use the following calculation: 40 bytes per encrypted table used in a query, rounded up to a full page of 4K (or 8K if overlapping page boundaries) per concurrent query (and only for the duration of the query).
To verify the current setting, as the installation owner user, issue the following command at a shell prompt:
ulimit -a
The value of "max locked memory" is the RLIMIT_MEMLOCK value. If it is too small, it can be changed by editing either the limits.conf file or the system-system.conf file, depending on whether your Linux uses systemd. Consult your system administrator for assistance.
Using Multiple Databases
X100 fully isolates activities from various databases. Each database has its own storage system, buffer pool, and query memory pool. As a result, multiple databases running at the same time can lead to high resource consumption. For optimal performance, we strongly recommend that only a single database be used on a machine. Consider using different schemas to isolate multiple applications on the same database.
If you do plan to use multiple databases on one machine, we recommend that you reduce their resource consumption, mainly memory usage (max_memory_size and bufferpool_size parameters).
Closing an Active Database
If no more activity for a particular database is expected, you can recover system resources by terminating the X100 Engine for that database. You can either terminate the engine forcibly, or specify that it should stop when no active sessions are using the database.
To stop the X100 Engine immediately
Issue this statement in a SQL client connected to the database you want to close:
CALL X100(TERMINATE)
Note: This may interrupt running queries and cause them to return an error.
To stop the X100 Engine when no other sessions are using that database
Issue this statement and then exit your SQL session:
CALL X100(TERMINATE_IF_IDLE)
The X100 Engine will exit when no other sessions using that database are active.
SQL Settings
The following SQL commands affect the current session only:
SET RESULT_STRUCTURE X100
Forces WITH STRUCTURE=X100 on all CREATE TABLE, CREATE TABLE AS SELECT, and DECLARE GLOBAL TEMPORARY TABLE statements.
The default for the installation is taken from the result_structure parameter in config.dat.
SET RESULT_STRUCTURE X100_ROW
Forces WITH STRUCTURE=X100_ROW on all CREATE TABLE, CREATE TABLE AS SELECT, and DECLARE GLOBAL TEMPORARY TABLE statements.
The default for the installation is taken from the result_structure parameter in config.dat.
SET QEP
Displays a graphical version of the query plan.
SET TRACE POINT OP150
Displays the contents of the compiled query.
SET TRACE POINT QE82
Disables compression for tables.
Automatic table compression is used by default. If the dataset is small and fits into memory, disabling compression may slightly increase performance by eliminating the overhead associated with decompression.
For example:
SET TRACE POINT QE82;
CREATE TABLE...
SET NOREUSE
Disables the "reuse" heuristic of the query optimizer.
Performance Tips
Following these recommendations will help to increase Vector performance:
• Always use optimizedb.
Without optimizedb, the SQL optimizer may order joins incorrectly.
• Use a large buffer pool.
If possible, set bufferpool_size (see bufferpool_size) to fit in memory the data you need for most queries. A large buffer pool is not as important if you have a high performance disk subsystem.
• Do not use compression on tables residing in memory.
If your data fits in main memory uncompressed, and you want to maximize performance, you should consider disabling automatic compression, because decompression causes some delays. (Use set trace point qe82 before creating tables.)
• Prefer NOT NULL fields over nullable fields for lower disk consumption and to achieve faster processing.
Processing nulls always introduces overhead. Make sure you define a column as NOT NULL when it always contains a value.
• Prefer numerical over character data types for higher data compression ratios.
Processing string data is more expensive than numerical data (integer, floating point, decimal, date/time). In cases where a limited number of strings are used as codes, consider using integer codes (or, if possible, single character strings) for these.
• Prefer DECIMAL and INTEGER data types over FLOAT data types for higher data compression ratios.
• Try using indexes.
Note: Vector supports clustered indexes (a table organized as an index) and secondary indexes. For a clustered index, specify columns that are used often in filters (for example, a date column in a typical data warehouse). VectorH does not support secondary indexes.
• When designing a database schema for large and growing data warehouse tables, you should carefully take into account the query workload and update workload:
– Indexed tables may speed up query access, but slow down bulk load and updates.
– Batch updates should be avoided in extremely intensive (“firehose”) update workloads. High volume and high velocity data should be appended to the database.
– When working with thousands of small tables or having frequent small appends, reduce the [cbm] group_size parameter (see group_size) for lower disk consumption.
• Take advantage of parallel query execution.
Parallel query execution (using multiple cores to execute a single query) works in most scenarios. For example, it improves the performance of the query described in Run a Reporting Query in the User Guide. Parallel query execution can be enabled by setting the [engine] max_parallelism_level parameter in the configuration file. See also Per-query Parallelism Level.
Note: The terms parallel query execution or parallelism in this guide mean using multiple cores to execute one query (as compared to running multiple queries at the same time, which is always possible).
• Consider enabling data compression over the network when connecting through JDBC or Ingres Net. Performance can improve when compression is enabled on remote connections for large result sets. Do not use compression for small data transfers (less than 256 bytes) or when the client application, Data Access Server (DAS), and DBMS Server are on the same machine.
Compression is enabled with a connection attribute "compress=on". For JDBC, add the attribute to the URL after the database name: jdbc:ingres://host:VW7/db;compress=on. For Net, add the attribute to the connection target string (if using dynamic vnode): sql "@host,VW;compress=on[usr,pwd]::db". Or you can add the attribute to the vnode definition in netutil. For more information, see the Connectivity Guide.
• Limit the creation of min-max indexes. By default, Vector creates min-max indexes on all columns. You can limit this to specific columns by using WITH [NO]MINMAX on CREATE TABLE.
Caution! Limiting min-max indexing with NOMINMAX can severely impact performance. Use NOMINMAX only if necessary to alleviate memory usage problems with very wide tables (many columns), and then use it only for columns not involved in restrictions or joining, if possible.
Creating Indexes
Primary (clustered) indexes can be created for Vector tables.
Note: VectorH does not support secondary indexes.
A clustered index (also known as a primary index in Vector) is a table organized as an index.
Use a primary index only if the table is predominantly accessed through the indexed columns. If the table is often filtered or joined only on non-indexed columns then the index will likely slow down rather than improve query performance.
Restrictions are as follows:
• Create a primary index after creating the table.
• Only one primary index is allowed per table.
• An initial bulk load into the table is directly and efficiently written to disk. Subsequent updates are always loaded into memory first, unless the MODIFY TO COMBINE operation is used.
To create a primary index
Enter your CREATE INDEX statement at the terminal monitor prompt. The following example creates an index named l_idx on the l_shipdate column of the lineitem table:
CREATE INDEX l_idx ON lineitem(l_shipdate);
Per-query Parallelism Level
An explicit parallelism level can be set for a single query to test its effect. Use the following syntax:
SELECT...WITH MAX_PARALLEL n
where n is an integer between 1 and 256. The value of n should not exceed the number of CPU cores visible to the operating system.
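For example, the following statement (a sketch using the lineitem table referenced elsewhere in this guide) limits one query to four threads:
SELECT COUNT(*) FROM lineitem WITH MAX_PARALLEL 4;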
Data Size Consideration for Some Queries
To get the fastest query response times, Vector by default does not spill intermediate results to disk for joins and aggregations. As a result, using the default settings, Vector may not be able to perform a requested operation for some types of queries due to high memory usage. The following scenarios are the most typical:
• Performing an aggregation that produces a high number of records. For example:
SELECT l_orderkey, COUNT(*)
FROM lineitem
GROUP BY l_orderkey
If the number of distinct values of "l_orderkey" is high (tens of millions or more), the query may fail, depending on available memory.
Increasing available memory can be a partial solution for this problem. Also, indexing the "lineitem" table on "l_orderkey" allows a faster and more memory-efficient solution to be used.
Another option is to let aggregation save its temporary results on disk, allowing it to scale beyond available memory. This technique, known as "spilling to disk," can be enabled by setting the [engine] enable_aggregation_disk_spilling parameter to "true" (see the example configuration after this list).
Note: Enabling spilling to disk will let the query complete but may cause severe performance degradation, possibly even orders of magnitude slower than when the query is running in memory. This is because disk access is slower than memory access. Using faster disk subsystems can partially alleviate the slower performance.
• Performing a join between two large tables without any restrictions. For example:
SELECT o_orderdate, l_receiptdate
FROM lineitem, orders
WHERE l_orderkey = o_orderkey
If the number of records in "orders" is high (tens of millions or more), the query may fail, depending on available memory.
Increasing available memory can be a partial solution for this problem. Also, indexing both "lineitem" and "orders" tables on the join key allows a faster and more memory-efficient solution to be used.
Another option is to let the join save its temporary hash table on disk, allowing it to scale beyond available memory. This technique, known as "spilling to disk," can be enabled by setting the [engine] enable_hashjoin_disk_spilling parameter to "true".
Note: Enabling spilling to disk will let the query complete but may cause severe performance degradation, possibly even orders of magnitude slower than when the query is running in memory. This is because disk access is slower than memory access. Using faster disk subsystems can partially alleviate the slower performance.
• Performing a duplicate-eliminating UNION where the input relations have a high number of records. For example:
SELECT o_orderkey, o_orderdate
FROM orders1
UNION
SELECT o_orderkey, o_orderdate
FROM orders2
If the number of values in orders1 and orders2 is high (tens of millions or more), the query may fail, depending on available memory. Increasing available memory can be a partial solution for this problem. Also, you can consider using a UNION ALL approach if you know there are no duplicates, or if you accept duplicate entries.
Because duplicate-elimination in UNION is a special form of aggregation, the previous discussion about spilling to disk for aggregation also applies to UNION.
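If these memory-intensive patterns are common in your workload, spilling can be enabled in vectorwise.conf for aggregation (and duplicate-eliminating UNION) and for hash joins. The following fragment is a sketch, not a recommended default:
[engine]
enable_aggregation_disk_spilling=true
enable_hashjoin_disk_spilling=true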
X100 Write-Ahead Log
X100 keeps information on the internal state of the database in the Write-Ahead Log (WAL), which is stored in the wal directory. The wal directory exists for each database and is located in the database directory of the default data location. The wal directory contains the main.wal file and the pdts subdirectory. The pdts subdirectory contains per-partition wal files that store PDTs.
The WAL is similar to that used by other database systems, but also holds other critical information.
IMPORTANT! Deleting or modifying the wal directory may lead to irreversible data loss.
Old WAL Files
The main.wal file grows as DDL and DML queries make changes to the database. When the file grows significantly, the system tries to shrink its size by creating a new, smaller version of it. For recovery, previous versions of the file are saved in the main_wal_backups directory of the default data location.
Each file in the main_wal_backups directory is named using its creation timestamp. When the total size of the files in the directory exceeds the max_old_log_size configuration parameter (see max_old_log_size), the oldest files in the directory are automatically deleted.
The files in the main_wal_backups directory can be manually removed without adversely affecting the data in the database. Nevertheless, we recommend keeping at least the one or two most recent files as backups for database recovery.
WAL Condensation
WAL condensation can be triggered manually by issuing the following statement:
CALL X100 (CONDENSE_LOG)
A backup copy of the main.wal file is stored in the main_wal_backups subdirectory, and then the main.wal file is compacted by dropping unused information.
The files stored in the pdts directory are not affected by the condensation. They are cleaned (removed) automatically when the PDT data stored inside is no longer needed, for example, after update propagation.
wal_backups Directory
The wal_backups directory is created on system startup whenever the system detects that the last condensation of the transaction log did not finish completely. Each subdirectory of wal_backups corresponds to one occurrence of such a situation.
The subdirectories may contain all or a subset of the following files: main.wal, main.wal.orig, location_map, location_map.orig, and the contents of the wal/pdts subdirectory. These are copies of the files found in the wal directory at startup.
The directory and its contents are a fail-safe mechanism that allows the database to be restored to its state following the last unfinished condensation of the transaction log. Normally the system handles such a situation automatically. If the automated mechanism fails, restoration must be performed manually with the assistance of Actian Support. Such a situation is unlikely.
Existence of the wal_backups directory simply means that the server terminated abnormally while the condensation of the transaction log was in process. If the system starts and works correctly, the wal_backups directory can be removed completely. If the system is unable to start, the data in wal_backups should be retained.
Error Reporting--vectorwise.log
Errors and other messages for X100 are logged in the vectorwise.log file. The file is located in:
II_SYSTEM/ingres/files/vectorwise.log
Besides the log file on the master node, each slave node also has a local log file that can be found in the location stated above.
Configuring X100 Error Reporting
You can configure error logging to vectorwise.log as follows:
• Change the level of information logged for all or specific facilities.
• Change the name and location of the log file.
• Change the record format.
X100 Error Log Record Format
All parts of a log record are optional except for the message.
A log record has the following format:
[#][timestamp][PIDpid:][TIDtid:][SIDsid:][TXIDtxid:][QIDqid:][level:][facility:][file:line:]message
where:
#
The # sign is printed before non-error level messages when the comment option is enabled for the output.
timestamp
The timestamp of the message, which can be in one of two formats: ISO and milliseconds from epoch.
pid
The logging process ID
tid
The thread ID
sid
The x100 session ID
txid
The transaction ID
qid
The query ID (within session)
level:
The level of information logged. ERROR and WARN level messages are always logged.
facility:
The facility name
file:line
The source file and line that caused the message to be issued
message
The message text
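A hypothetical record that uses most of these parts (the values are invented for illustration) might look like:
# 2023-06-01T12:34:56 PID12345: TID3: SID7: TXID1001: QID2: INFO: SYSTEM: server started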
X100 Error Log Configuration File
The file for configuring the X100 error log is named vwlog.conf and resides in the directory indicated by the II_CONFIG environment variable.
To see the value for II_CONFIG
Issue the following command at the operating system prompt:
ingprenv II_CONFIG
The vwlog.conf file delivered with Vector sets the default log level for all facilities to WARN and the output to the vectorwise.log file. It also contains log settings for specific facilities.
X100 Error Log Configuration File Format
The error log configuration file has the following format:
facility_name = facility_config | default= facility_config
filename = path/to/log_file_name
file = output_config
Each line in the configuration file configures one output, one facility, or additional options. Lines can be separated by a comma (for an example, see VWLOG Environment Variable--Configure Error Reporting). The # character marks a comment until end of the line (a comma does not end the comment).
where:
facility_name = facility_config
Specifies one or more logging options for the default (all facilities) or specified facility, as described in Facility Configuration.
filename = path/to/log_file_name
Specifies the file for FILE log output. The path can be absolute, or relative to the II_LOG directory, if set, or to the II_CONFIG directory otherwise.
Default name: vectorwise.log
file = output_config
Specifies the output format for the log file, as described in X100 Log Record Format Configuration.
X100 Log Record Format Configuration
The syntax for configuring log record output is as follows:
file = output_config
where output_config specifies the output format for the log file. Separate each option with a colon (:). If no options are specified, only the message itself is displayed. The options are:
timestamp
Uses ISO timestamp. (Consider using with usec for 6-digit usec information.)
timesec
Uses sec.usec timestamp format. (Consider using with usec for 6 rather than 3 usec digits.)
showlevel
Displays level name TRACE, DEBUG or INFO before the message (ERROR and WARN level names are always displayed.)
comment
Displays # before all log lines except for ERROR level messages.
fileline
Displays source file name and line for each message.
pid
Shows logging process PID.
tid
Shows logging thread ID.
sid
Shows client session ID in context of which this log message is printed.
txid
Shows transaction ID in context of which this log message is printed.
qid
Shows X100 query ID (within a session) in context of which this log message is printed.
flush
(Recommended) Flushes log file after each write.
color
Displays colored logs (yellow WARNs and red ERRORs) on STDERR/STDOUT. Can be used only on black background terminals.
usec
Provides 6-digit usec information to both timestamp and timesec.
The order in which options are specified does not influence the format in which the corresponding fragments of the log lines are printed. Only one of timestamp or timesec can be in effect.
Example:
file = pid:tid:sid:txid:flush:timestamp:showlevel
Facility Configuration
The syntax for specifying logging options for the default or specific facility is as follows:
facility_name=facility_config
facility_name
Specifies the name of the facility. Possible names include:
default
Specifies the configuration for all facilities that have no configuration specified, and for single unspecified options. Default configuration must be specified before any other facilities in the configuration file.
CBM
(Recommended) Sets log configuration for buffer manager facility. If set to info, general information about the size of the memory buffers and columnspace is included in the log file on startup.
WAL_CONDENSATION
Sets log configuration for the WAL condensation facility. If set to info, information about the performed WAL condensation operations is included in the log.
MEMCONTEXT
Sets log configuration for memory management facility. If set to info, a dump of memory allocation details is printed in case of out of memory (OOM) errors, and general system memory statistics are printed with every query.
NETBUFFER
Sets log configuration for the netbuffer facility.
QUERYERROR
Sets log configuration for query error facility. If set to info, failed queries print the query text, the error message, and in case of OOM errors, memory allocation details for easier debugging.
SCHEDULER
Sets log configuration for query scheduler. If set to info, information about system load and cores granted to queries is printed with every query.
SYSCALL
Sets log configuration for SYSCALL facility. If set to info, information about execution of some system calls (rewrite, checkpoint, write_lock, profile_memcontext) is included in the log.
SYSTEM
(Recommended) Sets log configuration for System facility. If set to info, the server version and other identifying information is included in the log.
UPDATE_MGR
Sets log configuration for transaction management. If set to info, information about starting and ending transactions is included.
UPDATE_PROPAGATION
Sets log configuration for the update propagation facility. If set to info, information about the performed log condensation operations is included in the log.
X100SERVER
Sets log configuration for server queries. If set to info, information about starting and ending sessions and queries is included.
facility_config
Specifies the configuration line for the specified facility. Consists of one or more of the following options separated by colons (:):
error, warn, info, debug
Sets minimum level of displayed messages.
file
Logs messages for this facility to given output streams. Only "file" is supported.
Example:
default = warn:file
SYSTEM = info:file
SYSCALL = info:file
CBM = info:file
QUERYERROR = info:file
Example X100 Error Log Configurations
1. The following configuration:
default=info:file
filename=/home/vectorwise/vectorwise.log
file=pid:timestamp:showlevel:flush
• Sets the log level to INFO.
• Changes the location of the log file from its default location to /home/vectorwise.
• Shows the process ID, ISO timestamp, and the log level in each log record.
• Flushes the log file after each write.
2. The following configuration:
default=warn:file
filename=/tmp/vw.log
file=color:timestamp:showlevel:flush
facility NETBUFFER=cyan
• Writes output to the file /tmp/vw.log.
• Shows the ISO timestamp and log level, and flushes the log file after each write.
• Shows messages issued by the NETBUFFER facility in cyan.
3. The following configuration:
default=warn:file
file=timestamp:showlevel:pid:txid
QUERYERROR=info:file
• Shows the ISO timestamp, log level, process ID, and transaction ID in each log record.
• Shows information about failed queries by raising the log level of the QUERYERROR facility to INFO.
VWLOG Environment Variable--Configure Error Reporting
Instead of creating an error log configuration file, you can configure X100 logging by specifying the VWLOG environment variable before Vector is started.
The following results in INFO-level logging that is written to the file /tmp/vectorwise.log:
export VWLOG="default=info:file,filename=/tmp/vectorwise.log,file=timestamp:showlevel:flush:pid"
Rotating the X100 Error Log
If the error log gets too large, the VWLOG_ROTATE system call can be used to close the current error log file, move its contents to the specified file, and then reopen a new, empty error log file under the original name and continue to append incoming log messages.
For example: The following statement moves the error log file to archive1.log. Subsequent log lines are written to a new, empty file that has the same name as the old log file:
CALL X100(VWLOG_ROTATE '''/opt/Actian/VectorXX/ingres/files/archive1.log''');
For more information, see VWLOG_ROTATE in the SQL Language Guide.
Changing Error Log Configuration Settings Dynamically
Error logging configuration can be changed dynamically, for example, if you want to increase the logging level while debugging a process. The VWLOG_RELOAD system call loads a specific X100 error log configuration file, whose rules are used for all subsequent log events. Restarting Vector is not necessary.
For example, the following statement tells Vector to immediately start using the specified log configuration file:
CALL X100(VWLOG_RELOAD '''/opt/Actian/VectorVH/ingres/files/vwlog.conf''');
For more information, see VWLOG_RELOAD in the SQL Language Guide.
Logging for Startup of the X100 Server
When the x100_server is first started there is a small window where the process may be generating output (errors) but is not yet writing to the vectorwise.log. This output can be found in two locations:
• x100errlog.log
Output relating to x100_server startup, including MPI Runtime, SSH, and k5start messages, x100_server return codes, and any errors. The default location is II_SYSTEM/ingres/files.
• DBMS log
All output generated by the x100_server during startup is written to this log. To enable the log, run:
ingsetenv II_DBMS_LOG $II_SYSTEM/ingres/files/iidbms_%p.log
The %p appends the process ID of the process generating the log to the file name, to prevent the file from being overwritten. DBMS logs are created by the iidbms and dmfrcp servers.
Configure x100errlog.log Rotation
The x100errlog.log file may need to be rotated to save disk space and to prevent large, unwieldy files.
If enabled, rotation starts a new, empty log file, and then compresses and renames the old file. The system keeps a specified number of old files and deletes all older ones.
Log rotation occurs at startup and no more than once a day unless the size grows to exceed the same limit as errlog.log, which defaults to 1 GB.
To configure x100errlog.log rotation, use the iisetres command to change the following parameters in config.dat:
ii.$.x100.errlog_rotate
Enables or disables log rotation. Valid values: true, false, force.
Default: false
ii.$.x100.errlog_max_size
Specifies the size after which the log should be rotated.
Default: 100 KB
ii.$.x100.errlog_keep
Specifies the number of compressed logs to keep.
Default: 5
If errlog_rotate is set to true, the log file is rotated on startup only if its size exceeds errlog_max_size. If errlog_rotate is set to force, the log is rotated on every startup, independent of size. If set to false (default) no log rotation is performed.
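For example, to force rotation on every startup and keep 10 compressed logs (following the iisetres form shown in Configure MPI Runtime, where hostname is your host name):
iisetres ii.hostname.x100.errlog_rotate force
iisetres ii.hostname.x100.errlog_keep 10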
Identifying Unused Table Files
X100 automatically removes files that are no longer used (for example, after a table or column is dropped).
System crashes, however, may cause unused table files that should have been deleted to be left on disk. If disk space is low you may need to remove such files.
Follow these steps to find and remove unused table files:
1. Connect to the database and issue the following SQL statement:
CALL X100(CLEANUP_UNUSED_FILES)
This system command scans all locations recursively to find files that do not belong to any tables. It renames such files by adding an "unused_" prefix. This command can be executed on a working system.
2. Make sure that the system is running correctly without these files.
3. Remove the unused files.
Note: CLEANUP_UNUSED_FILES does not mark unreleased files that may still be needed by a transaction. So using this system call may not help you to free space if there are long-running or hanging transactions in the system.
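To review what CLEANUP_UNUSED_FILES has renamed before removing anything, the renamed files can be listed by their prefix. A sketch for the default data location (adjust the path and substitute your database name for dbname):
find $(ingprenv II_DATABASE)/ingres/data/vectorwise/dbname -name 'unused_*'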
Configure the Checking of Free Disk Space
To avoid unexpected outages caused by disk space exhaustion, Vector checks minimum free space requirements on startup. It determines the free space under II_SYSTEM, II_DATABASE, and II_CHECKPOINT locations.
You can configure the minimum free space requirements by using the iisetres command to change the following resources in config.dat:
ii.$.ingstart.ii_system.freespace
Sets the minimum free space requirement for II_SYSTEM.
Default: 200MB
ii.$.ingstart.ii_database.freespace
Sets the minimum free space requirement for II_DATABASE.
Default: 100MB
ii.$.ingstart.ii_checkpoint.freespace
Sets the minimum free space requirement for II_CHECKPOINT.
Default: 100MB
If any or all of the locations share a common location or physical partition, the requirements for those locations are summed.
How to Free Disk Space
If the disk space for your data location becomes low, follow these steps:
1. Extend the database to a new location. For details, see the chapter "Using Alternate Locations."
2. Copy a table into the new location using a CREATE TABLE AS SELECT statement (see the sketch after these steps).
3. Drop the old table.
Disk space is freed to the operating system.
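A sketch of steps 2 and 3 (the table and location names are illustrative; see the SQL Language Guide for the exact WITH LOCATION syntax):
CREATE TABLE sales_new AS SELECT * FROM sales WITH LOCATION = (new_loc);
DROP TABLE sales;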
Configure MPI Runtime
You can configure the MPI Runtime by using the iisetres command to change the following parameters in config.dat:
ii.hostname.x100.mpi.fabrics
MPI flag: I_MPI_FABRICS
Specifies the network fabrics to be used for communication, in the form intra-node:inter-node.
Default: shm:tcp
ii.hostname.x100.mpi.iface
MPI flag: -iface
Specifies the network interface card (NIC) to be used for internode communication.
Default: None
Example: iisetres ii.hostname.x100.mpi.iface eth1
ii.hostname.x100.mpi.debug
MPI flag: I_MPI_DEBUG
Specifies the debug level for MPI infrastructure.
Default: None