X100 Configuration Parameters
Each configuration parameter belongs in one of these sections: memory, system, server, cbm, or engine.
[memory] Settings
Memory settings in vectorwise.conf affect the memory used for query processing (not for data caching in the buffer pool).
max_memory_size
Specifies the amount (in bytes) of the total memory used for query execution.
Notes:
• Max_memory_size applies to all queries executing across the system, but a single query can consume almost all available memory.
• Some queries will fail if they consume too much memory; increasing available memory can help.
• If max_memory_size is not specified or is not larger than zero, a default setting of 50% of the physical system memory is used.
• Memory size defined with this option does not include bufferpool_size.
Limits: 96 MB to 256 TB
Default: 0 (use 50% of the physical system memory)
max_overalloc
Specifies the maximum amount of memory that will be overallocated (reserved as virtual memory) for zero-cost growing of certain memory areas. The setting needs to be changed only in rare cases.
A value of 0 means no overallocation. A value of ‑1 means overallocate with the amount of total system memory ("unlimited").
Note: Overallocation should be disabled on operating systems where: /proc/sys/vm/overcommit_memory is not 1, or "ulimit -v" is not unlimited.
Default: 2G or set to 0 upon installation if the system does not allow overallocation.
huge_tlb_lim
Enables the huge pages feature of the CPU only for queries using the number of cores less than or equal to the specified value. Set this option to 1 if parallel queries do not benefit from the huge pages feature. This may be experienced on some operating systems, such as Red Hat Enterprise Linux 6, which has an automatic anonymous TLB feature.
Special value 0 means there is no limit.
Default: 0 (no limit)
use_huge_tlb
Specifies whether to use the
huge pages feature (see
Using Large Pages) of the CPU. Valid values are true and false.
Default: false
use_numa
Default: true
[system] Settings
System settings in vectorwise.conf affect the X100 system.
full_rewrite_on_combine
Specifies whether all blocks are to be rewritten by the COMBINE command. Setting this option to true can be used to avoid or undo block fragmentation introduced by partial update propagation.
Default: false
full_rewrite_on_update_propagation
Specifies whether all blocks are to be rewritten by the system-triggered propagation process. Setting this option to true can be used to avoid block fragmentation introduced by partial update propagation.
Default: false
max_old_log_size
Specifies the maximum size for the main_wal_backups directory, which holds old main.wal files.
Default: 2 GB
Note: The system will exceed the default value if it is necessary to keep a single old main.wal file that exceeds 2 GB.
num_cores
Specifies the number of processing units in the system. This value is used to calculate the amount of memory available for each transaction.
Num_cores is used to calculate the maximum parallelism level for a newly issued query if there are many queries already being executed. The goal is not to deteriorate the total throughput of the system. Generally, the higher the num_cores value, the higher parallelism levels are granted for queries running concurrently (where each parallelism level is not higher than max_parallelism_level).
Num_cores is used to derive a database-wide target on the number of concurrent threads used for parallel execution. Queries will run at a lower degree of parallelism if the default or requested degree would lead to an uneven allocation of resources to the new query based on the number of concurrently executing queries. On a system running many concurrent queries, new queries may get only one core and will more likely run out of memory since the num_cores value is used to reserve memory for possible other threads.
Default: Number of processors visible to the operating system.
resolve_concurrent_updates
Tells X100 whether to attempt to merge concurrent updates on the same table. If set to false, the transaction that tries to commit last will fail and will have to abort. Disabling merging of concurrent updates can reduce memory consumption, especially if there are any long-running transactions in the system.
Default: true
system udf_conf_path
Specifies the location of the udf_engines.conf file when configuring the user defined functions.
use_sse42
Uses SSE4.2 for accelerated string processing on CPU architectures that support it.
Default: YES
PDT Parameters
The Positional Delta Tree (PDT) is a data structure used for batch updates residing in memory.
max_global_update_memory
Specifies an upper limit on total update memory consumption as a percentage of query memory limit (max_memory_size). When this limit is reached, the system propagates in-memory updates to disk.
Update memory is a component of query memory.
Limits: 0.01 to 1.0
Default: 0.25 (one quarter of max_memory_size)
max_number_propagate_retries
Specifies the maximum number of retries for failed automatic update propagation. When propagation for a certain table fails, X100 will retry propagating this table if no alternative tables are possible.
Default: 3
max_table_update_ratio
Specifies a per-table maximum percentage of in-memory updates (that is, the number of updates divided by the number of stable tuples). When this limit is reached, the system propagates to disk in-memory updates for this table.
Limits: 0.01 to 1.0
Default: 0.05
max_update_memory_per_transaction
Specifies an upper limit on the total update memory in the snapshot layer of a transaction, as a percentage of query memory limit (max_memory_size). When this limit is reached, the system will not allow further update operations in this transaction.
Limits: 0.01 to 1.0
Default: 0.25 (one quarter of max_memory_size)
min_propagate_table_count
Specifies the minimum number of tuples in a table above which max_table_update_ratio is checked. This is to avoid frequent update propagation to disk on relatively small tables.
Default: 500K
update_propagation
Enables or disables system-triggered propagation of in-memory updates. Valid values are true and false.
Note: Disabling automatic propagation overrides the max_global_update_memory setting and can lead to unwanted consequences if query memory becomes full with PDTs.
Default: true
[server] Settings
Server settings in vectorwise.conf affect the X100 server.
max_profiles_per_database
Specifies the number of profiles to keep in memory for a database. If exceeded, oldest profiles are dropped.
Default: 1
max_profiles_per_session
Specifies the number of profiles to keep in memory for a session.
Default: 1
port
Specifies the TCP/IP port on which the X100 server will listen for connections.
If zero or omitted, the server will dynamically assign a port when it first starts up. A nonzero value should only be defined in a database specific vectorwise.dbname.conf; it must not be defined in the installation-wide vectorwise.conf.
A fixed nonzero port number can be useful in situations where remote direct connections are needed, such as from the vwload utility, and there is a firewall between the client and server. In such cases, the fixed port can be made known to the firewall to allow vwload to connect to the X100 server. The system administrator should choose a port that does not conflict with other listen ports on the network.
Default: 0
profile_per_query
Generates a profile file for each query. The profiles will be saved in the directory dbname/CBM/profiles in timestamp-named files. Valid values: true and false.
Default: false
profile_per_query_dir
Specifies the directory to which profile files are written when using profile_per_query = true.
The installation owner must have write permission to the directory.
Default: empty
profile_maxfiles
Specifies the maximum number of profile files retained when using profile_per_query = true. When this threshold is exceeded, the profile files are deleted (starting from the oldest ones) until the limit is met.
Default: 1000
profile_maxsize
Specifies the maximum total size (in bytes) of profile files retained when using profile_per_query = true. When this threshold is exceeded, the profile files are deleted (starting from the oldest ones) until the limit is met.
Default: 100M
profiling
Collects and prints profiling information. Setting this parameter to false may give the highest performance on short-running queries.
Default: true
[cbm] Settings
CBM (Column Buffer Manager) settings in vectorwise.conf affect the buffer manager for X100.
CBM parameters determine the format of the database files when the database is created. If you want the per-database configuration file to be applied during database creation, the file must be created before using the createdb command. For more information, see X100 Configuration File (vectorwise.conf).
Note especially these CBM parameters:
block_size
Specifies the minimum I/O granularity.
This option is the most important for I/O performance.
Note: This setting cannot be changed after database creation!
Note: The system rounds up this setting to the closest power of 2.
Default: 512K
bufferpool_size
Specifies the buffer pool size in bytes (that is, disk cache). The setting can be changed without reloading data.
Notes:
• Increasing bufferpool_size makes more data reside in memory and may reduce I/O.
• If bufferpool_size is not specified or is not larger than zero, a default setting of 25% of the physical system memory is used.
• Memory size defined with this option does not include max_memory_size.
Default: 0 (use 25% of the physical system memory)
compression_lz4_enabled
Enables or disables compression of string data using the LZ4 method. Valid values are true and false.
Setting this parameter to true may give better performance on IO-constrained systems.
Default: false
gather_io_enabled
Tells the X100 server whether to check if the operating system supports gather I/O and to use asynchronous gather I/O if it does. Valid values are true and false.
Default: true
group_size
Specifies the number of blocks that are grouped together on disk to improve data locality for sequential scans.
Note: The system rounds up this setting to the closest power of 2.
Default: 8
minmax_maxsize
Specifies the granularity of the (automatically created) min-max indexes on all columns. Min-max indexes are used automatically to derive scan ranges based on selection predicates on columns that correlate to tuple order.
The value of this parameter influences the precision of the range restrictions applied when performing a table scan. With a default setting (1024), up to ~0.1% of a table might be scanned extra for each found scan range. Increasing this value can improve precision, but may slightly decrease the update performance and increase the memory usage. We recommend increasing this parameter only if you issue queries that scan small fractions of a table.
This parameter should be set before database creation.
Default: 1024
pbm_enabled
Enables or disables the Predictive Buffer Manager, which improves performance in workloads that are I/O bound. Valid values are true and false. Setting the parameter to false switches X100 back to using the Least Recently Used (LRU) algorithm.
PBM improves performance in most cases. On rare occasions, however, the LRU algorithm can be faster because it is simpler and incurs less overhead, especially in scenarios where data is fully memory-resident (no blocks are loaded or swapped out during query execution).
Default: true
[engine] Settings
Engine settings in vectorwise.conf affect the query execution engine.
enable_aggregation_disk_spilling
Enables or disables spilling to disk for hash aggregations. Enabling this option prevents out-of-memory conditions from occurring when aggregate results are too large to fit into memory. Valid values are true and false.
Default: false
enable_hashjoin_disk_spilling
Enables or disables spilling to disk for hash joins. Enabling this option prevents out-of-memory conditions from occurring when the hash table built from one of the join inputs is too large to fit into memory. Valid values are true and false.
Default: false
enable_reuse_disk_spilling
Enables or disables spilling to disk for reused repeating query plan parts. Enabling this option prevents out-of-memory conditions from occurring when the buffered output of a reused query part is too large to fit into memory. Valid values are true and false.
Default: false
listagg_group_maxlen
Specifies the maximum result length of the LISTAGG() aggregate function in ASCII bytes.
Limits: 1 to 32000
Default: 4000
max_parallelism_level
Defines the maximum number of threads used by a single query. This is a hard limit.
A value of 1 disables parallel query execution. A value larger than 1 enables parallel query execution.
Max_parallelism_level is related to the
num_cores parameter (see
num_cores). The execution engine tries not to use more than the max_parallelism_level number of cores at a time for any query.
Limits: 0 to 256
Default: Number of processors on the machine or 8 if the machine has more than 8 processors.
max_update_parallelism_level
Defines the maximum number of threads used by update (DML) queries. This setting can be used to improve performance of update queries that can benefit from parallelization. Special values:
-1
Treats update queries as normal queries, load balances the number of cores granted to them using the usual algorithms using max_parallelism_level.
level
Overrides max_parallelism_level with the specified level value. Protects the cores granted to update queries by excluding them from the core pool used for load balancing.
If the value is 0, uses num_cores as the level (like in max_parallelism_level).
A value of 1 disables parallel query execution for updates. A value larger than 1 enables parallel query execution.
update_parallelism_level
Defines the forced level of parallelism used by update (DML) queries. This setting can be used to improve performance of update queries that can benefit from parallelization. Values:
-1
(Default) Treats update queries as normal queries, load balances the number of cores granted to them using the usual algorithms using max_parallelism_level (or max_update_parallelism_level).
0
Uses max_parallelism_level (or max_update_parallelism_level, if set) and lets the load balancing algorithm establish the target number of cores for update queries in the usual way. Protects the cores granted to update queries by excluding them from the core pool used for load balancing. Same as max_update_parallelism_level >= 0.
level
Uses the specified hard-coded parallelism level (must be greater than 0) for update queries that will be granted irrespectively of load balancing. Protects the cores granted to update queries by excluding them from the core pool used for load balancing. Takes precedence over max_parallelism_level or max_update_parallelism_level.