Vector 6.2 | X100 Configuration Parameters

X100 Configuration Parameters

Each configuration parameter belongs in one of these sections: memory, system, server, cbm, or engine.

[memory] Settings

Memory settings in vectorwise.conf affect the memory used for query processing (not for data caching in the buffer pool).

max_memory_size

Specifies the total amount of memory (in bytes) that can be used for query execution.

Notes:

• The max_memory_size limit applies to all queries executing across the system, but a single query can consume almost all of the available memory.

• Some queries will fail if they consume too much memory; increasing available memory can help.

• If max_memory_size is not specified or is not larger than zero, a default setting of 50% of the physical system memory is used.

• Memory size defined with this option does not include bufferpool_size.

For more information, see Memory Settings.

Limits: 96 MB to 256 TB

Default: 0 (use 50% of the physical system memory)
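
The default rule above can be sketched in Python. This is a minimal illustration only, not the X100 implementation; the function name effective_max_memory_size and the idea of passing the physical memory size as an argument are assumptions made for the example.

```python
from typing import Optional


def effective_max_memory_size(configured_bytes: Optional[int],
                              physical_bytes: int) -> int:
    """Return the query-memory budget implied by max_memory_size.

    Hypothetical helper: a value that is not specified (None) or is not
    larger than zero falls back to 50% of the physical system memory.
    """
    if configured_bytes is None or configured_bytes <= 0:
        return physical_bytes // 2
    return configured_bytes
```

For example, on a machine with 64 GB of physical memory and max_memory_size left at 0, this sketch yields a 32 GB query-memory budget.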

max_overalloc

Specifies the maximum amount of memory that will be overallocated (reserved as virtual memory) for zero-cost growing of certain memory areas. The setting needs to be changed only in rare cases.

A value of 0 means no overallocation. A value of -1 means overallocate with the amount of total system memory ("unlimited").

Note: Disable overallocation (set max_overalloc to 0) on operating systems where /proc/sys/vm/overcommit_memory is not 1 or where "ulimit -v" is not unlimited.

Default: 2G or set to 0 upon installation if the system does not allow overallocation.
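
The two conditions in the note above can be checked on Linux with a short Python sketch. The helper names are hypothetical, and the sketch assumes that "ulimit -v" corresponds to the RLIMIT_AS resource limit, as it does in common Linux shells. It is not the check that the installer performs.

```python
import resource


def overalloc_allowed(overcommit_memory: int, address_space_limit: int) -> bool:
    """True only if overcommit_memory is 1 and the virtual memory limit is unlimited."""
    return (overcommit_memory == 1
            and address_space_limit == resource.RLIM_INFINITY)


def read_system_state():
    """Read both values from the running Linux system (Linux-specific path)."""
    with open("/proc/sys/vm/overcommit_memory") as f:
        overcommit = int(f.read().strip())
    # The soft limit is the value that "ulimit -v" reports.
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_AS)
    return overcommit, soft_limit
```

Calling overalloc_allowed(*read_system_state()) returns False on a system where, under the note above, max_overalloc should be set to 0.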

huge_tlb_lim

Enables the huge pages feature of the CPU only for queries that use a number of cores less than or equal to the specified value. Set this option to 1 if parallel queries do not benefit from the huge pages feature. This can occur on some operating systems, such as Red Hat Enterprise Linux 6, which has an automatic anonymous TLB feature.

Special value 0 means there is no limit.

Default: 0 (no limit)

use_huge_tlb

Specifies whether to use the huge pages feature (see Using Large Pages) of the CPU. Valid values are true and false.

Default: false

use_numa

Specifies whether to use NUMA optimization (see Using NUMA Optimization). Valid values are true and false.

Default: true

[system] Settings

System settings in vectorwise.conf affect the X100 system.

full_rewrite_on_combine

Specifies whether all blocks are to be rewritten by the COMBINE command. Setting this option to true can be used to avoid or undo block fragmentation introduced by partial update propagation.

Default: false

full_rewrite_on_update_propagation

Specifies whether all blocks are to be rewritten by the system-triggered propagation process. Setting this option to true can be used to avoid block fragmentation introduced by partial update propagation.

Default: false

max_old_log_size

Specifies the maximum size for the main_wal_backups directory, which holds old main.wal files.

Default: 2 GB

Note: The system will exceed the default value if it is necessary to keep a single old main.wal file that exceeds 2 GB.

num_cores

Specifies the number of processing units in the system. This value is used to calculate the amount of memory available for each transaction.

The num_cores value is also used to calculate the maximum parallelism level for a newly issued query when many queries are already executing. The goal is to avoid degrading the total throughput of the system. Generally, the higher the num_cores value, the higher the parallelism levels granted to queries running concurrently (each parallelism level is never higher than max_parallelism_level).

The num_cores value is also used to derive a database-wide target for the number of concurrent threads used for parallel execution. Queries run at a lower degree of parallelism if the default or requested degree would lead to an uneven allocation of resources to the new query, given the number of concurrently executing queries. On a system running many concurrent queries, new queries may get only one core and are more likely to run out of memory, because the num_cores value is used to reserve memory for possible other threads.

Default: Number of processors visible to the operating system.

resolve_concurrent_updates

Tells X100 whether to attempt to merge concurrent updates on the same table. If set to false, the transaction that tries to commit last will fail and will have to abort. Disabling merging of concurrent updates can reduce memory consumption, especially if there are any long-running transactions in the system.

Default: true

udf_conf_path

Specifies the location of the udf_engines.conf file, which is used to configure user-defined functions.

use_sse42

Uses SSE4.2 for accelerated string processing on CPU architectures that support it.

Default: YES

PDT Parameters

The Positional Delta Tree (PDT) is a data structure used for batch updates residing in memory.

max_global_update_memory

Specifies an upper limit on total update memory consumption, expressed as a fraction of the query memory limit (max_memory_size). When this limit is reached, the system propagates in-memory updates to disk.

Update memory is a component of query memory.

Limits: 0.01 to 1.0

Default: 0.25 (one quarter of max_memory_size)
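
As a worked illustration of the arithmetic, the following Python sketch converts the setting into a byte threshold. The function name is hypothetical, and rejecting out-of-range values with an error is an assumption of the sketch; the source states only the limits, not how X100 handles values outside them.

```python
def global_update_memory_limit(max_memory_size: int,
                               max_global_update_memory: float = 0.25) -> int:
    """Return the update-memory threshold in bytes.

    max_global_update_memory is a fraction of max_memory_size within the
    documented limits of 0.01 to 1.0.
    """
    if not 0.01 <= max_global_update_memory <= 1.0:
        raise ValueError("max_global_update_memory must be between 0.01 and 1.0")
    return int(max_memory_size * max_global_update_memory)
```

With max_memory_size set to 16 GB and the default of 0.25, propagation to disk would be triggered once in-memory updates reach 4 GB.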

max_number_propagate_retries

Specifies the maximum number of retries for failed automatic update propagation. When propagation for a certain table fails, X100 will retry propagating this table if no alternative tables are possible.

Default: 3

max_table_update_ratio

Specifies a per-table maximum ratio of in-memory updates, expressed as a fraction (that is, the number of updates divided by the number of stable tuples). When this limit is reached, the system propagates the in-memory updates for this table to disk.

Limits: 0.01 to 1.0

Default: 0.05

max_update_memory_per_transaction

Specifies an upper limit on the total update memory in the snapshot layer of a transaction, expressed as a fraction of the query memory limit (max_memory_size). When this limit is reached, the system will not allow further update operations in this transaction.

Limits: 0.01 to 1.0

Default: 0.25 (one quarter of max_memory_size)

min_propagate_table_count

Specifies the minimum number of tuples in a table above which max_table_update_ratio is checked. This is to avoid frequent update propagation to disk on relatively small tables.

Default: 500K
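
The interaction between max_table_update_ratio and min_propagate_table_count can be sketched in Python as follows. This is an illustration under stated assumptions, not the X100 algorithm: the function name is hypothetical, "500K" is interpreted as 500,000 tuples, and "reached" is interpreted as greater than or equal to. The sketch covers only the per-table check and ignores max_global_update_memory.

```python
def table_needs_propagation(num_updates: int,
                            stable_tuples: int,
                            max_table_update_ratio: float = 0.05,
                            min_propagate_table_count: int = 500_000) -> bool:
    """Return True if the per-table update ratio limit is reached."""
    # Small tables are exempt: the ratio is checked only above the minimum count.
    if stable_tuples <= min_propagate_table_count:
        return False
    # Ratio = number of in-memory updates / number of stable tuples.
    return num_updates / stable_tuples >= max_table_update_ratio
```

Under these assumptions, a table with 1,000,000 stable tuples reaches the default limit at 50,000 in-memory updates, while a table with 100,000 stable tuples is never propagated by this check.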

update_propagation

Enables or disables system-triggered propagation of in-memory updates. Valid values are true and false.

Note: Disabling automatic propagation overrides the max_global_update_memory setting and can lead to unwanted consequences if query memory becomes full with PDTs.

Default: true

[server] Settings

Server settings in vectorwise.conf affect the X100 server.

max_profiles_per_database

Specifies the number of profiles to keep in memory for a database. If this number is exceeded, the oldest profiles are dropped.

Default: 1

max_profiles_per_session

Specifies the number of profiles to keep in memory for a session.

Default: 1

port

Specifies the TCP/IP port on which the X100 server will listen for connections.

If zero or omitted, the server will dynamically assign a port when it first starts up. A nonzero value should be defined only in a database-specific vectorwise.dbname.conf; it must not be defined in the installation-wide vectorwise.conf.

A fixed nonzero port number can be useful in situations where remote direct connections are needed, such as from the vwload utility, and there is a firewall between the client and server. In such cases, the fixed port can be made known to the firewall to allow vwload to connect to the X100 server. The system administrator should choose a port that does not conflict with other listen ports on the network.

Default: 0

profile_per_query

Generates a profile file for each query. The profiles will be saved in the directory dbname/CBM/profiles in timestamp-named files. Valid values: true and false.

Default: false

profile_per_query_dir

Specifies the directory to which profile files are written when using profile_per_query = true.

The installation owner must have write permission to the directory.

Default: empty

profile_maxfiles

Specifies the maximum number of profile files retained when using profile_per_query = true. When this threshold is exceeded, the profile files are deleted (starting from the oldest ones) until the limit is met.

Default: 1000

profile_maxsize

Specifies the maximum total size (in bytes) of profile files retained when using profile_per_query = true. When this threshold is exceeded, the profile files are deleted (starting from the oldest ones) until the limit is met.

Default: 100M
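
The retention rule shared by profile_maxfiles and profile_maxsize (delete the oldest files until both limits are met) can be sketched in Python. The function name and the tuple layout are hypothetical, and "100M" is interpreted here as 100 × 1024 × 1024 bytes, which is an assumption. The sketch only decides which files would be removed; it does not touch the file system.

```python
from typing import List, Tuple


def profiles_to_delete(files: List[Tuple[str, int, float]],
                       profile_maxfiles: int = 1000,
                       profile_maxsize: int = 100 * 1024 * 1024) -> List[str]:
    """Return the names of profile files to delete, oldest first.

    Each entry in files is (name, size_in_bytes, modification_time).
    """
    remaining = sorted(files, key=lambda entry: entry[2])  # oldest first
    total_size = sum(size for _name, size, _mtime in remaining)
    deleted = []
    # Remove the oldest file until both the count and the size limits are met.
    while remaining and (len(remaining) > profile_maxfiles
                         or total_size > profile_maxsize):
        name, size, _mtime = remaining.pop(0)
        total_size -= size
        deleted.append(name)
    return deleted
```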

profiling

Collects and prints profiling information. Setting this parameter to false may give the highest performance on short-running queries.

Default: true

[cbm] Settings

CBM (Column Buffer Manager) settings in vectorwise.conf affect the buffer manager for X100.

CBM parameters determine the format of the database files when the database is created. If you want the per-database configuration file to be applied during database creation, the file must be created before using the createdb command. For more information, see X100 Configuration File (vectorwise.conf).

Note especially these CBM parameters:

• block_size

• bufferpool_size

• group_size

block_size

Specifies the minimum I/O granularity.

This option is the most important for I/O performance.

Note: This setting cannot be changed after database creation!

Note: The system rounds up this setting to the closest power of 2.

Default: 512K
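
The rounding described in the note above can be illustrated with a short Python sketch. It assumes that "rounds up to the closest power of 2" means the smallest power of two that is greater than or equal to the configured value; the function name is hypothetical.

```python
def round_up_to_power_of_two(value: int) -> int:
    """Return the smallest power of two that is >= value."""
    if value < 1:
        raise ValueError("value must be a positive integer")
    # (value - 1).bit_length() is the exponent of the next power of two.
    return 1 << (value - 1).bit_length()
```

Under this assumption, a configured block_size of 614,400 bytes (600 × 1024) would become 1,048,576 bytes, while the default of 512 × 1024 bytes is already a power of two and stays unchanged.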

bufferpool_size

Specifies the buffer pool size in bytes (that is, disk cache). The setting can be changed without reloading data.

Notes:

• Increasing bufferpool_size makes more data reside in memory and may reduce I/O.

• If bufferpool_size is not specified or is not larger than zero, a default setting of 25% of the physical system memory is used.

• Memory size defined with this option does not include max_memory_size.

For more information, see Memory Settings.

Default: 0 (use 25% of the physical system memory)

compression_lz4_enabled

Enables or disables compression of string data using the LZ4 method. Valid values are true and false.

Setting this parameter to true may give better performance on IO-constrained systems.

Default: false

gather_io_enabled

Tells the X100 server whether to check if the operating system supports gather I/O and to use asynchronous gather I/O if it does. Valid values are true and false.

Default: true

group_size

Specifies the number of blocks that are grouped together on disk to improve data locality for sequential scans.

Note: The system rounds up this setting to the closest power of 2.

Default: 8

minmax_maxsize

Specifies the granularity of the (automatically created) min-max indexes on all columns. Min-max indexes are used automatically to derive scan ranges based on selection predicates on columns that correlate to tuple order.

The value of this parameter influences the precision of the range restrictions applied when performing a table scan. With the default setting (1024), up to approximately 0.1% of a table may be scanned unnecessarily for each scan range found. Increasing this value can improve precision, but may slightly decrease update performance and increase memory usage. We recommend increasing this parameter only if you issue queries that scan small fractions of a table.

This parameter should be set before database creation.

Default: 1024
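
The figure of about 0.1% at the default setting is consistent with an overscan bound of roughly 1/minmax_maxsize of the table per scan range. The Python sketch below encodes that relationship. The inverse-proportional formula is an inference from the single data point given above, not a documented formula, and the function name is hypothetical.

```python
def max_extra_scan_fraction(minmax_maxsize: int = 1024) -> float:
    """Approximate upper bound on the extra table fraction scanned per range.

    Assumes the bound is 1 / minmax_maxsize (inferred, not documented).
    """
    if minmax_maxsize < 1:
        raise ValueError("minmax_maxsize must be a positive integer")
    return 1.0 / minmax_maxsize
```

With the default of 1024, the sketch gives 1/1024, or about 0.098% of the table. Under the same assumption, a setting of 4096 would lower the bound to about 0.024%.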

pbm_enabled

Enables or disables the Predictive Buffer Manager, which improves performance in workloads that are I/O bound. Valid values are true and false. Setting the parameter to false switches X100 back to using the Least Recently Used (LRU) algorithm.

PBM improves performance in most cases. On rare occasions, however, the LRU algorithm can be faster because it is simpler and incurs less overhead, especially in scenarios where data is fully memory-resident (no blocks are loaded or swapped out during query execution).

Default: true

[engine] Settings

Engine settings in vectorwise.conf affect the query execution engine.

enable_aggregation_disk_spilling

Enables or disables spilling to disk for hash aggregations. Enabling this option prevents out-of-memory conditions from occurring when aggregate results are too large to fit into memory. Valid values are true and false.

Default: false

enable_hashjoin_disk_spilling

Enables or disables spilling to disk for hash joins. Enabling this option prevents out-of-memory conditions from occurring when the hash table built from one of the join inputs is too large to fit into memory. Valid values are true and false.

Default: false

enable_reuse_disk_spilling

Enables or disables spilling to disk for reused repeating query plan parts. Enabling this option prevents out-of-memory conditions from occurring when the buffered output of a reused query part is too large to fit into memory. Valid values are true and false.

Default: false

listagg_group_maxlen

Specifies the maximum result length of the LISTAGG() aggregate function in ASCII bytes.

Limits: 1 to 32000

Default: 4000

max_parallelism_level

Defines the maximum number of threads used by a single query. This is a hard limit.

A value of 1 disables parallel query execution. A value larger than 1 enables parallel query execution.

The max_parallelism_level parameter is related to the num_cores parameter (see num_cores). The execution engine tries not to use more than max_parallelism_level cores at a time for any query.

Limits: 0 to 256

Default: Number of processors on the machine or 8 if the machine has more than 8 processors.
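
The default rule above amounts to capping the processor count at 8. A minimal Python sketch follows; the function name is hypothetical, and os.cpu_count() is used only as a convenient stand-in for the number of processors on the machine.

```python
import os


def default_max_parallelism_level(processors: int) -> int:
    """Return the processor count, capped at 8."""
    return min(processors, 8)


# Example: the default that the rule would give on the current machine.
local_default = default_max_parallelism_level(os.cpu_count() or 1)
```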

max_update_parallelism_level

Defines the maximum number of threads used by update (DML) queries. This setting can be used to improve performance of update queries that can benefit from parallelization. Special values:

-1

Treats update queries as normal queries and load-balances the number of cores granted to them using the usual algorithms, subject to max_parallelism_level.

level

Overrides max_parallelism_level with the specified level value. Protects the cores granted to update queries by excluding them from the core pool used for load balancing.

If the value is 0, uses num_cores as the level (as with max_parallelism_level).

A value of 1 disables parallel query execution for updates. A value larger than 1 enables parallel query execution.
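
The special values above can be summarized in a small Python sketch that returns the thread limit applied to an update query. It is a reading of the text above, not the X100 scheduler: the function name is hypothetical, and the sketch models only the upper limit, not the load balancing or the protection of cores.

```python
def update_thread_cap(max_update_parallelism_level: int,
                      max_parallelism_level: int,
                      num_cores: int) -> int:
    """Return the maximum number of threads for an update (DML) query."""
    if max_update_parallelism_level == -1:
        # Treated as a normal query: the general limit applies.
        return max_parallelism_level
    if max_update_parallelism_level == 0:
        # 0 means "use num_cores as the level".
        return num_cores
    if max_update_parallelism_level > 0:
        # An explicit level overrides max_parallelism_level.
        return max_update_parallelism_level
    raise ValueError("max_update_parallelism_level must be -1 or greater")
```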

update_parallelism_level

Defines the forced level of parallelism used by update (DML) queries. This setting can be used to improve performance of update queries that can benefit from parallelization. Values:

-1

(Default) Treats update queries as normal queries and load-balances the number of cores granted to them using the usual algorithms, subject to max_parallelism_level (or max_update_parallelism_level).

0

Uses max_parallelism_level (or max_update_parallelism_level, if set) and lets the load balancing algorithm establish the target number of cores for update queries in the usual way. Protects the cores granted to update queries by excluding them from the core pool used for load balancing. Same as max_update_parallelism_level >= 0.

level

Uses the specified hard-coded parallelism level (must be greater than 0) for update queries, which is granted regardless of load balancing. Protects the cores granted to update queries by excluding them from the core pool used for load balancing. Takes precedence over max_parallelism_level or max_update_parallelism_level.

Last modified date: 06/28/2024