X100 Configuration Parameters
Each configuration parameter belongs in one of these sections: memory, system, server, cbm, or engine.
[memory] Settings
Memory settings in vectorwise.conf affect the memory used for query processing (not for data caching in the buffer pool).
max_memory_size
Specifies the total amount of memory (in bytes) used for query execution.
Notes:
Max_memory_size applies to all queries executing across the system, but a single query can consume almost all available memory.
Some queries will fail if they consume too much memory; increasing available memory can help.
If max_memory_size is not specified or is not larger than zero, a default setting of 50% of the physical system memory is used.
Memory size defined with this option does not include bufferpool_size.
For more information, see Memory Settings.
Limits: 96 MB to 256 TB
Default: 0 (use 50% of the physical system memory)
max_overalloc
Specifies the maximum amount of memory that will be overallocated (reserved as virtual memory) for zero-cost growing of certain memory areas. The setting needs to be changed only in rare cases.
A value of 0 means no overallocation. A value of -1 means overallocate by the amount of total system memory ("unlimited").
Note:  Overallocation should be disabled on operating systems where /proc/sys/vm/overcommit_memory is not 1 or where "ulimit -v" is not unlimited.
Default: 2G, or set to 0 at installation if the system does not allow overallocation.
huge_tlb_lim
Enables the huge pages feature of the CPU only for queries that use a number of cores less than or equal to the specified value. Set this option to 1 if parallel queries do not benefit from the huge pages feature. This may occur on some operating systems, such as Red Hat Enterprise Linux 6, which has an automatic anonymous TLB feature.
Special value 0 means there is no limit.
Default: 0 (no limit)
use_huge_tlb
Specifies whether to use the huge pages feature (see Using Large Pages) of the CPU. Valid values are true and false.
Default: false
use_numa
Specifies whether to use NUMA optimization (see Using NUMA Optimization). Valid values are true and false.
Default: true
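For reference, a minimal sketch of a [memory] section in vectorwise.conf is shown below. The values are illustrative only (an explicit 16G query-memory limit with huge pages enabled) and should be tuned to your system; the key=value syntax is assumed to match your existing vectorwise.conf.
[memory]
max_memory_size=16G
max_overalloc=2G
use_huge_tlb=true
huge_tlb_lim=0
use_numa=true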
[system] Settings
System settings in vectorwise.conf affect the X100 system.
full_rewrite_on_combine
Specifies whether all blocks are to be rewritten by the COMBINE command. Setting this option to true can be used to avoid or undo block fragmentation introduced by partial update propagation.
Default: false
full_rewrite_on_update_propagation
Specifies whether all blocks are to be rewritten by the system-triggered propagation process. Setting this option to true can be used to avoid block fragmentation introduced by partial update propagation.
Default: false
max_old_log_size
Specifies the maximum size for the main_wal_backups directory, which holds old main.wal files.
Default: 2 GB
Note:  The system will exceed this limit if it needs to keep a single old main.wal file that is larger than the limit.
num_cores
Specifies the number of processing units in the system. This value is used to calculate the amount of memory available for each transaction.
The num_cores value is used to calculate the maximum parallelism level for a newly issued query when many queries are already executing, with the goal of not degrading the total throughput of the system. Generally, the higher the num_cores value, the higher the parallelism levels granted to queries running concurrently (where each parallelism level is never higher than max_parallelism_level).
The num_cores value is also used to derive a database-wide target for the number of concurrent threads used for parallel execution. A query runs at a lower degree of parallelism if the default or requested degree would give it an uneven share of resources given the number of concurrently executing queries. On a system running many concurrent queries, a new query may be granted only one core and is more likely to run out of memory, because the num_cores value is used to reserve memory for possible other threads.
Default: Number of processors visible to the operating system.
resolve_concurrent_updates
Tells X100 whether to attempt to merge concurrent updates on the same table. If set to false, the transaction that tries to commit last will fail and will have to abort. Disabling merging of concurrent updates can reduce memory consumption, especially if there are any long-running transactions in the system.
Default: true
udf_conf_path
Specifies the location of the udf_engines.conf file when configuring user-defined functions.
use_sse42
Uses SSE4.2 for accelerated string processing on CPU architectures that support it.
Default: YES
PDT Parameters
The Positional Delta Tree (PDT) is a data structure used for batch updates residing in memory.
ioup_blocks_filled_trigger
Specifies a multiplier used to derive an upper limit on a table's PDT memory consumption for inserts. The limit is calculated as follows (a worked example follows the notes below):
block_size * ioup_blocks_filled_trigger * number of columns of the table
An insert-only update propagation triggers when the memory consumption of PDT tuple inserts exceeds this threshold.
Default: 4
Notes:
It does not override the min_propagate_table_count setting.
The update propagation is not solely dependent on this setting and can be triggered even if only the max_table_update_ratio is met.
The update propagation is triggered only when the update_propagation setting is True.
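As a worked example of the threshold above, assuming the default block_size of 512K and the default trigger value of 4, a table with 20 columns triggers insert-only update propagation once its PDT inserts consume more than approximately:
512K * 4 * 20 columns = 40 MB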
ioup_exclusive_lock_timeout_ms
Specifies the lock mode and timeout duration for an insert-only update propagation (IOUP) transaction. It can either be an exclusive read-write (ERW) lock or a shared lock (typically used for read-write transactions).
An ERW lock allows IOUP to run exclusively as the sole RW transaction, apart from concurrent read-only transactions. Once an ERW lock is acquired, no concurrent read-write transactions are allowed to start. If there are ongoing RW transactions, IOUP has to wait for them to commit or abort. If an IOUP transaction does not acquire an ERW lock within the specified duration (in milliseconds), the transaction fails.
You can specify either of the following values for this parameter:
0 - Do not try to acquire an ERW lock; acquire a shared lock instead. This is the default value and indicates that the ERW lock feature is disabled for IOUP transactions.
> 0 - Try to acquire an exclusive lock, but time out after the specified number of milliseconds.
< 0 - Try to acquire an exclusive lock and wait indefinitely. Avoid this option.
Default: 0
Notes:
The update propagation is triggered only when the update_propagation setting is True.
In case of a shared lock, you may encounter concurrency issues. IOUP running in the background in shared mode can occasionally cause commit failure for a non-concurrent workload.
max_global_update_memory
Specifies an upper limit on the total update memory consumption as a percentage of query memory limit (max_memory_size). When this limit is reached, the system propagates in-memory updates to disk.
Update memory is a component of query memory.
Limits: 0.01 to 1.0
Default: 0.25 (one quarter of max_memory_size)
max_number_propagate_retries
Specifies the maximum number of retries for failed automatic update propagation. When propagation for a certain table fails, X100 retries propagating this table if no alternative tables are available.
Default: 6
max_table_update_memory
Specifies an upper limit on the total update memory consumption per table as a percentage of the global in-memory update limit (max_global_update_memory).
When the amount of memory used by a table's PDT updates reaches the upper limit calculated as follows:
max_table_update_memory * max_global_update_memory * max_memory_size
the system propagates in-memory updates to disk. The propagation happens regardless of the min_propagate_table_count, max_table_update_ratio, or ioup_blocks_filled_trigger parameters.
Default: 0.1
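With the default values, the per-table limit works out to:
0.1 * 0.25 * max_memory_size = 2.5% of max_memory_size
For example, if max_memory_size resolves to 16 GB, a single table can accumulate roughly 400 MB of in-memory updates before propagation to disk is forced.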
max_table_update_ratio
Specifies a per-table maximum percentage of in-memory updates (that is, the number of updates divided by the number of stable tuples). When this limit is reached, the system propagates to disk in-memory updates for this table.
Limits: 0.01 to 1.0
Default: 0.05
max_update_memory_per_transaction
Specifies an upper limit on the total update memory in the snapshot layer of a transaction, as a percentage of query memory limit (max_memory_size). When this limit is reached, the system will not allow further update operations in this transaction.
Limits: 0.01 to 1.0
Default: 0.25 (one quarter of max_memory_size)
min_propagate_table_count
Specifies the minimum number of tuples in a table above which max_table_update_ratio is checked. This is to avoid frequent update propagation to disk on relatively small tables.
Default: 500K
shutdown_max_maintenance_window_ms
Specifies the time duration (milliseconds) to execute update propagation during system shutdown. On reaching the timeout, all ongoing update-propagation tasks are terminated, and the shutdown procedure starts.
You can set the value of this parameter to 0 to disable shutdown maintenance. A negative value such as -1 means no timeout is applied; this should be avoided because it causes all tables with PDT updates to be propagated, which can be time-consuming.
Default: 15000
update_propagation
Enables or disables system-triggered propagation of in-memory updates. Valid values are true and false.
Note:  Disabling automatic propagation overrides the max_global_update_memory setting and can lead to unwanted consequences if query memory becomes full with PDTs.
Default: true
update_propagation_retry_delay_base
Specifies a per-table multiplier applied to the delays between subsequent retries of a failed update propagation. The delay is calculated as follows:
update_propagation_retry_delay_sec * (update_propagation_retry_delay_base ^ num_failures)
Default: 2
update_propagation_retry_delay_sec
Specifies a per-table initial delay (that is, the delay after the first update propagation failure), in seconds, before retrying a failed update propagation. For each subsequent retry, the delay timer increases and is calculated as follows:
current_delay * system/update_propagation_retry_delay_base
When an update propagation against the failed table succeeds, the retry delay timer is reset to the value of the parameter set by the user. You can also set the value to 0 to disable the retry delay mechanism, in which case the system keeps retrying immediately after each update propagation failure.
Default: 60
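For example, with the default settings (update_propagation_retry_delay_sec = 60 and update_propagation_retry_delay_base = 2), a table whose propagation keeps failing is retried after 60 seconds, then 120, 240, 480 seconds, and so on, with the delay doubling after each subsequent failure until a propagation succeeds and the timer resets to 60 seconds.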
[server] Settings
Server settings in vectorwise.conf affect the X100 server.
max_profiles_per_database
Specifies the number of profiles to keep in memory for a database. If this limit is exceeded, the oldest profiles are dropped.
Default: 1
max_profiles_per_session
Specifies the number of profiles to keep in memory for a session.
Default: 1
port
Specifies the TCP/IP port on which the X100 server will listen for connections.
If zero or omitted, the server will dynamically assign a port when it first starts up. A nonzero value should only be defined in a database specific vectorwise.dbname.conf; it must not be defined in the installation-wide vectorwise.conf.
A fixed nonzero port number can be useful in situations where remote direct connections are needed, such as from the vwload utility, and there is a firewall between the client and server. In such cases, the fixed port can be made known to the firewall to allow vwload to connect to the X100 server. The system administrator should choose a port that does not conflict with other listen ports on the network.
Default: 0
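For example, to pin the X100 server of a database named mydb to a fixed port, the database-specific file vectorwise.mydb.conf could contain the following (the database name and port number are illustrative only; choose a port that is open in your firewall and not otherwise in use):
[server]
port=27839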
profile_per_query
Generates a profile file for each query. The profiles will be saved in the directory dbname/CBM/profiles in timestamp-named files. Valid values: true and false.
Default: false
profile_per_query_dir
Specifies the directory to which profile files are written when using profile_per_query = true.
The installation owner must have write permission to the directory.
Default: empty
profile_maxfiles
Specifies the maximum number of profile files retained when using profile_per_query = true. When this threshold is exceeded, the profile files are deleted (starting from the oldest ones) until the limit is met.
Default: 1000
profile_maxsize
Specifies the maximum total size (in bytes) of profile files retained when using profile_per_query = true. When this threshold is exceeded, the profile files are deleted (starting from the oldest ones) until the limit is met.
Default: 100M
profiling
Collects and prints profiling information. Setting this parameter to false may give the highest performance on short-running queries.
Default: true
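A sketch of a [server] section that enables per-query profiling with bounded disk usage might look as follows. The directory path is an example only and must be writable by the installation owner; the retention limits are illustrative.
[server]
profiling=true
profile_per_query=true
profile_per_query_dir=/var/log/vectorwise/profiles
profile_maxfiles=500
profile_maxsize=1G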
udfcache
Specifies the path to the local file system cache used for placing files required by Microservice UDFs. It is mounted to the microservice container.
Default: /tmp/vw_udfcache
vwlog_rotation_use_size_threshold
If set to true, the log file is rotated automatically once it surpasses the size limit specified by vwlog_rotation_size_threshold. Valid values are true and false.
Default: false
vwlog_rotation_size_threshold
Specifies the size threshold for automatic log rotation. Must be a value between 4M and 4G.
Default: 200M
vwlog_rotation_use_time_threshold
If set to true, the log file is rotated automatically after a fixed duration specified by vwlog_rotation_time_interval. Valid values are true and false.
Default: false
vwlog_rotation_time_interval
Specifies the rotation interval. Must be in ISO 8601 duration format: "P[#days]DT[#hours]H[#minutes]M[#seconds]S"
'P' stands for 'period' (days, weeks, months, and years); 'T' stands for 'time' (hours, minutes, and seconds).
Note that 'D' (days) is the largest valid input measure; 'Y' (years), 'M' (months), and 'W' (weeks) are not supported. If 'days' is the smallest unit used, the letter 'T' is optional (for example, P1D (1 day) is the same as P1DT). However, the string must begin with 'P'. Further examples of valid intervals are:
PT1H (1 hour), PT12H30M, P2DT4H30M15S, …
Limits: PT30M to P180D
Default: P1D
vwlog_rotation_use_auto_compression
Enables or disables automatic compression of rotated log files. If enabled, the old log file is compressed to .gz format. Valid values are true and false.
Default: false
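The log rotation options above can be combined as in the following [server] sketch, which rotates the log when it exceeds 500M or after one day, whichever comes first, and compresses rotated files. The threshold values are illustrative.
[server]
vwlog_rotation_use_size_threshold=true
vwlog_rotation_size_threshold=500M
vwlog_rotation_use_time_threshold=true
vwlog_rotation_time_interval=P1D
vwlog_rotation_use_auto_compression=true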
[cbm] Settings
CBM (Column Buffer Manager) settings in vectorwise.conf affect the buffer manager for X100.
CBM parameters determine the format of the database files when the database is created. If you want the per-database configuration file to be applied during database creation, the file must be created before using the createdb command. For more information, see X100 Configuration File (vectorwise.conf).
Note especially these CBM parameters:
block_size (see block_size)
bufferpool_size (see bufferpool_size)
group_size (see group_size)
block_size
Specifies the minimum I/O granularity.
This option is the most important for I/O performance.
Note:  This setting cannot be changed after database creation!
Note:  The system rounds up this setting to the closest power of 2.
Default: 512K
bufferpool_size
Specifies the buffer pool size in bytes (that is, disk cache). The setting can be changed without reloading data.
Notes:
Increasing bufferpool_size makes more data reside in memory and may reduce I/O.
If bufferpool_size is not specified or is not larger than zero, a default setting of 25% of the physical system memory is used.
Memory size defined with this option does not include max_memory_size.
For more information, see Memory Settings.
Default: 0 (use 25% of the physical system memory)
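Because block_size and group_size determine the on-disk format, a per-database [cbm] section such as the sketch below must exist before createdb is run; bufferpool_size, by contrast, can be changed later without reloading data. The buffer pool value shown is illustrative.
[cbm]
block_size=512K
group_size=8
bufferpool_size=8G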
cleanup_unused_files_on_startup
Controls the automatic cleanup of unused files on startup of the X100 server. Values are:
Unspecified or empty (Default) - No cleanup is performed on startup
Foreground - Cleans up files in blocking fashion on startup, before the server can be used
Background - Cleans up files in the background, concurrently with the server getting ready for use
cleanup_unused_files_default_mode
Defines the actions to perform on unused files that are cleaned up automatically on the start of the X100 server. Values are:
Rename (Default) - Prefixes the unused files with the timestamp of the cleanup invocation and moves them to the backup directory, where they are retained for a configured period before expiring. See the unused_files_retention_period configuration parameter for details.
Delete - Deletes the unused files
Delete_empty - Deletes unused files only if the file is empty (0 bytes)
compression_lz4_enabled
Enables or disables compression of string data using the LZ4 method. Valid values are true and false.
Setting this parameter to true may give better performance on I/O-constrained systems.
Default: false
gather_io_enabled
Tells the X100 server whether to check if the operating system supports gather I/O and to use asynchronous gather I/O if it does. Valid values are true and false.
Default: true
group_size
Specifies the number of blocks that are grouped together on disk to improve data locality for sequential scans.
Note:  The system rounds up this setting to the closest power of 2.
Default: 8
minmax_maxsize
Specifies the granularity of the (automatically created) min-max indexes on all columns. Min-max indexes are used automatically to derive scan ranges based on selection predicates on columns that correlate to tuple order.
The value of this parameter influences the precision of the range restrictions applied when performing a table scan. With the default setting (1024), up to roughly 0.1% of the table may be scanned unnecessarily for each scan range found. Increasing this value improves precision, but may slightly decrease update performance and increase memory usage. We recommend increasing this parameter only if you issue queries that scan small fractions of a table.
This parameter should be set before database creation.
Default: 1024
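As a rough way to see where the ~0.1% figure comes from: with minmax_maxsize set to 1024, each min-max index summarizes a column in at most 1024 entries, so the boundaries of a derived scan range are precise only to about 1/1024 ≈ 0.1% of the table. By the same arithmetic, raising the value to 4096 would tighten this to roughly 0.025% per scan range, at the cost of a larger index.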
pbm_enabled
Enables or disables the Predictive Buffer Manager, which improves performance in workloads that are I/O bound. Valid values are true and false. Setting the parameter to false switches X100 back to using the Least Recently Used (LRU) algorithm.
PBM improves performance in most cases. On rare occasions, however, the LRU algorithm can be faster because it is simpler and incurs less overhead, especially in scenarios where data is fully memory-resident (no blocks are loaded or swapped out during query execution).
Default: true
unused_files_retention_period
Defines the retention period of unused files in the backup directory. Unused files older than the retention period are automatically deleted from the directory.
[engine] Settings
Engine settings in vectorwise.conf affect the query execution engine.
enable_aggregation_disk_spilling
Enables or disables spilling to disk for hash aggregations. Enabling this option prevents out-of-memory conditions from occurring when aggregate results are too large to fit into memory. Valid values are true and false.
Default: false
enable_hashjoin_disk_spilling
Enables or disables spilling to disk for hash joins. Enabling this option prevents out-of-memory conditions from occurring when the hash table built from one of the join inputs is too large to fit into memory. Valid values are true and false.
Default: false
enable_reuse_disk_spilling
Enables or disables spilling to disk for reused repeating query plan parts. Enabling this option prevents out-of-memory conditions from occurring when the buffered output of a reused query part is too large to fit into memory. Valid values are true and false.
Default: false
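On systems where large aggregations, joins, or reused plan parts risk exhausting query memory, the three spilling options can be enabled together; a minimal [engine] sketch is shown below. Each option prevents out-of-memory conditions by allowing the corresponding operator to write intermediate data to disk.
[engine]
enable_aggregation_disk_spilling=true
enable_hashjoin_disk_spilling=true
enable_reuse_disk_spilling=true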
listagg_group_maxlen
Specifies the maximum result length of the LISTAGG() aggregate function in ASCII bytes.
Limits: 1 to 32000
Default: 4000
max_parallelism_level
Defines the maximum number of threads used by a single query. This is a hard limit.
A value of 1 disables parallel query execution. A value larger than 1 enables parallel query execution.
Max_parallelism_level is related to the num_cores parameter (see num_cores). The execution engine tries not to use more than the max_parallelism_level number of cores at a time for any query.
Limits: 0 to 256
Default: Number of processors on the machine or 8 if the machine has more than 8 processors.
max_update_parallelism_level
Defines the maximum number of threads used by update (DML) queries. This setting can be used to improve performance of update queries that can benefit from parallelization. Special values:
-1
Treats update queries as normal queries and load balances the number of cores granted to them using the usual algorithms and max_parallelism_level.
level
Overrides max_parallelism_level with the specified level value. Protects the cores granted to update queries by excluding them from the core pool used for load balancing.
If the value is 0, uses num_cores as the level (like in max_parallelism_level).
A value of 1 disables parallel query execution for updates. A value larger than 1 enables parallel query execution.
update_parallelism_level
Defines the forced level of parallelism used by update (DML) queries. This setting can be used to improve performance of update queries that can benefit from parallelization. Values:
-1
(Default) Treats update queries as normal queries and load balances the number of cores granted to them using the usual algorithms and max_parallelism_level (or max_update_parallelism_level).
0
Uses max_parallelism_level (or max_update_parallelism_level, if set) and lets the load balancing algorithm establish the target number of cores for update queries in the usual way. Protects the cores granted to update queries by excluding them from the core pool used for load balancing. Same as max_update_parallelism_level >= 0.
level
Uses the specified hard-coded parallelism level (must be greater than 0) for update queries, which is granted irrespective of load balancing. Protects the cores granted to update queries by excluding them from the core pool used for load balancing. Takes precedence over max_parallelism_level or max_update_parallelism_level.
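As an illustration of how these settings interact (the values are examples only), the following [engine] sketch caps any single query at 16 threads while forcing update queries to always run with 4 threads, which are then excluded from the core pool used for load balancing:
[engine]
max_parallelism_level=16
update_parallelism_level=4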
Last modified date: 12/19/2024