How Much Memory Do I Need?
IMPORTANT! Make sure that X100 resource consumption does not affect the OLTP side of Actian X. Memory used for X100 must be allocated in a way that allows the OLTP system to continue operating normally.
It is important to size memory appropriately when using the X100 component in Actian X. If memory is undersized, you may be forced to enable spill-to-disk for various operations, and queries that spill to disk can see significant performance degradation compared with queries that do not.
Your server must have enough memory to support the Operating System (OS), Actian X, and any other applications running on the server. Although the OS can cope with insufficient memory by swapping to disk, this must be avoided: swapping severely slows the system and results in poor, unpredictable performance.
X100 uses two memory pools:
• Execution memory: the execution memory pool should be large enough to support concurrent hash joins, aggregations, and sorts against your biggest tables, to prevent large queries from spilling to disk. Also, if you plan to define secondary indexes, allow for the unique entries in the indexed data to reside uncompressed in execution memory at all times, plus an overhead for every column value.
• Column buffer manager (CBM) memory: a portion of main memory that is allocated to store data in compressed form.
These memory pools are assigned per database and cannot be shared between databases. If you plan to run multiple X100 databases concurrently then you must divide the available resources--memory in this case--between the databases that will run concurrently and make sure that the total memory consumption does not exceed the total amount of available RAM. Consider using multiple schemas instead of multiple databases to allow maximum memory resource utilization across multiple applications.
To determine how much memory is needed, calculate the requirement for each pool (as discussed below)--per database if you plan to run multiple X100 databases concurrently--then add at least 2 GB for the OS plus memory for any other applications you plan to run on the same server.
To simplify system sizing and to achieve the most predictable performance, you should dedicate your server to run Actian X.
Realistically, an environment to run X100 queries should be configured with at least 32 GB of RAM.
Considering that multiple factors impact memory consumption, including the data size that is frequently accessed, the complexity of the queries, and query concurrency, there is no simple formula to determine the total amount of memory that will work well for your application. For an extremely rough estimate, consider the uncompressed total amount of source data you store divided by 10. For example, for a database that stores 5 TB of source data start by planning for 512 GB of RAM and adjust as necessary. Smaller databases will likely need more memory than one tenth of the source data size.
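The rough estimate above can be sketched as a small calculation; the helper name below is hypothetical and exists only to illustrate the one-tenth rule of thumb:

```python
def rough_ram_estimate_gb(source_data_gb: float) -> float:
    """Extremely rough starting point for total RAM: one tenth of the
    uncompressed source data size. Hypothetical helper, not an Actian
    tool; adjust the result for workload, concurrency, and data size."""
    return source_data_gb / 10

# 5 TB (5120 GB) of source data -> plan for roughly 512 GB of RAM
print(rough_ram_estimate_gb(5120))  # 512.0
```

Remember that smaller databases will likely need proportionally more than one tenth of the source data size.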
Sizing Execution Memory
There is no simple formula that determines the amount of execution memory needed for a query. The execution memory should be large enough to support query operations against your biggest tables and must take into account concurrent query execution. In addition, if you plan to create secondary indexes, you must take the memory consumption of these indexes into consideration when sizing execution memory. Unique entries of indexed column values reside uncompressed in execution memory, and every unique entry in the index carries a fixed 21-byte overhead for its first occurrence plus an 8-byte overhead for every subsequent repetition of the value.
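The index overhead described above can be estimated as follows; the function and the sample figures are illustrative assumptions, not Actian-supplied values:

```python
def index_memory_bytes(total_rows: int, unique_values: int,
                       avg_value_size: int) -> int:
    """Estimate execution memory held by a secondary index:
    each unique value resides uncompressed (avg_value_size bytes) with a
    fixed 21-byte overhead for its first occurrence, plus 8 bytes for
    every subsequent repetition. Hypothetical helper for illustration."""
    first_occurrences = unique_values * (avg_value_size + 21)
    repeats = (total_rows - unique_values) * 8
    return first_occurrences + repeats

# 100 million rows indexing 1 million unique 16-byte values:
print(index_memory_bytes(100_000_000, 1_000_000, 16) / 1e9, "GB")  # 0.829 GB
```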
Table joins, aggregations, and sorts, whether implicit--for example, as a result of a query with the keyword DISTINCT or queries with UNION (not UNION ALL)--or explicit, can use lots of memory when operating against large data sets.
For a large hash join the smaller side of the join should fit in memory. For example, if a query retrieves on average 30 bytes per row for the smaller side of the join and retrieves 100 million rows, then to complete the hash join the query will use:
30 * 100,000,000 = 3,000,000,000 bytes = 3 GB
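The worked example above reduces to a simple multiplication; this sketch (with a hypothetical helper name) reproduces it:

```python
def hash_join_memory_bytes(rows_smaller_side: int, bytes_per_row: int) -> int:
    """The smaller side of a large hash join should fit in execution
    memory: rows on that side times average bytes retrieved per row."""
    return rows_smaller_side * bytes_per_row

# 100 million rows at 30 bytes per row:
print(hash_join_memory_bytes(100_000_000, 30))  # 3000000000 bytes = 3 GB
```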
For X100 to correctly identify which side of the join is the smaller one (after applying filter criteria), it is important that table statistics are reliable. For more information about collecting and maintaining statistics, see the Database Administrator Guide.
For a large aggregation the entire (intermediate) result set should fit in memory. For example, if a query retrieves 50 bytes per row from a 1-billion-row table (with no filter criteria), and the aggregation reduces the cardinality by a factor of 20, then the query will use:
50 * 1,000,000,000/20 = 2,500,000,000 bytes = 2.5 GB
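Likewise, the aggregation estimate above can be sketched as (hypothetical helper name):

```python
def aggregation_memory_bytes(rows: int, bytes_per_row: int,
                             reduction_factor: int) -> float:
    """The intermediate aggregation result should fit in memory:
    retrieved bytes divided by the cardinality reduction factor."""
    return rows * bytes_per_row / reduction_factor

# 1 billion rows at 50 bytes per row, cardinality reduced 20x:
print(aggregation_memory_bytes(1_000_000_000, 50, 20))  # 2500000000.0 bytes = 2.5 GB
```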
To achieve the fastest query response times X100 does not compress data in execution memory.
Multiple queries running concurrently will each consume memory. Consider query concurrency and the workload when determining the total amount of memory required for execution memory. Also, parallel queries generally use more memory than queries that do not run in parallel.
A query will consume as much memory as it can. If a query needs more memory than is available, it returns an out-of-memory error unless you have enabled spill-to-disk. Spill-to-disk for hash joins and aggregations is disabled by default but can be enabled in the vectorwise.conf parameter file. Spill-to-disk for sorts is always enabled and cannot be turned off. Unless configured otherwise, X100 allocates 50% of physical memory per database to execution memory by default.
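As a planning aid, the documented defaults (50% of physical memory for execution memory, 25% for CBM memory, each per database) can be combined with the earlier advice to divide memory between concurrently running databases. The helper below is a planning sketch only--the per-database division is our assumption, not X100's own allocation logic:

```python
def default_pools_gb(physical_ram_gb: float,
                     concurrent_databases: int = 1) -> dict:
    """Apply the documented defaults (50% execution, 25% CBM) to the
    memory budget of each database. Dividing RAM evenly across the
    concurrently running databases is a planning heuristic so the
    combined pools stay within physical RAM."""
    per_db = physical_ram_gb / concurrent_databases
    return {
        "execution_gb": per_db * 0.50,
        "cbm_gb": per_db * 0.25,
    }

# A dedicated 32 GB server running two X100 databases concurrently:
print(default_pools_gb(32, concurrent_databases=2))  # {'execution_gb': 8.0, 'cbm_gb': 4.0}
```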
For more information, see Memory Settings on page 70.
Sizing the Column Buffer Manager (CBM) Memory
The first question to ask when determining how much CBM memory you need is whether you want the data set to fit entirely in memory. I/O performance is critical to maximum query performance, and using main memory to store the data (in addition to disk, for high availability and recovery) can avoid a potential bottleneck in the data storage layer.
By default, X100 keeps data compressed in CBM memory. Maximum performance is achieved when data is stored uncompressed in memory, but this increases the memory requirement.
If your data set is significantly greater than your planned CBM memory size, then consider the size of the frequently accessed data, which should ideally fit into CBM memory. X100 performs smart buffer management to maximize the throughput across multiple operations when data retrieval from disk is required.
By default, X100 allocates 25% of available memory per database to CBM memory.
For more information, see Memory Settings on page 70.