Sizing Execution Memory

There is no simple formula that determines the amount of execution memory needed for a query. The execution memory should be large enough to support query operations against your biggest tables and must take into account concurrent query execution. In addition, if you plan to create secondary indexes then you must take the memory consumption of these indexes into consideration for sizing execution memory. Unique entries of indexed column values will reside uncompressed in execution memory, and every unique entry in the index has an additional fixed 21-byte overhead for the first occurrence with an 8-byte overhead for every subsequently repeating value.

Table joins, aggregations, and sorts, whether implicit--for example, as a result of a query with the keyword DISTINCT or queries with UNION (not UNION ALL)--or explicit, can use lots of memory when operating against large data sets.

For a large hash join the smallest side of the join should fit in memory. For example, if a query retrieves on average 30 bytes per row for the smallest side of the join and the query retrieves 100 million rows, then to complete a hash join the query will use:

For Vector to correctly identify which side of the join is the smallest (after applying filter criteria) it is important that table statistics are reliable. For more information about collecting and maintaining statistics, see the Vector User Guide.

For a large aggregation the entire (intermediate) result set should fit in memory. For example, if a query retrieves 50 bytes per row out of a 1 billion rows table (with no filter criteria), and the aggregation reduces the cardinality by a factor of 20, then the query will use:

To achieve the fastest query response times Vector does not compress data in execution memory.

Multiple queries running concurrently will each consume memory. Take into account query concurrency and the workload when determining the total amount of memory required for execution memory. Also, parallel queries generally use more memory than queries that do not run in parallel.

A query will consume as much memory as it can. If a query needs more memory than is available, it will return an out-of-memory error unless you enabled spill to disk operations. Spill to disk for hash joins and aggregations is disabled by default but can be enabled in the vectorwise.conf parameter file. Spill to disk for sorts is always enabled and cannot be turned off. Unless configured otherwise, per database Vector allocates 50% of your physical memory to execution memory by default.