Data Size Consideration for Some X100 Queries
To get the fastest query response times, X100 by default does not spill intermediate results to disk for joins and aggregations. As a result, using the default settings, X100 may not be able to perform a requested operation for some types of queries due to high memory usage. The following scenarios are the most typical:
• Performing an aggregation that produces a high number of records. For example:
SELECT l_orderkey, COUNT(*)
FROM lineitem
GROUP BY l_orderkey
If the number of distinct values of "l_orderkey" is high (tens of millions or more), the query may fail, depending on available memory.
• Performing a join between two large tables without any restrictions. For example:
SELECT o_orderdate, l_receiptdate
FROM lineitem, orders
WHERE l_orderkey = o_orderkey
If the number of records in "orders" is high (tens of millions or more), the query may fail, depending on available memory.
• Performing a join between two large tables without any restrictions. For example:
SELECT o_orderdate, l_receiptdate
FROM lineitem, orders
WHERE l_orderkey = o_orderkey
If the number of records in "orders" is high (tens of millions or more), the query may fail, depending on available memory.
Possible ways to handle such queries are:
• Increasing available memory.
• Indexing the tables involved in the query allows a faster and more memory-efficient solution to be used. In the first example, index the "lineitem" table on "l_orderkey"; in the second example, index both "lineitem" and "orders" tables on the join key.
• Another option is to let the aggregation or join save its temporary results on disk, allowing it to scale beyond available memory. This technique, known as "spilling to disk," can be enabled by setting the [engine] enable_aggregation_disk_spilling parameter to "true".
Note: Enabling spilling to disk will let the query complete but may cause severe performance degradation, possibly even orders of magnitude slower than when the query is running in memory. This is because disk access is slower than memory access. Using faster disk subsystems can partially alleviate the slower performance.
• For the third example, you can consider using a UNION ALL approach if you know there are no duplicates, or if you accept duplicate entries. Because duplicate-elimination in UNION is a special form of aggregation, the “spilling the disk” technique also applies to UNION.
Last modified date: 04/03/2024