Partitioned Tables
A table can be partitioned to increase performance and availability. Partitioning distributes the rows of a table among a number of sub-tables (partitions). A partitioning scheme determines which rows are sent to which partitions.
After the partitioning scheme is defined, partitioning is managed automatically.
To define a table with partitions, use the PARTITION= option in the CREATE TABLE WITH clause.
When creating tables, NOPARTITION is the default.
Partitioned tables have the following restrictions:
• Only one dimension can be specified.
• The distribution type must be HASH or AUTOMATIC.
• Only one location per partition is allowed.
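For example, a minimal sketch of a partitioned table definition (the table orders, its columns, and the location name db_loc1 are illustrative, not part of any predefined schema):
CREATE TABLE orders (id i4 NOT NULL,
amount i4)
WITH LOCATION=(db_loc1),
PARTITION=(HASH ON id 4 PARTITIONS);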
Partitioning Schemes
Each dimension of a partitioning scheme defines a rule (distribution scheme) for assigning rows to partitions. Conceptually, a dimension defines a set of logical partitions. Only one dimension is allowed.
Two distribution types are available: HASH and AUTOMATIC.
• HASH distributes rows evenly among the partitions by using a hash value (instead of randomly). Given a value for the partitioning columns, a query can predict which partition contains the rows with the matching value, so the query can restrict its search to a subset of partitions. HASH is data dependent and requires the ON clause.
• AUTOMATIC distributes rows randomly among the partitions. It is not data dependent; rows with equal values are not guaranteed to be assigned to the same partition.
Logical partition names are optional and must be unique within a table; the same partition name can occur in other partitioned tables. If a partition name is omitted, the system generates one (of the form iipartNN).
If no LOCATION= is specified for a partition, the location list is taken from the enclosing statement's with_clause (that is, the with_clause that contains the PARTITION= clause).
The distribution scheme can be a default value set at the system level. For more information, see Default Distribution Scheme.
Partitioning Syntax
A table partition definition has the following format:
PARTITION = (dimension)
The syntax for each partition dimension is:
dimension = rule partitionspec {, partitionspec}
rule
Defines the type of distribution scheme for assigning rows to partitions. Valid values are:
HASH ON column {, column}
Distributes rows evenly among the partitions according to a hash value.
ON column {, column} specifies the columns to partition the table on.
AUTOMATIC
Distributes rows randomly among the partitions.
partitionspec
Defines the number of partitions and optionally their names:
partitionspec = [nn] PARTITION[S] [ ( name {, name} ) ] [with_clause]
where:
nn
Is the number of partitions, which defaults to 1 if omitted.
name
Identifies the partition. When the number of partitions is two or more, a comma-separated list of names can be provided to override the default names.
Default: iipartNN
with_clause
Specifies WITH clause options for partitioning.
The with_clause for partitioning has the following format:
WITH with-option
where
with-option = LOCATION = ( location )
When a LOCATION is specified for the table, that location is used only for partitions that lack their own WITH LOCATION clause.
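For example, a sketch of a partition definition with explicit names and a per-partition location (the table sales, its columns, and the partition names p1 through p4 are illustrative; db_loc1 and db_loc2 are the location names also used in the examples later in this section):
CREATE TABLE sales (region i4 NOT NULL,
amount i4)
WITH LOCATION=(db_loc1),
PARTITION=(HASH ON region
3 PARTITIONS (p1, p2, p3),
1 PARTITION (p4) WITH LOCATION=(db_loc2));
Partitions p1, p2, and p3 lack their own WITH LOCATION clause, so they use the table-level location db_loc1; partition p4 is placed on db_loc2.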
Guidelines for Partitioned Tables
Follow these guidelines when creating a partitioned table:
• The locations for partitions should be different to exploit the I/O performance of different devices.
• The partition key should be chosen from columns whose values are uniformly distributed, for example, primary/foreign key columns.
• When you expect to have a large number of queries that join tables A and B on the condition A.fk_col = B.col, good partitioning keys for A and B are fk_col and col, respectively.
• When you expect to join tables A and B often, the locations specified for tables A and B should be the same.
• A query on a partitioned table with X partitions uses at least X threads (even if the max_parallelism_level in the configuration file or query statement is lower). This can reduce throughput when many queries on partitioned tables with a large number of partitions run concurrently.
As a consequence, the number of partitions for a given table should not be larger than the number of available cores. Choose the number of partitions for any table as a divisor of the number of available cores. More specifically, if you expect most often to have X concurrent queries on table A, calculate the number of partitions for table A as:
#partitions = (no_cores_available / X)
Examples:
– On a 12-core system you expect to have on average 2 concurrent queries on table A: a good number of partitions for A is 6.
– On a 12-core system you will run only 1 query at a time: you can use 12 partitions for each table you define.
• Creating an index on the columns used to define a foreign key relationship is not allowed when the tables joined by that relationship do not have the same number of partitions, or are not partitioned on the columns (or a matching subset) used for the foreign key relationship. For example:
The following is allowed:
CREATE TABLE X (a i4 NOT NULL,
b i4 NOT NULL,
c i4 NOT NULL)
WITH LOCATION=(db_loc1),
PARTITION=(HASH ON a,c 2 PARTITIONS);
ALTER TABLE X ADD CONSTRAINT pk_x PRIMARY KEY (a,c);
CREATE TABLE Y (c i4 NOT NULL,
d i4 NOT NULL,
e i4)
WITH PARTITION=(HASH ON d,e 2 PARTITIONS WITH LOCATION=(db_loc1));
ALTER TABLE Y ADD CONSTRAINT fk_y FOREIGN KEY(d,e) REFERENCES X(a,c);
CREATE INDEX idx_y ON Y(d,e);
Using c as the partitioning key for X and e for Y is also valid, as sketched below.
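A minimal sketch of that variant (identical to the tables above except for the partitioning columns, shown only to illustrate partitioning on a matching subset of the foreign key columns):
CREATE TABLE X (a i4 NOT NULL,
b i4 NOT NULL,
c i4 NOT NULL)
WITH LOCATION=(db_loc1),
PARTITION=(HASH ON c 2 PARTITIONS);
ALTER TABLE X ADD CONSTRAINT pk_x PRIMARY KEY (a,c);
CREATE TABLE Y (c i4 NOT NULL,
d i4 NOT NULL,
e i4)
WITH PARTITION=(HASH ON e 2 PARTITIONS WITH LOCATION=(db_loc1));
ALTER TABLE Y ADD CONSTRAINT fk_y FOREIGN KEY(d,e) REFERENCES X(a,c);
CREATE INDEX idx_y ON Y(d,e);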
The following is not allowed:
CREATE TABLE X (a i4 NOT NULL,
b i4 NOT NULL)
WITH
PARTITION=(HASH ON a 2 PARTITIONS,
1 PARTITION WITH LOCATION=(db_loc2),
1 PARTITION);
ALTER TABLE X ADD CONSTRAINT pk_x PRIMARY KEY (a);
CREATE TABLE Y (c i4 NOT NULL,
d i4 NOT NULL)
WITH PARTITION=(HASH ON d 2 PARTITIONS WITH LOCATION=(db_loc1));
ALTER TABLE Y ADD CONSTRAINT fk_y FOREIGN KEY(d) REFERENCES X(a);
CREATE INDEX idx_y ON Y (d);
Automatic Partitioning
The AUTOMATIC partitioning scheme distributes rows evenly, but randomly, among the partitions. Unlike in hash-distributed tables, rows with equal values are not guaranteed to be assigned to the same partition. As a result, the system typically must reorganize the data before resolving a query. For example, joining two automatically partitioned tables usually requires reshuffling the rows. This extra step can degrade query performance.
AUTOMATIC partitioning should be used in the following cases:
• When insight is lacking for creating a good hash key. That is, when:
– There is no obvious joining key
– There is no good candidate for hash distributing the table (arbitrary data)
– The table does not share a common join key with other tables
– The table is a temporary staging table
• When defining an automatically partitioned table as a preparatory step before choosing a good hash key. You can run GENERATE STATISTICS and then use SELECT to obtain the minimum and maximum column values and counts, which helps you choose the columns to use as the HASH distribution key, as sketched below.
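For example, a minimal sketch (the staging table, its columns, and the partition count are illustrative assumptions; for the exact GENERATE STATISTICS syntax, see its statement description):
CREATE TABLE staging (cust_id i4 NOT NULL,
order_id i4 NOT NULL,
amount i4)
WITH PARTITION=(AUTOMATIC 4 PARTITIONS);
SELECT MIN(cust_id), MAX(cust_id),
COUNT(*), COUNT(DISTINCT cust_id)
FROM staging;
Columns whose values are spread widely and are close to unique (for example, a high COUNT(DISTINCT ...) relative to COUNT(*)) are typically good candidates for the HASH distribution key of the final table.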
Automatic Partitioning Limitations
Automatic partitioning has the following limitations:
• UNIQUE and FOREIGN KEY constraint checks cannot be done locally for concurrent transactions that update the same table. When both transactions contain update operations, the first committed transaction wins and the second triggers a constraint violation; the second transaction must be reissued.
• A partial MODIFY TO COMBINE operation may be slower due to extra network traffic.
• A full MODIFY TO COMBINE operation can redistribute tuples in the cluster.
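For example, a full combine could be issued with the base form of the statement (the table name staging is illustrative; see the MODIFY statement description for the exact options, including any partial-combine variants):
MODIFY staging TO COMBINE;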