Partitioned Tables
Partitioning is an essential strategy in VectorH.
When the configuration parameter partition_spec_required=vector is set in config.dat, the system requires you to specify either the WITH PARTITION=(...) or WITH NOPARTITION clause when creating a Vector table using CREATE TABLE or CREATE TABLE AS SELECT syntax. This configuration setting is the default for new installations.
Important! In a query involving two or more tables, at least one table must be partitioned.
Partitioning distributes the rows of a table among sub-tables (partitions). A partitioning scheme determines which rows are sent to which partitions.
After the partitioning scheme is defined, partitioning is managed automatically.
To define a table with partitions, use the PARTITION= option in the CREATE TABLE WITH clause.
Partitioned tables have the following restrictions:
• Only one dimension can be specified.
• The distribution type must be HASH.
• Only one location per partition is allowed.
Partitioning Schemes
Each dimension of a partitioning scheme defines a rule (distribution scheme) for assigning rows to partitions. Conceptually, a dimension defines a set of logical partitions. Only one dimension is allowed.
One distribution type is available: HASH. Hash is data-dependent and requires the ON clause.
A hash distribution spreads row evenly among the partitions by use of a hash value (instead of randomly). Given a value for the partitioning columns, a query can predict which partition contains the rows that have the matching value. Thus, a query can restrict its search to a subset of partitions.
The optional logical partition names must be unique for each table. The same partition name can occur in other partitioned tables. If a partition name is omitted, the system generates a name (of the form iipartnn).
If NO LOCATION= is specified for a partition, the location list is defined by the enclosing statement's with_clause (that is, the with_clause that contains the PARTITION= clause).
Partitioning Syntax
A table partition definition has the following format:
PARTITION = (dimension)
The syntax for each partition dimension is:
dimension = rule ON column {, column }
partitionspec {, partitionspec}
| rule partitionspec {, partitionspec}
rule
Defines the type of distribution scheme for assigning rows to partitions. The only valid value is:
HASH
Distributes rows evenly among the partitions according to a hash value.
ON column {, column}
Specifies the columns to partition the table on.
partitionspec
If rule is HASH, defines the number of partitions and optionally their names:
partitionspec = [nn] PARTITION[S] [ ( name {, name} ) ] [with_clause]
where:
nn
Is the number of partitions, which defaults to 1 if omitted.
name
Identifies the partition. When the number of partitions is two or more, a comma-separated list of names can be provided to override the default value.
Default: iipartNN
with_clause
Specifies WITH clause options for partitioning.
The with_clause for partitioning has the following format:
WITH with-option
where
with-option = LOCATION = ( location )
When specifying a LOCATION for the table, this location will only be used for PARTITIONS that are lacking a WITH LOCATION clause.
Guidelines for Partitioned Tables
Follow these guidelines when creating a partitioned table:
• The locations for partitions should be different to exploit the I/O performance of different devices.
• The partition key should be chosen from columns that have uniform values, for example, primary/foreign keys.
• When you expect to have a large number of queries that join tables A and B on the condition A.fk_col = B.col, good partitioning keys for A and B are fk_col and col, respectively.
• When you expect to join tables A and B often, the locations specified for tables A and B should be the same.
• We recommend at most one partition per core, per node:
num_partitions = num_nodes * K
where K is a divisor of number of physical cores per node.
• Creating an index on the columns that are used to define a foreign relationship is not allowed when the tables joined by the foreign relationship do not have the same number of partitions or are not partitioned on the columns (or a matching subset) used for the foreign key relationship. For example:
The following is allowed:
CREATE TABLE X (a i4 NOT NULL,
b i4 NOT NULL,
c i4 NOT NULL)
WITH LOCATION=(db_loc1),
PARTITION=(HASH ON a,c 2 PARTITIONS);
ALTER TABLE X ADD CONSTRAINT pk_x PRIMARY KEY (a,c);
CREATE TABLE Y (c i4 NOT NULL,
d i4 NOT NULL,
e i4)
WITH PARTITION=(HASH ON d,e 2 PARTITIONS WITH LOCATION=(db_loc1));
ALTER TABLE Y ADD CONSTRAINT fk_y FOREIGN KEY(d,e) REFERENCES X(a,c);
CREATE INDEX idx_y ON Y(d,e);
Partitioning keys on c for X and e for Y is also valid.
The following is not allowed:
CREATE TABLE X (a i4 not null,
b i4 not null)
WITH
PARTITION=(HASH on a 2 PARTITIONS,
1 PARTITION WITH LOCATION=(db_loc2),
1 PARTITION);
ALTER TABLE X ADD CONSTRAINT pk_x PRIMARY KEY (a);
CREATE TABLE Y (c i4 NOT NULL,
d i4 NOT NULL)
WITH PARTITION=(HASH ON d 2 PARTITIONS WITH LOCATION=(db_loc1));
ALTER TABLE Y ADD CONSTRAINT fk_y FOREIGN KEY(d) REFERENCES X(a);
CREATE INDEX idx_y ON Y (d);