Partitioned Tables

4. SQL Statements : CREATE TABLE : Partitioned Tables

Share this page

Partitioned Tables

Partitioning is an essential strategy in VectorH.

When the configuration parameter partition_spec_required=vector is set in config.dat, the system requires you to specify either the WITH PARTITION=(...) or WITH NOPARTITION clause when creating a Vector table using CREATE TABLE or CREATE TABLE AS SELECT syntax. This configuration setting is the default for new installations.

Important! In a query involving two or more tables, at least one table must be partitioned.

Partitioning distributes the rows of a table among sub-tables (partitions). A partitioning scheme determines which rows are sent to which partitions.

After the partitioning scheme is defined, partitioning is managed automatically.

To define a table with partitions, use the PARTITION= option in the CREATE TABLE WITH clause.

Partitioned tables have the following restrictions:

• Only one dimension can be specified.

• The distribution type must be HASH.

• Only one location per partition is allowed.

Partitioning Schemes

Each dimension of a partitioning scheme defines a rule (distribution scheme) for assigning rows to partitions. Conceptually, a dimension defines a set of logical partitions. Only one dimension is allowed.

One distribution type is available: HASH. Hash is data-dependent and requires the ON clause.

A hash distribution spreads row evenly among the partitions by use of a hash value (instead of randomly). Given a value for the partitioning columns, a query can predict which partition contains the rows that have the matching value. Thus, a query can restrict its search to a subset of partitions.

The optional logical partition names must be unique for each table. The same partition name can occur in other partitioned tables. If a partition name is omitted, the system generates a name (of the form iipartnn).

If NO LOCATION= is specified for a partition, the location list is defined by the enclosing statement's with_clause (that is, the with_clause that contains the PARTITION= clause).

Partitioning Syntax

A table partition definition has the following format:

PARTITION = (dimension)

The syntax for each partition dimension is:

dimension = rule ON column {, column }

partitionspec {, partitionspec}

| rule partitionspec {, partitionspec}

rule

Defines the type of distribution scheme for assigning rows to partitions. The only valid value is:

HASH

Distributes rows evenly among the partitions according to a hash value.

ON column {, column}

Specifies the columns to partition the table on.

partitionspec

If rule is HASH, defines the number of partitions and optionally their names:

partitionspec = [nn] PARTITION[S] [ ( name {, name} ) ] [with_clause]

where:

Is the number of partitions, which defaults to 1 if omitted.

name

Identifies the partition. When the number of partitions is two or more, a comma-separated list of names can be provided to override the default value.

Default: iipartNN

with_clause

Specifies WITH clause options for partitioning.

The with_clause for partitioning has the following format:

WITH with-option

where

with-option = LOCATION = ( location )

When specifying a LOCATION for the table, this location will only be used for PARTITIONS that are lacking a WITH LOCATION clause.

Guidelines for Partitioned Tables

Follow these guidelines when creating a partitioned table:

• The locations for partitions should be different to exploit the I/O performance of different devices.

• The partition key should be chosen from columns that have uniform values, for example, primary/foreign keys.

• When you expect to have a large number of queries that join tables A and B on the condition A.fk_col = B.col, good partitioning keys for A and B are fk_col and col, respectively.

• When you expect to join tables A and B often, the locations specified for tables A and B should be the same.

• We recommend at most one partition per core, per node:

num_partitions = num_nodes * K

where K is a divisor of number of physical cores per node.

• Creating an index on the columns that are used to define a foreign relationship is not allowed when the tables joined by the foreign relationship do not have the same number of partitions or are not partitioned on the columns (or a matching subset) used for the foreign key relationship. For example:

The following is allowed:

CREATE TABLE X (a i4 NOT NULL,

b i4 NOT NULL,

c i4 NOT NULL)

WITH LOCATION=(db_loc1),

PARTITION=(HASH ON a,c 2 PARTITIONS);

ALTER TABLE X ADD CONSTRAINT pk_x PRIMARY KEY (a,c);

CREATE TABLE Y (c i4 NOT NULL,

d i4 NOT NULL,

e i4)

WITH PARTITION=(HASH ON d,e 2 PARTITIONS WITH LOCATION=(db_loc1));

ALTER TABLE Y ADD CONSTRAINT fk_y FOREIGN KEY(d,e) REFERENCES X(a,c);

CREATE INDEX idx_y ON Y(d,e);

Partitioning keys on c for X and e for Y is also valid.

The following is not allowed:

CREATE TABLE X (a i4 not null,

b i4 not null)

WITH

PARTITION=(HASH on a 2 PARTITIONS,

1 PARTITION WITH LOCATION=(db_loc2),

1 PARTITION);

ALTER TABLE X ADD CONSTRAINT pk_x PRIMARY KEY (a);

CREATE TABLE Y (c i4 NOT NULL,

d i4 NOT NULL)

WITH PARTITION=(HASH ON d 2 PARTITIONS WITH LOCATION=(db_loc1));

ALTER TABLE Y ADD CONSTRAINT fk_y FOREIGN KEY(d) REFERENCES X(a);

CREATE INDEX idx_y ON Y (d);