Partitioning Tables
In addition to delivering high performance, VectorH is a relational database with comprehensive SQL support and can be used in much the same way as other relational databases. The available DDL and DML statements are described in the SQL Language Guide and will be familiar to anyone who has used a relational database.
One important consideration, though, is when and how to partition tables.
In VectorH, tables are either partitioned or not. Unpartitioned tables are read into cache by each node and are sometimes referred to as “replicated tables”. Unpartitioned tables are generally the smaller tables by number of rows in the schema (for example dimension tables). Partitioned tables (sometimes referred to as “sharded” tables) are managed as multiple “chunks” where a given chunk (partition) resides on a given node. Medium to large tables (for example, large dimension and fact tables) are more efficiently processed as partitioned tables because each node needs only to deal with the partitions of the table that are stored on that node.
VectorH is designed to operate efficiently, making best use of cluster resources, when at least one table in any nontrivial (joining) query is partitioned. Lack of appropriate partitioning can result in substandard performance; partitioning, therefore, must be considered and implemented at the beginning of the evaluation.
To understand the “location” of partitions, we must consider HDFS and HDFS block replicas; the “location” of a partition refers to the node on which the primary copy of its blocks is stored. HDFS creates block replicas and places them elsewhere in the cluster for resiliency. Resiliency and other benefits afforded to VectorH by HDFS are beyond the scope of this document, but more detail can be found in Reference 1 (see References and Further Information).
VectorH uses a hash-based partitioning scheme: the partition key of a row is hashed to determine which partition the row belongs to. Partitions are mapped to physical nodes at startup time, and this mapping determines which node manages a given partition.
Choose a partition key that avoids both data skew (uneven data volume across partitions) and computation skew (uneven processing across partitions). Keys with high cardinality are a good choice and can be made up of multiple columns if required. Beware of using just dates as partition keys: they are likely to be good with respect to data skew, but if queries filter on a single date, most of the work falls on one partition and computation skew results. Because the key must be hashed to determine each row's partition, avoid excessively wide keys, which can hurt performance. Integer partition keys provide the best performance; character string keys require more processing and introduce more overhead than numeric keys.
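As a sketch of what this looks like in DDL (the table, its columns, and the partition count of 32 are hypothetical placeholders; the exact WITH PARTITION clause is described in the SQL Language Guide), a fact table might be hash partitioned on a high-cardinality integer key such as an order number:

    CREATE TABLE sales_fact (
        order_id     INTEGER NOT NULL,
        customer_id  INTEGER NOT NULL,
        order_date   ANSIDATE NOT NULL,
        amount       DECIMAL(12,2)
    ) WITH PARTITION = (HASH ON order_id 32 PARTITIONS);

A multi-column key can be specified, if required, by listing several columns after HASH ON (for example, HASH ON order_id, customer_id).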
You can check for data skew by using the vwinfo utility with the ‑T flag (compare the block count for each partition), or by counting the rows assigned to each partition using the hidden tuple id column tid. These are discussed further in Reference 6 (see
References and Further Information). A tool to perform this analysis is available on the Actian GitHub site.
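For example, the row-count check can be run directly in SQL against the hidden tid column. This sketch assumes the partition number is held in the upper bits of the 64-bit tid (partition = tid / 2^48) and reuses the hypothetical sales_fact table from above; confirm the exact tid encoding against Reference 6 before relying on the figures.

    -- Count rows per partition; the divisor 281474976710656 is 2^48 (assumed encoding).
    SELECT partition_id, COUNT(*) AS row_count
    FROM  (SELECT tid/281474976710656 AS partition_id FROM sales_fact) p
    GROUP BY partition_id
    ORDER BY partition_id;

A badly skewed table shows one partition_id with a much larger row_count than the rest.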
Where possible, tables that are joined together should share the same partition key. Rows that join to each other then reside on the same physical node (they are co-located), so the join can be performed on that node without moving data across the network.
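For instance, if the hypothetical sales_fact table above is joined to an order_lines table partitioned on the same order_id key (again a hypothetical table, using the same assumed syntax), matching rows hash to the same partition and therefore to the same node, so the join requires no network data movement:

    CREATE TABLE order_lines (
        order_id  INTEGER NOT NULL,
        line_no   INTEGER NOT NULL,
        qty       INTEGER
    ) WITH PARTITION = (HASH ON order_id 32 PARTITIONS);

    -- Co-located join: each node joins only its own partitions of the two tables.
    SELECT f.order_id, SUM(l.qty)
    FROM   sales_fact f JOIN order_lines l ON l.order_id = f.order_id
    GROUP  BY f.order_id;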
The number of partitions used when creating tables should match max_query_parallelism as set in
Configuring Database Resources. Further details can be found in the Partitioning Guidelines section of the
User Guide. A tool to calculate the default number of partitions for a large table can also be found on the Actian GitHub site.
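As an illustration only (the cluster sizing and the resulting value of 32 are assumptions, not recommendations): if max_query_parallelism were set to 32, for example on 4 nodes with 8 cores each, tables would be created with a matching partition count:

    -- Hypothetical sizing: max_query_parallelism = 32, so request 32 partitions.
    CREATE TABLE web_events (
        event_id  BIGINT NOT NULL,
        user_id   INTEGER,
        event_ts  TIMESTAMP
    ) WITH PARTITION = (HASH ON event_id 32 PARTITIONS);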
Disk Space Usage for Tables
Table blocks are allocated as follows, assuming an HDFS replication factor of 3:
• For a non-partitioned table, all primary blocks reside on the master node and two replicas of each block are placed on two other nodes in the cluster; the choice of nodes for the replicas is made by HDFS. This means that if you fail to partition large tables and keep loading data, you will most likely fill up the master node before the other cluster nodes (assuming equal disk capacity on all nodes). A worked example follows this list.
• For a partitioned table with almost no skew, disk block usage will be evenly balanced across all VectorH nodes.
• For a partitioned table with heavy skew towards a particular partition, there is a risk of filling up the HDFS disk of the skewed node.
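As a rough worked example (the figures are illustrative only): with a replication factor of 3, a table whose primary blocks occupy 300 GB consumes about 3 × 300 GB = 900 GB of HDFS space in total. Left unpartitioned, roughly 300 GB of that sits on the master node while HDFS spreads the 600 GB of replicas across other nodes; partitioned with little skew across, say, a 6-node cluster, the same table places only about 900 GB / 6 = 150 GB on each node.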