4. Creating a Database and Loading Data : Create a Table : VectorH Partitioning Guidelines
 
Share this page                  
VectorH Partitioning Guidelines
Large tables should be partitioned to ensure distributed query execution. Queries on non-partitioned tables are executed on the master node only. Queries on partitioned tables or on partitioned tables joining with non-partitioned tables are executed on all nodes.
Partitioning is an essential part of VectorH analytical query execution. Tables are partitioned using a hash-based algorithm.
The partition key and number of partitions must be defined manually using the following guidelines:
We recommend at most one partition per core, evenly balanced between all nodes (including the master node), scaling down this number when the number of columns or nodes is large. The typical guideline is: Number of cores divided by 4. So for 4 nodes with 16 cores each, 16 partitions are recommended. That is, (4*16)/4=16.
Tables should be partitioned on a reasonably unique key, for example, primary key, foreign key, or a combination. A partition key can be a compound key of multiple columns. The key should be reasonably unique to avoid data skew, but does not have to be fully unique. Query execution will benefit greatly if tables are partitioned on join keys (for example, primary key for one table of the join and foreign key for the other).