New Features in Version 5.1
VectorH 5.1 contains the following new features.
Hadoop
Support for HDFS Federation – You can access HDFS through the viewfs:// URI scheme (for example, viewfs://namespace/path/to/data). viewfs:// provides a single view of multiple high-availability HDFS file systems (namespaces) in a multi-cluster environment. During installation, the individual namespaces are presented as locations for VectorH in addition to the viewfs:// default. For unattended installs, you can specify the file system to use with the response file parameter DEFAULT_FS.
Support for Azure Data Lake (ADL) storage (Gen1 and Gen2) in Microsoft Azure HDInsight through the adl:// and abfs:// URI schemes, respectively.
Data Import and Export
Enhancements to the vwload command:
Support for fast data load from cloud sources – You can specify a URI in the file name list of the vwload command to load data from cloud file systems such as Amazon S3 (s3a:// and s3n://). Any URI supported by the HDFS client is accepted.
Note:  The s3n:// protocol is deprecated; s3a:// is the preferred method for accessing S3.
Support for directory names and limited pattern matching in file names when specifying data files to load with vwload. Also, vwload now allows leading spaces before a null value in the data file and has a new option to keep empty strings for character columns that are NOT NULL (--notnull_empty).
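As an illustration of the URI schemes mentioned in this guide, the following Python sketch classifies entries of a file list before they are handed to a loader. It is not part of vwload; the function name check_load_source and its return values are hypothetical, and only the schemes named in this section are recognized.

```python
from urllib.parse import urlparse

# Schemes named in this section. The helper and its return shape are
# hypothetical; they are not part of vwload.
CLOUD_SCHEMES = {"s3a", "s3n", "adl", "abfs", "viewfs"}
# s3n:// is deprecated in favor of s3a:// (see the note above).
DEPRECATED_SCHEMES = {"s3n": "s3a"}


def check_load_source(source):
    """Classify one file-list entry as local, cloud, deprecated, or other."""
    scheme = urlparse(source).scheme.lower()
    if not scheme:
        return ("local", None)
    if scheme in DEPRECATED_SCHEMES:
        return ("deprecated", f"prefer {DEPRECATED_SCHEMES[scheme]}://")
    if scheme in CLOUD_SCHEMES:
        return ("cloud", None)
    return ("other", None)
```

A pre-flight check of this kind could be used to warn about s3n:// sources before starting a long load.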
Database Administration
Default partition count – A default partition count can be set for the installation with the DBMS configuration parameter default_npartitions. When creating a table, you can specify WITH PARTITION = (HASH ON column DEFAULT PARTITIONS), which automatically uses the configured partition count. This feature is designed for cloud or cluster environments, where almost all tables should be partitioned and the best partition count is determined by the cluster topology rather than by a property of the data.
Table repartitioning – The MODIFY...TO RECONSTRUCT WITH PARTITION statement repartitions data to align the partition count with the number of cluster nodes. Perform this operation after rescaling the VectorH cluster.
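The effect of changing the partition count on hash-partitioned data can be sketched in Python. This is a conceptual model only: the hash function VectorH uses is not described here, so CRC32 serves as a stand-in, and the partition counts 4 and 6 are hypothetical values for a cluster before and after rescaling.

```python
import zlib
from collections import Counter


def partition_of(key, npartitions):
    """Map a key to a partition with a stable hash (CRC32 is a stand-in)."""
    return zlib.crc32(str(key).encode("utf-8")) % npartitions


def partition_sizes(keys, npartitions):
    """Return the number of keys that land in each partition."""
    counts = Counter(partition_of(k, npartitions) for k in keys)
    return [counts.get(p, 0) for p in range(npartitions)]


keys = range(1000)
before = partition_sizes(keys, 4)  # hypothetical count before rescaling
after = partition_sizes(keys, 6)   # hypothetical count after rescaling

# Keys whose partition changes are the rows a repartition has to move.
moved = sum(1 for k in keys if partition_of(k, 4) != partition_of(k, 6))
```

Because a key's partition depends on the partition count, changing the count relocates many rows. This is why repartitioning is a separate, explicit operation rather than something that happens implicitly when nodes are added.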
Function-based encryption – The SQL functions AES_ENCRYPT_VARCHAR and AES_DECRYPT_VARCHAR allow AES encryption at the application level in DML statements such as SELECT, INSERT, and UPDATE.
Column masking – The MASKED attribute can be assigned to columns so that unprivileged users cannot view the data. The MASKED [AS {BASIC | NULL | 0 | ' ' }] column attribute can be used on CREATE TABLE and ALTER TABLE. The UNMASK subject privilege, when assigned, lets the user see the data. A user with the UNMASK privilege can use the SQL function MASK_COLUMN(expr AS {BASIC | NULL | 0 | ' ' | UNMASK}) in views to selectively unmask data, and control how that data can be interacted with and presented, and who can access it.
MEDIAN and PERCENTILE_CONT aggregate functions – MEDIAN returns the median value. PERCENTILE_CONT returns a value that corresponds to the given fraction (n) in the sort order, where n is greater than 0 and less than 1. PERCENTILE_CONT (.5)... is the same as MEDIAN.
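The relationship between the two aggregates can be sketched in Python. The sketch assumes the SQL-standard definition of PERCENTILE_CONT, which interpolates linearly between neighboring values in the sort order; this guide does not state the interpolation rule, so treat that as an assumption. The function names are illustrative and NULLs are modeled as None.

```python
def percentile_cont(values, fraction):
    """Continuous percentile using linear interpolation (assumed rule)."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be greater than 0 and less than 1")
    data = sorted(v for v in values if v is not None)  # aggregates skip NULLs
    if not data:
        return None
    position = fraction * (len(data) - 1)  # fractional index in sort order
    lower = int(position)
    upper = min(lower + 1, len(data) - 1)
    weight = position - lower
    return data[lower] + (data[upper] - data[lower]) * weight


def median(values):
    """MEDIAN expressed as PERCENTILE_CONT at fraction 0.5."""
    return percentile_cont(values, 0.5)
```

With an even number of values, the result falls halfway between the two middle values; with an odd number, it is the middle value itself.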
Support for Vector tables in database procedures – The following SQL statements are now supported:
CREATE PROCEDURE
DROP PROCEDURE
DECLARE
EXECUTE PROCEDURE
FOR – ENDFOR
IF-THEN-ELSE
MESSAGE
RAISE ERROR
RETURN
RETURN ROW
WHILE – ENDWHILE
Alterable min-max index – You can create or drop a min-max index for a table with the ALTER TABLE...ADD|DROP MINMAX statement and add or drop columns to and from a min-max index for a table with the ALTER TABLE...ALTER MINMAX ADD|DROP COLUMN statement. In addition, you can use the WITH [NO]MINMAX_SAMPLES option on the CREATE TABLE [AS SELECT], DECLARE GLOBAL TEMPORARY TABLE, and ALTER TABLE...ADD MINMAX statements. A sampled min-max index is used by the optimizer when generating automatic histograms and allows the optimizer to produce better query plans faster.
Nullable unique keys – Columns that you specify as unique or that you use as part of a table-level unique constraint can be nullable. There can be multiple rows with NULL. For multi-column keys (table-level unique constraint), uniqueness is enforced only on key columns with non-null entries. Likewise, referential constraints are enforced only on non-null entries.
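The nullable unique key rule can be modeled with a short Python sketch. It assumes that a multi-column key is checked for uniqueness only when all of its columns are non-null, which is one reading of the rule above. Rows are modeled as dictionaries, NULL as None, and the function name is hypothetical.

```python
def find_unique_violations(rows, key_columns):
    """Return (first_index, duplicate_index) pairs for duplicate keys.

    Assumption: a key containing any NULL (None) is not checked.
    """
    seen = {}
    violations = []
    for index, row in enumerate(rows):
        key = tuple(row[column] for column in key_columns)
        if any(part is None for part in key):
            continue  # keys with a NULL entry are exempt from the check
        if key in seen:
            violations.append((seen[key], index))
        else:
            seen[key] = index
    return violations


rows = [
    {"a": 1, "b": None},
    {"a": 1, "b": None},  # allowed: the key contains NULL
    {"a": 1, "b": 2},
    {"a": 1, "b": 2},     # duplicate of the previous fully non-null key
]
violations = find_unique_violations(rows, ["a", "b"])
```

Only the last row is reported, as a duplicate of the third row; the first two rows coexist because their keys contain NULL.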
Installation
IANA time zone files – Vector uses time zone files from the Internet Assigned Numbers Authority (IANA). When installing Vector, you must select an IANA time zone, which becomes the value of the Vector environment variable II_TIMEZONE_NAME.
Performance
Performance improvements – Vector performance is improved through internal features such as update propagation improvements, parallel build of shared hash tables, and unique string area.
External Tables enhancement – Support for predicate pushdown on external tables. During query execution, data is filtered as close to the source as possible to avoid loading unnecessary data into memory. Pushdown is limited by the ability of the external data source to support the filter used.
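The idea behind predicate pushdown can be sketched in Python. This is a conceptual model, not the VectorH implementation: scan_source stands for a hypothetical external data source, and the source_supports_filter flag models whether that source can evaluate the filter itself.

```python
def scan_source(rows, predicate=None):
    """Hypothetical external source: yields rows, filtering if asked to."""
    for row in rows:
        if predicate is None or predicate(row):
            yield row


def query(rows, predicate, source_supports_filter):
    """Return (matching rows, number of rows loaded into memory)."""
    if source_supports_filter:
        # Pushdown: the source filters, so only matching rows are loaded.
        loaded = list(scan_source(rows, predicate))
        result = loaded
    else:
        # No pushdown: load every row, then filter in the engine.
        loaded = list(scan_source(rows))
        result = [row for row in loaded if predicate(row)]
    return result, len(loaded)


rows = [{"id": i} for i in range(10)]
wanted = lambda row: row["id"] >= 8
with_pushdown = query(rows, wanted, source_supports_filter=True)
without_pushdown = query(rows, wanted, source_supports_filter=False)
```

Both calls return the same two rows, but the pushdown path loads 2 rows into memory while the fallback path loads all 10.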