New Features in Version 5.0
The following features were introduced in VectorH 5.0:
Database Administration
External tables let you read from and write to data sources stored outside of Vector. The data source must be one that Apache Spark can read from and write to, such as files stored in HDFS in formats like Parquet, ORC, or CSV, or tables in external database systems. After the external table is defined with the CREATE EXTERNAL TABLE syntax, queries can be run directly against the external data.
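For illustration, a minimal definition might look like the following sketch. It assumes the Spark-based syntax (USING SPARK with REFERENCE and FORMAT options); the table, column, and path names are hypothetical.

-- Hypothetical external table over a Parquet file stored in HDFS
CREATE EXTERNAL TABLE ext_sales (
    sale_id   INTEGER NOT NULL,
    amount    DECIMAL(10,2),
    sale_date ANSIDATE
) USING SPARK
WITH REFERENCE='hdfs://namenode:8020/data/sales.parquet', FORMAT='parquet';

-- Queries can then be run directly against the external data
SELECT sale_date, SUM(amount) FROM ext_sales GROUP BY sale_date;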
Distributed Write-Ahead Log: The single LOG file has been split into multiple files stored in the wal directory. This improves performance, especially for large data sizes, by writing in parallel and removing the need for the master node to send this information over the network. It also reduces memory pressure, speeds up COMMIT processing, and improves startup times.
Distributed indexes, which improve scalability because the master node no longer needs to maintain remote partitions' min-max indexes and join indexes in memory. This feature speeds up DML queries and improves startup times.
Automatic histogram generation, so you no longer have to generate statistics manually for the query optimizer to produce good plans. This gives you more flexibility in managing statistics. Histograms are automatically generated for any column that appears in a WHERE clause and does not already have a histogram stored in the catalog. The histograms are generated from sample data maintained in memory by the min-max indexes.
Clonedb utility, which lets you copy a database from one Vector instance to another, for example, from one installation, machine, or cluster to another. Clonedb can be used to clone a production database for testing purposes.
A requirement to specify either WITH PARTITION=(...) or WITH NOPARTITION when creating a Vector table using CREATE TABLE or CREATE TABLE AS SELECT syntax. During installation, the configuration parameter partition_spec_required in config.dat is set to vector, which forces an explicit choice and underscores that partitioning is an essential strategy in VectorH.
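For example, a partitioned and an unpartitioned definition might look like the following sketch; the table names, columns, and partition count are hypothetical, and the partitioning clause assumes the HASH ON form.

-- Hash-partition the table on its key so rows are spread across the cluster
CREATE TABLE sales (
    sale_id  INTEGER NOT NULL,
    store_id INTEGER,
    amount   DECIMAL(10,2)
) WITH PARTITION = (HASH ON sale_id 16 PARTITIONS);

-- Explicitly opt out of partitioning (typically only suitable for small tables)
CREATE TABLE status_codes (
    code  INTEGER,
    descr VARCHAR(50)
) WITH NOPARTITION;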
UUID data type and functions: Automatic generation of UUID identifiers when inserting data. A UUID can be used as a primary key and/or as a partition key to ensure that data is spread evenly across nodes.
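A sketch of how this might be used follows; the table name is hypothetical and the example assumes the UUID_CREATE() generation function.

-- Hypothetical table with a UUID column as both primary key and partition key
CREATE TABLE events (
    event_id UUID NOT NULL PRIMARY KEY,
    payload  VARCHAR(200)
) WITH PARTITION = (HASH ON event_id 16 PARTITIONS);

INSERT INTO events VALUES (UUID_CREATE(), 'first event');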
SET SERVER_TRACE and SET SESSION_TRACE statements allow tracing of all queries processed by the DBMS Server regardless of the source, whether the query is issued interactively or arrives through a JDBC, ODBC, or .NET connection.
Data Import and Export
The Spark-Vector Connector has been enhanced to provide parallel unload, which is useful for large data volumes.
SQL syntax for parallel vwload (COPY table() VWLOAD FROM 'file1', 'file2',...) performs the same operation as running vwload -c from the command line. Using SQL means the vwload operation can be part of a larger transaction; a single transaction avoids the overhead of committing separate transactions and writing to disk. This is especially useful when loading data to apply updates.
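A sketch of the SQL form, with hypothetical table and file names, showing the load as part of a larger transaction:

-- Delete the rows being replaced and reload them in the same transaction
DELETE FROM sales WHERE sale_date = '2016-09-01';
COPY sales() VWLOAD FROM
    'hdfs://namenode:8020/staging/sales_1.csv',
    'hdfs://namenode:8020/staging/sales_2.csv';
COMMIT;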
SQL syntax for CSV export (INSERT INTO EXTERNAL CSV 'filename'...) writes a table to a local file system. The result is either a single CSV file or a collection of CSV files, depending on whether the query is run in parallel.
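For example (hypothetical table and output path):

-- Export the result of a query to CSV on the local file system
INSERT INTO EXTERNAL CSV '/tmp/sales_export' SELECT * FROM sales;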
Hadoop
Detecting YARN resources at install time and dynamically adapting the VectorH configuration.
Security
Documentation on using the Hadoop security systems Apache Knox and Apache Ranger with VectorH.
New Features in Version 4.2
The following features were introduced in VectorH 4.2:
Security:
Data at rest encryption allows specific table columns to be encrypted
Query level auditing (C2 security)
Performance optimization:
Query performance optimizations, including union flattening at the server, session, and statement levels
I/O optimizations
YARN integration:
Ability to preempt VectorH jobs when higher priority jobs require it
High availability during HDFS DataNode failure
Installer improvements, including RPM support
Support for up to 2048 columns in a table
Direct table unloading to CSV files through the CSVEXPORT system command.
Additions to aggregate window functions (illustrated in the example after this list):
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
ROWS BETWEEN CURRENT ROW AND CURRENT ROW
Named windows
SET AUTOCOMMIT READ
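The following sketch illustrates a frame clause and a named window, assuming the standard WINDOW clause syntax; the table and column names are hypothetical.

-- Running total per store using an explicit frame clause
SELECT store_id, sale_date,
       SUM(amount) OVER (PARTITION BY store_id ORDER BY sale_date
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM sales;

-- The same window factored out as a named window
SELECT store_id, sale_date,
       SUM(amount) OVER w AS running_total
FROM sales
WINDOW w AS (PARTITION BY store_id ORDER BY sale_date
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);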
New Features in Version 4.1
The following features were introduced in VectorH 4.1:
Support for INSERT, DELETE, UPDATE, MERGE, ADD COLUMN, and DROP COLUMN
Full and incremental backup and restore through the ckpdb and rollforwarddb operations
Master node failover
YARN support
Statements MODIFY…TO COMBINE and MODIFY…TO RECONSTRUCT replace the COMBINE and REWRITE forms of the CALL X100 statement, which are deprecated.
CALL X100 is now a privileged operation that requires the user to have DB_ADMIN database privileges. This feature has upgrade considerations.
SET INSERTMODE statement lets you control whether inserts go through the in-memory Positional Delta Trees (PDTs) or directly to disk.
CREATE STATISTICS and DROP STATISTICS statements, plus new options on COPY FROM and vwload that create statistics on the table just loaded.
INTERSECT and EXCEPT set operators for use in the same contexts as UNION (see the example after this list)
CREATE TABLE IF NOT EXISTS statement creates the table if it does not exist and returns without error if the table exists.
Configuration option [engine] enable_reuse_disk_spilling=true enables spilling to disk for reused query plan parts.
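The following sketch illustrates several of the statements above with hypothetical table names; the CREATE STATISTICS form shown (CREATE STATISTICS FOR table) is an assumption.

-- Create the summary table only if it does not already exist
CREATE TABLE IF NOT EXISTS region_totals (
    region VARCHAR(20),
    total  DECIMAL(12,2)
);

-- Regions that appear in 2016 sales but not in 2015 sales
SELECT region FROM sales_2016
EXCEPT
SELECT region FROM sales_2015;

-- Regions that appear in both years
SELECT region FROM sales_2016
INTERSECT
SELECT region FROM sales_2015;

-- Combine pending updates into the stored table and refresh its statistics (hypothetical usage)
MODIFY region_totals TO COMBINE;
CREATE STATISTICS FOR region_totals;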
 
Last modified date: 01/26/2023