User Guide
1. New Features in VectorH 5.0
2. Vector Concepts
Table Structures
Data Storage Format
Raw Table Storage Format
Data Compression
Data Type Storage Format and Compression Type
Query Optimization
Data Manipulation
Granularity of DML Operations
Persistence and Recovery
Management of In-memory Updates
Transaction Isolation Model
Snapshot Isolation
Optimistic Concurrency Control
Transaction Examples
Non-dirty Read Example
Conflict Resolution Example
Transactions and DDL Operations
SQL Access
VECTORWISE Storage Type
Storage Structures
Default Database Characteristics
Transaction Management
Vector Architecture
VectorH Architecture
Command Summary
3. Understanding Usage Scenarios
Typical Use Scenarios
Reporting on DBT-3 Database Benchmark
How to Run the Benchmark
Generate Data
Start Vector
Create a Database
Create a Vector Table
Bulk Load the Data into the Table
Create Statistics
Run a Reporting Query
4. Vector and Ingres
Compatibility with Ingres Database
Incompatibilities with Ingres Database
SQL Cursors
Some Subqueries Do Not Work
Scalar Subquery Cardinality Check
Cross-system Queries
SQL Data Type Incompatibilities
SQL Function Incompatibilities
Differences in Constraint Checking
5. Usage Considerations
Considerations for Using VectorH
VectorH Configuration Considerations
VectorH Storage Location Considerations
VectorH Partitioning Guidelines
How Partitioning Is Handled
VectorH Data Loading Guidelines
VectorH Startup and Shutdown
VectorH Monitoring Considerations
VectorH Backup and Recovery Considerations
VectorH Management Tool
VectorH Limitations
6. Accessing VectorH
Start or Shut Down the Vector Instance
Start Vector as a Service (Linux)
Network Connections
User Authentication
DBMS Authentication
User Password
Connecting to a Database
Connecting to a Local Server
Connecting to a Remote Server
Connecting to a Non-UTF8 Server that Does Not Support Transliteration
7. Configuring and Managing VectorH
X100 Configuration File (vectorwise.conf)
X100 Per-node Configuration File
X100 Configuration File Format
X100 Configuration Parameters
[memory] Settings
[system] Settings
[server] Settings
[cbm] Settings
[engine] Settings
[dbagent] Settings
When to Change the Default Configuration Values
Memory Settings
Memory Configuration Guidelines
I/O Settings
OS Settings
Virtual Address Space Allocation (Linux)
Increase max_map_count Kernel Parameter (Linux)
Using Large Pages
Requirements for Huge Pages on Linux
Designate Memory for Huge Pages on Linux
Using NUMA Optimization
User Processes Limit in RHEL
Using Multiple Databases
SQL Settings
Performance Tips
Create Indexes
Per-query Parallelism Level
Data Size Consideration for Some Queries
Database Write-Ahead Log
WAL Condensation
wal_backups Directory
Error Reporting--vectorwise.log
Configuring Error Reporting
Error Log Record Format
Error Log Configuration File
Error Log Configuration File Format
Example Error Log Configurations
VWLOG Environment Variable--Configure Error Reporting
Rotating the Vector Error Log
Changing Error Log Configuration Settings Dynamically
Logging for Startup of the Vector Server
Configure x100errlog.log Rotation
Session Management
Determine How Long a Query Has Been Running
Remove a Vector Session
Identifying Unused Table Files
Configure the Checking of Free Disk Space
How to Free Disk Space
Configure MPI Runtime
Configuration Parameters in config.dat
Setting Parameter and Derived Parameter Values
Using VCDA to Restore Configuration Parameter
Protecting or Recalculating Derived Parameters
Components that Use Derived Parameters
Access Configuration-By-Forms Utility
How You Navigate in CBF
8. Setting Environment Variables
Scope of Environment Variables
Symbol Table
Linux Environment Variables
Installation-Wide Environment Variables
View Environment Variables
User-definable Environment Variables
Environment Variables that Can be Reset Locally
Environment Variables and Logicals that Cannot Be Reset
Customization of Query Runtime Environment
Syntax Rules for Startup Files and Environment Variables
Dayfile
DBNAME_SQL_INIT
II_SQL_INIT
ING_SET
ING_SET_DBNAME
ING_SYSTEM_SET
Startsql
9. Creating Databases
Database File Locations
Commands for Working with a Database
The Master Database (iidbdb)
The Management Database (imadb)
Rules for Naming Databases
Database Default Characteristics
How a Database Is Created
Create a User
Create a Database
How the Createdb Command Works
Show Information on Databases with infodb
Show Information on Databases with vwinfo
Remove a Database
10. Using Alternate Locations
Locations and Areas
Default Locations
Create an Alternate HDFS Location
Guidelines for Using Locations
Alter or Drop a Location
Work Locations
Initial and Extended Work Locations
Extend and Unextend a Database
Create a Database in an Alternate HDFS Location
11. Managing Tables and Views
Table Management
Tools for Creating a Table
Table Ownership
Table Location
Requirements for Using an Alternate Location for a Table
Create a Table in an Alternate Location
Enable or Disable Journaling
Duplicate Rows in Tables
Data Type Conversion Functions for Default Values
Constraints
Constraint Types
Delete Constraints
Views
Views and Permissions
Working with View Objects
Updates on Views
Types of Updates Not Permitted on Views
Schemas
Synonyms, Temporary Tables, and Comments
Synonyms
Working with Synonym Objects
Temporary Tables
Temporary Table Declaration and the Optional SESSION Schema Qualifier
Examples of Working with Temporary Tables
Comments to Describe Tables and Views
The COMMENT ON Statement
12. Loading Data
Methods for Loading Data
Load Data with COPY Statement
Load Data with vwload Utility
Distributed Data Loading with vwload
Load Data with Spark SQL through the Spark-Vector Connector
13. Updating Data
Methods for Updating Data
Batch and Bulk Updates
Transfer Data from Traditional Ingres Table
Combining Tables
How to Avoid Propagation
MODIFY...TO COMBINE Statement--Merge and Update Data
Propagate In-memory Updates to Disk
Best Practices for Updates
Allowing Concurrent Inserts
14. Using the Query Optimizer
Data and Query Optimization
Database Statistics
Methods for Generating Statistics
Automatic Histogram Generation
Assumptions of the Query Optimizer
Resources Required During Optimization
System Modification After Optimization
Run System Modification
Information Collected by the Optimizer
Types and Levels of Statistics Collected
Non-Sampled and Sampled Statistics
Full Statistics
Minmax Statistics
Key Column Statistics
Statistics from an Input Text File
Column Statistics
Histogram (Optimization Output)
Histogram Cells
Statistics and Global Temporary Tables
When to Rerun Optimization
Example: Before and After Optimization
Query Execution Plans
Information on a QEP
View a QEP
Control QEP Generation Using a Environment Variable
Text-Only QEP
QEPs as Data Flow Trees
Modes for Showing Tree Diagrams
Concise QEP
Graphical QEP
Types of Nodes in a QEP
Sort Nodes in a QEP
Non-Join Nodes in a QEP
Orig Nodes
Projection-Restriction Nodes
Exchange Nodes
Join Nodes in a QEP
Cartesian Product Node
Full Sort Merge Node
Partial Sort Merge Node
Hash Join Node
Key and Tid Lookup Join Node
Subquery Join Node
Multiple Query Execution Plans
More Complex QEPs
Optimizer Timeout
Control Optimizer Timeout
Greedy Optimization
Control Greedy Optimization
"Reuse" Heuristic
Summary for Evaluating QEPs
Specialized Statistics Processing
Display Optimizer Statistics
Display Optimizer Statistics for Individual Tables and Columns
Delete Optimizer Statistics
Floating Point Precision in Optimizer Statistics Display
Statistics in Text Files
Unload Optimizer Statistics to a Text File
Sample Text File Statistics
Load Optimizer Statistics from a Text File
Copy a Table and Associated Statistics
Sampled Optimizer Statistics
Create Sampled Statistics
Composite Histograms
15. Using External Tables
Introduction to External Tables
How External Tables Work
External Table Requirements
Starting and Stopping the Spark-Vector Provider
Spark-Vector Provider Log
Configuring Logging
Spark-Vector Provider Configuration
Syntax for Defining an External Table
Reading and Writing to an External Table
How to Add Extra Packages
External Table Limitations
External Table Usage Notes
Cohabitation of Vector and Spark-Vector under YARN
16. Disaster Recovery
Master Node Recovery
Requirements for Failover of Master Node
How Master Node Failover Works
Data Locality after Failover
Configuring the High Availability Option for Red Hat Cluster Suite
Install the VectorH Service Script
Install the Cluster Service Script
Set Up the Cluster Service
Red Hat Cluster Configuration
HA Configuration Parameters
DataNode Recovery
Configuring VectorH with High Availability in YARN
Timeout Settings
Slave Node Failover Behavior
High Availability for the DbAgent
Limitations of VectorH High Availability in YARN
Database Recovery
17. Backing Up and Restoring the Database
Methods for Backing Up Data
Logging System
Logging Facility
Recovery Process
Archiver Process
Backup by Checkpoints
Full or Incremental Backup
Online and Offline Checkpoints
Limitations to Checkpoint Operation
Checkpoint (Back up) a Database with ckpdb
Checkpoint Template File
Management of Checkpoint, Journal, and Dump Files
Checkpoint File Version Numbers
Keep N Checkpoints
Maintaining Checkpoint Files Offline
Additional Commands for Managing Checkpoints
Checkpoints and Destroyed Databases
Journals
Tools for Performing Journaling
Enable Journaling on an Entire Database
Journaling and Online/Offline Checkpoints
Methods for Stopping Journaling
Disable Journaling When Checkpointing
Disable Journaling When Altering a Database
Database Characteristics Affected by Alterdb
Backup and Restore Scenario Using Checkpoints and Journals
Backup and Restore Using Copydb
Copy Files to a Backup Location Using OS Commands
Recovery from Checkpoints
How the Roll Forward (Restore) Operation Works
Limitations to Rollforward Operation
Tools for Performing a Roll Forward
Recover a Database with rollforwarddb
Recover a Journaled Database
Recover a Non-Journaled Database
Recover a Database from Tape Checkpoints
Recover a Database from an Old Checkpoint
Recover from the Loss of the Transaction Log File
Backup and Recovery of the Master Database (iidbdb)
The iidbdb and Checkpointing
Checkpoint Template File Description
Checkpoint Template Codes
Examples: Checkpoint Template Code
Substitution Parameters in the Checkpoint Template File
Valid Code Combinations in the Checkpoint Template File
Format of the Checkpoint Template File in Linux
Archiver Exit Script (acpexit)
Customization of Archiver Exit Script
Archiver Exit Script Parameters
18. Monitoring
View Information about a Database
vwinfo Statistics Display
vwinfo Transactions Display
vwinfo PDT Usage Display
View Query Profile Information
iivwquery View
iivwprof View
Interpreting I/O Information in IIVWPROF View
iivwprof_last View
Examples of Using Query Profiling Information
Configure Query Profiling
System Catalogs for Vector Query Profiling Views
iivwprof_query Catalog
iivwprof_op Catalog
iivwprof_io Catalog
iivwprof_stage Catalog
iivwprof_stage Catalog
iivwprof_expr Catalog
iivwprof_parse_tree Catalog
19. Managing Cluster Resources
How VectorH Interacts with YARN
YARN Required Settings
Additional YARN Settings when Using Kerberos
Configure YARN Integration
Configure Resource Allocation in the Cluster
How the Set of Slave Nodes Is Determined
Preemption Support
How VectorH Preemption Support Works
Configuring VectorH for Preemption Support
YARN Configuration Settings
How to Add and Remove Slave Nodes
20. Migrating from Vector to VectorH
Considerations for Migrating from Vector to VectorH
How to Achieve Optimal Loading with vwload
How to Migrate a Database from Vector to VectorH
Migrate a Database from Vector to VectorH
A. Command Reference
Where to Issue Commands
General Command Syntax
Dynamic Vnode Specification--Connect to Remote Node
Standard Flags and Parameters
Schema Qualifier--Specify Ownership
Delimited Identifiers on the Command Line
Delimited Identifiers Used on Authorization Parameters
Delimited Identifiers and Case Sensitivity
accessdb Command--Authorize User Access
alterdb Command--Set Database Characteristics
cbf Command--Start Configuration-By-Forms
ckpdb Command--Back Up a Database
ckpdb Examples
clonedb Command--Clone a Database
clonedb Example
copydb Command--Copy and Restore a Database
copydb Examples
createdb Command--Create a Database
createdb Examples
destroydb Command--Destroy a Database
destroydb Examples
extenddb Command--Extend Database to New Location
extenddb Example
iiinitres Command--Install Parameter into CONFIG.DAT
iigetres Command--Get the Value of a Resource
iimonitor Command--Administer DBMS, Recovery, and GCF Servers
iimonitor Utility Commands
Help Command
Show Server Command
Show Sessions Command
Session ID Format in Command Output
Transaction Mode and Information in Command Output
Output of Show All Sessions Command
Format Sessions Command
Set Server Command
Stop Server Command
iimonitor Commands That Perform Actions on Sessions
dmtstatrcpstats Command
Quit Command
iimonitor Commands Specific to GCF Servers
Set Trace Command
Register Server Command
Remove Tickets Command
Remove Pooled Sessions Command
iinamu Command--Administer the Name Server
iinamu Example: Show All Registered Servers
iinamu Example: Show All DBMS Servers for the Server Class Ingres
iinamu Example: Show Communications Server Registrations
iinamu Example: Add a DBMS Server to the Name Server Registry
iinamu Example: Delete a DBMS Server from the Name Server Registry
iinamu Example: Stop the Name Server
iiodbcadmin Command--Run ODBC Administrator
iiodbcadmin Examples
iiremres Command--Remove Parameter from CONFIG.DAT
iisetres Command--Set Configuration Parameter
iisetres Example
iisuhdfs Command--Configure Slave Nodes
iisuhdfs Example
iivalres Command--Validate Configuration Resource
infodb Command--Display Database Information
ingprenv Command--Display Environment Variable Value
ingprenv Example
ingsetenv Command--Set Vector Environment Variable
ingsetenv Example
ingstart Command--Start a Vector Instance
ingstart Examples
ingstatus Command--Show Server Status
ingstop Command--Stop a Vector Instance
ingunset Command--Delete Environment Variable
ipm Command--Start the Interactive Performance Monitor
isql Command--Start Interactive SQL Terminal Monitor
isql Examples
movedb Command--Copy Data Locations to Another Machine
mkrc Command--Generate System Service Script
optimizedb Command--Generate Statistics for the Query Optimizer
optimizedb -z Flags
optimizedb Examples
relocatedb Command--Move Location to Another Location
relocatedb Examples
rollforwarddb Command--Recover a Database
rollforwarddb Example
sql Command--Start the Line-based SQL Terminal Monitor
sql Examples
statdump Command--Print Statistics in iistats and iihistogram Catalogs
statdump Examples
sysmod Command--Modify System Catalogs
unextenddb Command--Unextend a Database Location
unloaddb Command--Create Files for Unloading and Reloading a Database
unloaddb Examples
upgradedb Command--Upgrade a Database
upgradefe Command--Install and Upgrade Tool Catalog Definitions
vhclstatus Command--Display Status of Cluster Services
vwinfo Command--Display Information about a Database
vwload Command--Load Data into a Table
vwload Escape Sequences
vwload Date Format Settings
vwload Supported Character Sets
vwload in Parallel Mode
Requirements for Parallel vwload
Limitations of Parallel vwload
vwload from Remote System
vwload Examples
B. Environment Variables
DBNAME_SQL_INIT
II_AUTH_STRING
II_CHECKPOINT
II_CLIENT
II_CONFIG
II_DATABASE
II_DATE_FORMAT
II_DBMS_LOG
II_DBMS_SERVER
II_DECIMAL
II_DISABLE_SYSCHECK
II_DUMP
II_EMBED_SET
II_ERSEND (Linux)
II_FORCE_HET
II_GC_REMOTE
II_GCN_LOG
II_GCNxx_PORT
II_GCx_TRACE
II_INSTALLATION
II_JOURNAL
II_LANGUAGE
II_LOG
II_MONEY_FORMAT
II_MONEY_PREC
II_MSGDIR
II_MTS_JAVA_HOME
II_NULL_STRING
II_NUM_OF_PROCESSORS
II_PRINTSCREEN _FILE
II_RCP_STATS
II_SQL_INIT
II_SYSTEM
II_TEMPORARY
II_TERMCAP_FILE
II_TIMEZONE_NAME
II_TM_EXIT_ON_ERROR
II_TM_ON_ERROR
II_VNODE_PATH
II_WORK
IIDLDIR
ING_SET
ING_PRINT
ING_SET_DBNAME
ING_SYSTEM_SET
INGRES_KEYS
ING_SHELL (Linux)
TERM_INGRES
VWLOG
C. Limits
Limits
D. Features Introduced in Previous Versions
New Features in VectorH 4.2
New Features in VectorH 4.1
User Guide
New Features in VectorH 4.1