User Guide : 17. Migrating from Vector to VectorH
 
Share this page                  
Migrating from Vector to VectorH
Considerations for Migrating from Vector to VectorH
When migrating your database from Vector to VectorH, you need to consider the following:
Whether the tables will be partitioned and how. For details, see VectorH Partitioning Guidelines and Partitioned Tables in the SQL Language Guide.
Whether to load the data using use SQL COPY statements (automatically generated by the copydb command) or the vwload utility. COPY is typically easier, but vwload is faster so we recommend vwload for larger datasets.
How to Achieve Optimal Loading with vwload
For optimal loading speed with vwload, follow these steps:
1. Split the data files (manually or with a third-party tool) into separate files, preferably equal to the number of partitions but at least a multiple of the number of nodes.
2. Copy the files to HDFS using standard HDFS utilities.
3. Use the vwload operation in parallel (cluster) mode.
How to Migrate a Database from Vector to VectorH
The overall process for migrating a database from Vector to VectorH is as follows:
1. Run the copydb command.
Two command files are created (copy.out and copy.in) that contain SQL statements.
2. Execute the copy.out script.
All tables and views are copied into files in the specified directory.
3. Modify the copy.in script to add partitioning and optionally to remove the COPY statements if you will use vwload rather than COPY to load the data.
4. Create the new database.
5. Execute the copy.in script.
The tables are created. If using COPY, the data is loaded too.
6. Use vwload to load the data if COPY was not used. Input files must be copied to the machine where VectorH resides before issuing the vwload command on the master node.
7. Optimize the database.
8. Run the sysmod command.
Migrate a Database from Vector to VectorH
The following steps show examples of commands to use to migrate your data.
To migrate data from Vector to VectorH
1. Copy the data from Vector using the copydb command:
copydb olddb -c -with_csv -no_loc
The flags “-c” and “-with_csv” create vwload-compatible files.
The –no_loc flag will not write the WITH LOCATION clause for CREATE TABLE statements. This prevents the tables from being created in II_DATABASE because they need to be created in the II_HDFSDATA location.
Two command files are created: copy.in and copy.out.
2. Execute the copy.out script:
sql olddb <copy.out
All tables and views owned by the user are copied into files in the specified directory.
3. Edit the copy.in script as follows:
a. Add the following clause to CREATE TABLE statements:
WITH PARTITION=(HASH ON key x PARTITIONS)
where key is the partition key and x is the number of partitions. For details on the WITH PARTITION clause, see the SQL Language Guide.
b. If you will use the vwload utility instead of COPY to load the data, remove the COPY statements. This allows the script to only create the tables and not load the data.
4. Create the new database on the VectorH master node:
createdb newdb
5. Run the copy.in script on the VectorH master node:
sql newdb <copy.in
The tables are created.
Data is loaded unless you removed the COPY statements in copy.in, in which case proceed to the next step to load the data using vwload.
6. Run vwload on the VectorH master node, assuming data files have been copied to the target machine.
Note:  You can use Actian Director to load data remotely from a client machine. No Vector installation is needed on the client.
If the data file is on the local file system of the master node, use regular vwload (do not use the -c flag).
If the data is in multiple files in HDFS, use parallel (cluster) vwload (use the -c flag). Use the --stats flag, which builds histograms for all columns of the loaded table. For example:
vwload --table tablename –-stats --fdelim "," --cluster dbname
hdfs://namenode:8020/path/to/data/table1.txt
hdfs://namenode:8020/path/to/data/table2.txt
hdfs://namenode:8020/path/to/data/table3.txt
7. Optimize the database:
optimizedb -zns newdb
Note:  This step is not necessary if you used the --stats flag on vwload in the previous step.
8. Modify the system catalogs for the database:
sysmod newdb