Methods for Loading Data
After a table is created, you can load data into it. Available methods are:
• COPY statement
• vwload utility
• COPY VWLOAD statement
• Spark SQL through the Spark-Vector Connector
Load Data with COPY Statement
The following COPY FROM example loads the lineitem.tbl text file into the lineitem table. The file is assumed to be in the current directory.
To load the data into the lineitem table using COPY FROM
Type the following SQL statement at the Terminal Monitor prompt:
COPY TABLE lineitem (
l_orderkey = 'c0|',
l_partkey = 'c0|',
l_suppkey = 'c0|',
l_linenumber = 'c0|',
l_quantity = 'c0|',
l_extendedprice = 'c0|',
l_discount = 'c0|',
l_tax = 'c0|',
l_returnflag = 'c0|',
l_linestatus = 'c0|',
l_shipdate = 'c0|',
l_commitdate = 'c0|',
l_receiptdate = 'c0|',
l_shipinstruct = 'c0|',
l_shipmode = 'c0|',
l_comment = 'c0nl'
) FROM 'lineitem.tbl' \g
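After the statement completes, a quick row count confirms that the data arrived. This check is a minimal sketch, not part of the COPY syntax itself; it uses the lineitem table from the example above:
SELECT COUNT(*) FROM lineitem \g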
For details on the COPY statement, see the Vector SQL Language Guide.
Load Data with vwload Utility
The vwload command can be used to load data into a Vector table. It is less flexible but easier to use than COPY and often faster. You can also use vwload to load data in parallel into a single table to further speed up data loads.
For example, the following command loads the data in the lineitem.txt file into the lineitem table of the dbt3 database. In the lineitem.txt file, fields are delimited by | and records are delimited by a newline (\n).
To load the data into the lineitem table using vwload
Enter the following command at the operating system prompt:
vwload --fdelim "|" --rdelim "\n" --table lineitem dbt3 lineitem.txt
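To load in parallel, split the data across multiple input files on the server and add the --cluster option, which parallelizes parts of the processing. The following is a minimal sketch; the file names lineitem_1.txt and lineitem_2.txt are hypothetical:
vwload --cluster --fdelim "|" --rdelim "\n" --table lineitem dbt3 lineitem_1.txt lineitem_2.txt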
Load Data with COPY VWLOAD Statement
The COPY...VWLOAD statement copies the contents of a file to a table using the VWLOAD operator. This operation is similar to using vwload --cluster on the command line, except that the load can be done from within a multi-statement transaction instead of in a separate session.
The vwload --cluster option speeds up loading by parallelizing some of the processing steps.
To use this option, multiple input files are required and they must reside on the server, not the client. Parallelization works best when input files have (almost) the same size.
For example, the following statement bulk loads data from multiple files into the region table. Columns in the data files are delimited by a vertical bar (|), records are delimited by a newline, and null values are represented by the string NULL:
COPY region() VWLOAD FROM 'path/to/data/mytable_1.txt', 'path/to/data/mytable_2.txt'
    WITH INSERTMODE=BULK, FDELIM='|', RDELIM='\n', NULLVALUE='NULL', WORK_DIR = '/home/users/temp'
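Because the statement runs in the current session, the load can be grouped with other statements in a single multi-statement transaction and committed as a unit. The following is a minimal sketch reusing the region example above; the follow-up DELETE and its r_name predicate are hypothetical:
COPY region() VWLOAD FROM 'path/to/data/mytable_1.txt', 'path/to/data/mytable_2.txt'
    WITH INSERTMODE=BULK, FDELIM='|', RDELIM='\n', NULLVALUE='NULL', WORK_DIR = '/home/users/temp';
DELETE FROM region WHERE r_name = '';
COMMIT;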
Load Data with Spark SQL through the Spark-Vector Connector
The Spark-Vector Connector lets you interact with a Vector database using Apache Spark.
For more information on the Spark-Vector Connector, see
https://github.com/ActianCorp/spark-vector/tree/master.
To load data using Spark SQL
1. Create a Vector table using Spark SQL:
sqlContext.sql("""CREATE TEMPORARY TABLE vector_table
USING com.actian.spark_vector.sql.DefaultSource
OPTIONS (
host "vectorhost",
instance "VI",
database "databasename",
table "vector_table"
)""")
2. Load data into Vector:
sqlContext.sql("INSERT INTO vector_table SELECT * FROM spark_table")