User Guide : 10. Loading Data
 
Methods for Loading Data
After a table is created, you can load data into it. Available methods are:
- COPY statement
- vwload utility
- COPY VWLOAD (a sketch follows the COPY example below)
- Spark SQL through the Spark-Vector Connector
- External tables (a sketch follows this list)
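External tables are covered in detail elsewhere in this guide. As a minimal sketch of the approach, the statements below map a delimited file on HDFS to an external table and then insert from it into a regular Vector table. The table, column, and path names are illustrative, and the syntax assumes the Spark provider for external tables is configured; see the external tables documentation for the exact options:

CREATE EXTERNAL TABLE customer_ext (
        c_custkey INTEGER,
        c_name VARCHAR(40)
) USING SPARK WITH
        REFERENCE='hdfs://namenode:8020/path/to/data/customer.csv',
        FORMAT='csv',
        OPTIONS=('delimiter'='|') \g

INSERT INTO customer SELECT * FROM customer_ext \g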
Load Data with COPY Statement
The following COPY FROM example loads the lineitem.tbl text file into the lineitem table. The file is assumed to be in the current working directory.
To load the data into the lineitem table using COPY FROM
Type the following SQL statement at the Terminal Monitor prompt:
COPY TABLE lineitem (
        l_orderkey = 'c0|',
        l_partkey = 'c0|',
        l_suppkey = 'c0|',
        l_linenumber = 'c0|',
        l_quantity = 'c0|',
        l_extendedprice = 'c0|',
        l_discount = 'c0|',
        l_tax = 'c0|',
        l_returnflag = 'c0|',
        l_linestatus = 'c0|',
        l_shipdate = 'c0|',
        l_commitdate = 'c0|',
        l_receiptdate = 'c0|',
        l_shipinstruct = 'c0|',
        l_shipmode = 'c0|',
        l_comment = 'c0nl'
) FROM 'lineitem.tbl' \g
For details on the COPY statement, see the Vector SQL Language Guide.
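The COPY VWLOAD variant listed above combines COPY statement syntax with the vwload loader. A minimal sketch for the same file, with illustrative delimiter options (check the SQL Language Guide for the exact option list, and note that the file path must be accessible to the server):

COPY lineitem() VWLOAD FROM 'lineitem.tbl' WITH FDELIM='|', RDELIM='\n' \g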
Load Data with vwload Utility
The vwload command can be used to load data into a Vector table. It is less flexible but easier to use than COPY and often faster. You can also use vwload to load data in parallel into a single table to further speed up data loads.
For example, the following command loads the data in the lineitem.txt file into the lineitem table of the dbt3 database. In the lineitem table, fields are delimited by | and records are delimited by a new line (\n).
To load the data into the lineitem table using vwload
Enter the following command at the operating system prompt:
vwload --fdelim "|" --rdelim "\n" --table lineitem dbt3 lineitem.txt
Distributed Data Loading with vwload
Data loading with vwload can be distributed over the cluster if there are multiple input files. The achievable parallelism is limited by the number of input files and by the total number of execution cores in the cluster.
To use this method of data loading, all input files must be accessible to all nodes in the same location, such as HDFS, S3, or shared local storage. Use standard utilities to copy the input files to the file system (for example, hdfs dfs -put) or generate the input files with an application that writes directly to the file system.
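For example, to stage two local input files into HDFS (the paths here are illustrative):

hdfs dfs -mkdir -p /path/to/data
hdfs dfs -put lineitem_1.txt lineitem_2.txt /path/to/data/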
To enable distributed data loading, add the -c option (or --cluster) to the vwload command line and provide the full HDFS path to all input files.
The following command loads the data in the lineitem_1.txt and lineitem_2.txt files into the lineitem table of the dbt3 database. Fields are delimited by | and records are delimited by a new line (\n).
To load the data into the lineitem table using vwload -c
Enter a command like the following at the operating system prompt:
vwload -c --fdelim "|" --rdelim "\n" --table lineitem dbt3 hdfs://namenode:8020/path/to/data/lineitem_1.txt hdfs://namenode:8020/path/to/data/lineitem_2.txt . . .
Load Data from Cloud Sources with vwload
You can load data from cloud sources such as Amazon S3 (s3a:// and s3n://) using vwload. Specify the full URI for the data files you want to load. The vwload command accepts any URL supported by the HDFS client.
IMPORTANT!  Referencing your AWS credentials in the target URI from the command line or in environment variables can leave them easily accessible. To keep your credentials safe, see the chapter “Securely Managing Amazon S3 Credentials” in the Security Guide.
The following example loads data into the customer table of the mydb database in parallel mode. The source files reside in Amazon S3 cloud storage. Fields in the data files are delimited by a vertical bar. Records are delimited by a new line:
vwload -c --fdelim "|" --rdelim "\n" --table customer mydb s3a://mys3bucket/path/to/data/customer1.csv s3a://mys3bucket/path/to/data/customer2.csv s3a://mys3bucket/path/to/data/customer3.csv
or
vwload -c --fdelim "|" --rdelim "\n" --table customer mydb s3a://mys3bucket/path/to/data/customer*.csv
Load Data with Spark SQL through the Spark-Vector Connector
The Spark-Vector Connector lets you interact with a Vector database using Apache Spark.
For more information on the Spark-Vector Connector, see https://github.com/ActianCorp/spark-vector/tree/master.
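The connector must be on the Spark classpath before the statements below will run. As a hedged example, you can start an interactive shell with a connector assembly jar built from the GitHub repository (the jar name and version below are illustrative; see the repository's build instructions):

spark-shell --jars spark-vector-assembly-<version>.jar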
To load data using Spark SQL
1. Register the Vector table as a temporary table in Spark SQL:
sqlContext.sql("""CREATE TEMPORARY TABLE vector_table
    USING com.actian.spark_vector.sql.DefaultSource
    OPTIONS (
        host "vectorhost",
        instance "VI",
        database "databasename",
        table "vector_table"
    )""")
2. Load data into Vector:
sqlContext.sql("INSERT INTO vector_table SELECT * FROM spark_table")
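To verify the load, you can query the same temporary table from Spark. A minimal check, using the names from the sketch above:

sqlContext.sql("SELECT COUNT(*) FROM vector_table").show()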