Load Data with vwload Utility
The vwload command loads data into a Vector table. It is less flexible than COPY, but it is easier to use and often faster. You can also use vwload to load data in parallel into a single table to further speed up data loads.
For example, the following command loads the data in the lineitem.txt file into the lineitem table of the dbt3 database. In the lineitem.txt file, fields are delimited by a vertical bar (|) and records are delimited by a newline (\n).
To load the data into the lineitem table using vwload
Enter the following command at the operating system prompt:
vwload --fdelim "|" --rdelim "\n" --table lineitem dbt3 lineitem.txt
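To verify the load, you can run a quick row count against the target table. A minimal sketch using the sql terminal monitor (\g executes the statement; \q exits):
sql dbt3
SELECT COUNT(*) FROM lineitem;\g
\q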
Distributed Data Loading with vwload
Data loading with vwload can be distributed over the cluster if there are multiple input files. The maximum parallelism that can be achieved is limited by the number of input files and by the total number of execution cores in the cluster.
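Because records are delimited by newlines, a single large input file can be divided on line boundaries to produce multiple input files for a distributed load. A minimal sketch using GNU split (the line count and file names are illustrative):
split --lines=10000000 --numeric-suffixes --additional-suffix=.txt lineitem.txt lineitem_
This produces lineitem_00.txt, lineitem_01.txt, and so on, each of which can then be staged in a location that all nodes can reach.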
To use this method of data loading, all input files must be accessible to all nodes in the same location, such as HDFS, S3, or shared local storage. Use standard utilities to copy the input files to the file system (for example, hdfs dfs -put) or generate the input files with an application that writes directly to the file system.
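For example, to stage the input files in HDFS (the target path is illustrative):
hdfs dfs -mkdir -p /path/to/data
hdfs dfs -put lineitem_*.txt /path/to/data/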
To enable distributed data loading, add the -c option (or --cluster) to the vwload command line and provide the full HDFS path to all input files.
The following command loads the data in the lineitem_1.txt and lineitem_2.txt files into the lineitem table of the dbt3 database. In the input files, fields are delimited by a vertical bar (|) and records are delimited by a newline (\n).
To load the data into the lineitem table using vwload -c
Enter a command like the following at the operating system prompt:
vwload -c --fdelim "|" --rdelim "\n" --table lineitem dbt3 hdfs://namenode:8020/path/to/data/lineitem_1.txt hdfs://namenode:8020/path/to/data/lineitem_2.txt . . .
Load Data from Cloud Sources with vwload
You can load data from cloud sources such as Amazon S3 (s3a://) or Azure Data Lake Storage (abfs://) using vwload. Specify the full URI for the data files you want to load. The vwload command accepts any URL supported by the HDFS client.
IMPORTANT!  Specifying your AWS or Azure OAuth credentials in the target URI, on the command line, or in environment variables can leave them easily accessible. To keep your credentials safe, see “Securely Managing Cloud Credentials” in the Security Guide.
The following example loads data into the customer table of the mydb database in parallel mode. The source files reside in Amazon S3 cloud storage. Fields in the data files are delimited by a vertical bar (|) and records are delimited by a newline:
vwload -c --fdelim "|" --rdelim "\n" --table customer mydb s3a://mys3bucket/path/to/data/customer1.csv s3a://mys3bucket/path/to/data/customer2.csv s3a://mys3bucket/path/to/data/customer3.csv
or
vwload -c --fdelim "|" --rdelim "\n" --table customer mydb s3a://mys3bucket/path/to/data/customer*.csv
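Loading from Azure Data Lake Storage works the same way with an abfs:// URI. A minimal sketch, using a hypothetical storage account and container:
vwload -c --fdelim "|" --rdelim "\n" --table customer mydb abfs://mycontainer@myaccount.dfs.core.windows.net/path/to/data/customer*.csv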
Last modified date: 01/26/2023