VH 6.0 | vwload Command--Load Data into a Table

User Guide > User Guide > A. Command Reference > vwload Command--Load Data into a Table

Was this helpful?

vwload Command--Load Data into a Table

The vwload command loads data into a Vector table.

By default, errors do not stop the loading process; however, when an error is encountered, the load is rolled back. Use command options to modify the default behavior.

Note: Using vwload to load data into external tables is not supported.

This command has the following format:

vwload --table tablename [vnode::]dbname [options] datafile ...

-t, --table tablename

Specifies the name of the table that data is to be loaded into.

[vnode::]dbname

Specifies the ID of the remote node (if connecting to a remote server) and name of the database.

datafile

Specifies the names of the files, using full or relative paths, that contain the data being loaded. File names are specified as a space separated list from one to many. An entry can be a file name, a file name with an asterisk as a wildcard, or a directory. If you specify a directory, all files in the directory are loaded. If an asterisk is used in a file name, enclose the entire file name in single quotes. Valid use of wildcards in file names is as follows: *, abc*, *abc, abc*def, *abc*.

File systems supported: hdfs, s3a, abfs, http, https.

Note: File names with an asterisk as a wildcard are not supported when using the http and https file system paths.

options

Specify options, as follows:

-a, -‑attributes attr1,attr2,...

Specifies list of attributes (columns) to load. An empty attr name indicates an input field that is to be ignored.

Default: All attributes are loaded if no ‑‑attributes argument is specified.

aws_access_key 'my_access_key'

Specifies the access key ID to access your S3 bucket on Amazon Web Services. For information on keeping your credentials safe, see the Security Guide.

aws_secret_key 'my_secret_key'

Specifies the secret access key used to access to your S3 bucket on Amazon Web Services. For information on keeping your credentials safe, see the Security Guide.

azure_client_endpoint='https://login.microsoftonline.com/tenant_id/oauth2/token'

Specifies the Directory (tenant) ID used to access your Azure Storage account.

azure_client_id 'client_id'

Specifies the Application (client) ID used to access your Azure Storage account. For information on keeping your credentials safe, see the Security Guide.

azure_client_secret 'client_secret'

Specifies the Client Secret (password) used to access your Azure Storage account. For information on keeping your credentials safe, see the Security Guide.

-B, --rollback on|off

Turns roll back on error on or off. If set to off, the load transaction will not be rolled back when errors were encountered, causing partial data to be loaded.

Default: on

-c, ‑‑cluster

Speeds loading by using parallel mode.

For requirements, restrictions, and changes in behavior when using this option, see vwload in Parallel Mode.

-C, --charset charset

Specifies the input character set (see vwload Supported Character Sets). CHARSET='ASCII' is also supported.

Default: no conversion

-d, --dateformat format | attr=format, ...,

Sets date or timestamp format for the attribute (column). When no attribute is indicated, the value applies to all attributes that are not otherwise set.

Valid values for format are described in vwload Date Format Settings.

Alternatively, a custom format string can be specified by starting the format with a '+' sign followed by the format string. The format string can consist of any of the format specifier characters used on the DATE_FORMAT function, described in the Vector SQL Language Guide.

To be able to use any character in the format string (including for example, "," and "="), you must quote the format string in the same way that you would quote a delimited identifier on the command line (for example: 'ColX="+%M,%Y,%d",ColY=US'). For more information on delimited identifiers, see Regular and Delimited Identifiers in the Vector SQL Language Guide.

Default: US

Examples:

vwload -d col1=+'%d-%b-%y' -t table1 mydb date.in

vwload -d 'Col1=US,Col2=+%M %Y %d' -t table1 mydb2 loadfile.txt

-e, ‑‑escape escapechar

Specifies the escape character to use. This allows escaping of single characters to allow a delimiter character to be part of a field (for example: \) or to allow a quote character to be part of a quoted string.

The argument must be a single ASCII character or an empty string. If the argument is an empty string, this functionality is disabled. To specify a control character, use an escape sequence (see vwload Escape Sequences).

Default: none

-E --escapes

Interprets data escape sequences as generated by MySQL and PostgreSQL. This also allows setting \N as the NULL representation.

Note: The \0, \digits, and \xdigits escape sequences are not supported.

-f, --fdelim fielddelim

Specifies field delimiter to use. The delimiter must be a single character. To specify a control character, use an escape sequence (see vwload Escape Sequences).

Default: "|"

-F, --frequency frequency

Sets the frequency of verbose progress reports, in number of records. Optionally append one of k K m M g G as magnitude multiplier. For example: 5M means print a report every 5 million records.

--generate_opt_file

Generates a template of an options file to stdout.

-h, --help

Displays syntax information.

-H, --header, -H

Skips header line in files.

-i, -ignfirst

Ignores the first field.

-I, --ignlast

Ignores the last field.

-l, --log path

Logs rejected rows and corresponding errors to the specified file. The file is created by vwload. When used with the ‑‑cluster option, specify a directory name instead of a file name. For details, see vwload in Parallel Mode.

-m, --timing

Enables timing information. Timing statistics will be printed at the end. In verbose mode, they will be printed for every progress update.

-M, --money currency-symbol

Sets currency symbol.

Default: “$”

-n, --nullvalue nullvalue

Defines the string that identifies NULL values.

Default: ""

-N, --strictnulls

Uses strict NULL value checking. This distinguishes between plain and quoted or escaped occurrences of a NULL representation. For example, NULL is a NULL value, whereas "NULL" is a 4-character string value. By default, vwload does not make this distinction. This option allows proper loading of some data generated by MySQL and PostgreSQL.

–nonchar_leading_spaces_null

When –nullvalue is used, data may contain spaces in front of the null identifiers. This option applies to non-character columns only. For example, if --nullvalue NADA is specified, –nonchar_leading_spaces_null will interpret a DATE column containing “NADA” or “ NADA” as NULL.

--notnull_empty

Keeps empty strings. Does not consider the input value NULL if it is empty (that is, contains two consecutive field delimiters). For VARCHAR, the value becomes an empty string; for CHAR the value becomes blanks. This option applies only to columns that are NOT NULL of type CHAR, NCHAR, VARCHAR, or NVARCHAR.

-p, --profile file

Writes server side profiling data to file.

-P, --password password

Specifies the user password.

Note: Specifying the password on the command line is insecure because the command line can be viewed by other users with programs such as ps. Vwload overwrites the password on the command line with zeros during program initialization to reduce the security risk, but a short time still exists where the password is visible to other users. It is more secure to let the program prompt for the password.

Default: Prompts on TTY when needed.

-q, --quote quotechar

Specifies the quote character(s) to use. This allows the input to contain quoted fields (for example "Doe, John"), which may contain field or record delimiter characters. To include a quote character inside a quoted field, enter it twice in the input--for example: "The ""BIG"" Boss". When using distinct open and close quote characters, enter only the close quote character twice--for example: [The [BIG]] Boss].

The argument must be one or two ASCII characters or an empty string.

If the argument is an empty string, this functionality is disabled. If the argument is two characters, the first character is used as the opening quote character, and the second as the closing quote character. For example, specify “[]” to allow [quoted string].

To specify a control character, use an escape sequence (see vwload Escape Sequences).

Default: none

-r, --rdelim recorddelim

Specifies record delimiter to use. The delimiter must be a single character. To specify a control character, use an escape sequence (see vwload Escape Sequences).

Default: "\n"

-R, --rowmode

Inserts data through the PDT (in-memory) updates, rather than directly appending the data in bulk to the table's data blocks on disk. This allows concurrent inserts, at the cost of decreased performance at high bulk volume, increased memory consumption. At the time of the load, the whole volume of loaded data will have to fit in memory, and only later it will be propagated to the table's blocks on disk through update propagation.

-s, --skip n

Specifies the number of records to skip.

Default: 0

-S, --substitute substitutechar

Substitutes the specified character for any invalid input character during character set conversion. When no substitute character is provided, an invalid input character is considered an error condition. Specifying a substitute character allows records that contain invalid characters to be loaded successfully.

The argument must be a single Unicode character or an empty string. If the argument is an empty string, the functionality is disabled.

To specify a control character or Unicode code point that cannot easily be typed on the keyboard, use an escape sequence.

-T, --textmode

Opens input files in text mode and does not perform newline conversion. On Linux, this may gain a 5 to 10 percent performance improvement.

Note: When using -T on Windows, any ^Z (ASCII SUB) character in the input is interpreted as End-Of-File by the Windows library, and causes vwload not to see any data following the first such character.

-u, --user username

Specifies the effective user for the session.

+user=authuser[,password]

Specifies the user name and password used for DBMS_authentication (see User Password on page 45).

-v, --verbose

Prints progress report.

-x, --errcount n

Terminates after n errors.

When used with the ‑‑cluster option, terminates after the first n errors in all input files. For details, see vwload in Parallel Mode.

Default: 0 (do not terminate)

-z, --stats

Creates statistics for the table. Builds histograms for all columns of the loaded table.

-Z, --auto_detect_compression

Detects GZIP and ZIP files. Compressed files are automatically detected by the file contents, not by the file suffix. If you are loading from a directory or using a file name wildcard, you can mix regular text files, GZIP, and ZIP.

Note: Vwload supports compression method 8 (Deflate) only.

vwload Escape Sequences

To specify control characters in the vwload command, you must use an escape sequence. An escape sequence is initiated by a \ character. Valid escape sequences are:

Escape Sequence	Description
\a	Bell (alert)
\b	Back space
\f	Form feed
\n	Newline
\r	Carriage return
\t	Tab
\v	Vertical tab
\nnn	The character with octal code value nnn
\uxxxx	The 2-byte Unicode code point with hexadecimal value xxxx
\\	\ character

Note: Certain special characters, such as \, ", ‘, and |, must be protected from interpretation by the command shell by using the appropriate quoting and escaping mechanisms provided by the shell. This does not apply to Actian Director, which automatically takes care of such formatting.

vwload Date Format Settings

The ‑‑dateformat format | attr=format option on the vwload command sets the date format for the attribute (column).

Valid settings for format are as follows:

Setting	Valid Input Formats	Output Format
US (default)	mm/dd/yy mm-dd-yy mmddyy mm/dd/yyyy mm-dd-yyyy mmddyyyy dd-mmm-yyyy dd mmm yyyy yyyy-mm-dd yyyy.mm.dd yyyy_mm_dd mm-dd mm/dd am and pm format in timestamp. Only hours between 1 and 12 are valid. am and pm must be in lowercase.	dd-mmm-yyyy
MULTINATIONAL	yyyy-mm-dd mm-dd-yy mmddyy mmddyyyy dd/mm/yy dd/mm/yyyy dd mmm yyyy All US formats except mm/dd/yyyy and mm/dd/yy	dd/mm/yy
MULTINATIONAL4	yyyy-mm-dd dd/mm/yy dd/mm/yyyy dd mmm yyyy mm-dd-yy mmddyy mmddyyyy All US formats except mm/dd/yyyy and mm/dd/yy	dd/mm/yyyy
ISO	yyyy-mm-dd yyyymmdd yymmdd ymmdd mmdd mdd dd mmm yyyy All US input formats except mmddyy	yymmdd
ISO8601	All ISO input formats	yyyy-dd-mmThh:mm:ssZ where hh is in 24-hour format and Z indicates Zulu (UTC) timezone
ISO4	yyyy-mm-dd yyyymmdd yymmdd ymmdd mmdd mdd dd mmm yyyy All US input formats except mmddyy	yyyymmdd
ISO4T	All ISO4 input formats. If the prefix “T” is used then the absolute time component can use the input format hhmmss as well as the standard hh:mm:ss. For example: Acceptable: yyyymmddThhmmss, yyyymmddThh:mm:ss, and yyyymmdd hh:mm:ss Not acceptable: yyyymmdd hhmmss Acceptable: 'Thhmmss', 'Thh:mm:ss' and 'hh:mm:ss' Not acceptable: 'hhmmss' This is the only case where an absolute time can be entered in format hhmmss instead of hh:mm:ss. To avoid ambiguity, a time field entered in hhmmss format must be 6 characters long.	yyyymmdd ISO4 output format, unless the date includes a time, in which case the format is: yyyymmddThhmmss
ISO4TC	See description under ISO4T.	ISO4 output format, unless the date includes a time, in which case the format is: yyyymmddThh:mm:ss
SWEDEN or FINLAND	yyyy-mm-dd yy-mm-dd mmddyy dd mmm yyyy All US input formats except mm-dd-yyyy	yyyy-mm-dd
GERMAN	yyyy-mm-dd dd.mm.yyyy ddmmyy dmmyy dmmyyyy ddmmyyyy dd mmm yyyy mm-dd-yy All US input formats except yyyy.mm.dd and mmddyy	dd.mm.yyyy
YMD	mm/dd mm-dd mmdd yymdd yymmdd yy-mm-dd yyyymdd yyyy-mmm-dd yyyy/mm/dd yyyy.mm.dd yyyy-mm-dd yyyy_mm_dd yyyymmdd yyyy mmm dd	yyyy-mmm-dd
DMY	yyyy-mm-dd yyyy_mm_dd dd/mm dd-mm ddmm ddmyy dd-mm-yy ddmmyy ddmyyyy ddmmyyyy dd/mm/yyyy dd-mm-yyyy dd.mm.yyyy dd-mmm-yyyy dd mmm yyyy	dd-mmm-yyyy
MDY	yyyy-mm-dd yyyy_mm_dd mm/dd mm-dd mmdd mmddyy mddyy mddyyyy mm-dd-yy mm-dd-yyyy mm/dd/yyyy mm.dd.yyyy mmddyyyy mmm-dd-yyyy	mmm-dd-yyyy

For a date that is missing the century on input, year is determined by the setting on the II_DATE_CENTURY_BOUNDARY environment variable.

In three-character month formats, for example, dd-mmm-yy, specify three-letter abbreviations for the month (for example, mar, apr, may).

To specify the current system date and time, use the constant, NOW.

vwload Supported Character Sets

Character sets supported on the vwload --charset option are as follows:

Character Set	Description	Format
ALT	Support of Cyrillic on DOS	Single byte
ARABIC	Arabic-449-Plus	Single byte
CHINESES	Simplified Chinese - PRC	Double byte
CHTBIG5	Traditional Chinese - Taiwan, BIG5	Double byte
CHTEUC	Traditional Chinese - Taiwan, EUC	Double byte
CHTHP	Traditional Chinese - Taiwan, HP ROC15	Double byte
CSGB2312	Simplified Chinese - GB2312	Double byte
CSGBK	Simplified Chinese - GBK	Double byte
CW	Cyrillic on Windows 3.1	Single byte
DECMULTI	DEC Multinational (superset of ASCII) and default for VMS	Single byte
DOSASMO	IBM DOS ASMO Arabic (cp708)	Single byte
ELOT437	Greek for PC/RS6000/SCO-UNIX	Single byte
GREEK	DEC Greek Elot	Single byte
HEBREW	DEC Hebrew	Single byte
HPROMAN8	HP Roman8 (superset of ASCII)	Single byte
IBMPC437	IBM PC Code Page 437 (US and English)	Single byte
IBMPC850	IBM PC Code Page 850 (Multilingual), includes accented characters	Single byte
IBMPC866	IBM PC 866 (Cyrillic for DOS)	Single byte
IS885915	ISO 8859/2 (Latin and some Greek). Identical to ISO 8859/1 Latin, except for eight characters, including the Euro currency symbol (€, Unicode U+20AC).	Single byte
ISO88591	ISO 8859/1 Latin and default for UNIX (superset of ASCII)	Single byte
ISO88592	8859/5 (Latin and Cyrillic)	Single byte
ISO88595	8859/9 (Latin and some Turkish) CP 920	Single byte
ISO88597	ISO 8859/7 (Greek)	Single byte
ISO88599	ISO 8859/15 (Latin and Euro sign)	Single byte
KANJIEUC	Japanese, EUC	Double byte
KOI18	KOI 8-bit (ISO 6937/8), Russia	Single byte
KOREAN	Korean	Double byte
PC737	IBM PC Code page 737 - Greek	Single byte
PC857	IBM PC Code page 857 - Turkish	Single byte
PCHEBREW	IBM PC / MSDOS Hebrew	Single byte
SHIFTJIS	Shift-JIS Japanese	Double byte
SLAV852	IBM PC Code Page 852 (Slavic)	Single byte
THAI	DEC Thai Tis	Single byte
UTF8	Unicode encoding form UTF-8	Multi-byte
WARABIC	Arabic	Single byte
WHEBREW	Microsoft Windows Hebrew	Single byte
WIN1250	Eastern Europe: Windows page 1250	Single byte
WIN1252	Windows code page 1252 - Latin 1 (Western Europe) and default for Windows	Single byte
WIN1253	Modern Greek	Single byte
WTHAI	IBM/Windows Thai (cp874)	Single byte

vwload in Parallel Mode

The vwload ‑‑cluster option speeds up loading by parallelizing some of the processing steps.

Note: The ‑‑cluster option should always be used in VectorH.

Requirements for Parallel vwload

If parallel mode is enabled, all processing steps including reading input files from disk, UTF8 verification or character set conversion, parsing, and data type conversion are done by the server (that is, by all nodes in the cluster used by VectorH). Therefore, files must be accessible by the server, not the vwload client (as in normal mode).

Vwload cannot split one big input file to load it in parallel. Multiple input files must be specified to benefit from parallelization. Parallelization works best when all input files are approximately the same size.

Because the input files are read by the servers on all nodes in the cluster, the paths should be accessible from all cluster nodes. They can be either paths into a shared filesystem (HDFS, or network paths mounted on all nodes), or data replicated under the same path on all nodes. There is no control over which files will be read by a server on which node.

Note: The directory for vectorwise.log must also be accessible by the servers on all nodes.

To use the vwload command in parallel mode, /proc/sys/vm/overcommit_memory must be set to 1 (for details, see Virtual Address Space Allocation in the System Administrator Guide) or [memory] max_overalloc must be 0.

Limitations of Parallel vwload

In parallel mode, compression and writing data to disk remain as in regular vwload. Reading and parsing input gets parallelized (and distributed over cluster nodes). Also, parallel vwload imposes less communication overhead, because all the processing is performed inside a single server process. Therefore, vwload in parallel mode may be faster even when loading a single file.

In parallel mode vwload does not output all parsing and conversion errors. It outputs only the first error encountered (if any). Therefore, we strongly recommend using the ‑‑log option to be able to see all rows that were rejected during load and corresponding errors. Also, in parallel mode, vwload reports only the number of loaded tuples. It does not report the number of errors (rejected rows) and the total number of processed rows.

The following options cannot be used in parallel mode:

• ‑‑skip

• ‑‑frequency

• ‑‑verbose

The following options behave differently in parallel mode:

‑‑errcount n

In regular mode, first n errors are ignored. In parallel mode the first n errors in each input file are ignored. In particular, with m input files, the maximum number of ignored errors is n*m.

‑‑log path

In regular mode, the path specified is a file. The file is created by vwload. The file will contain rejected rows and corresponding errors.

In parallel mode, the path specified is a directory. The directory is created if it does not exist. If errors are encountered during load, vwload creates two files for each input file. For example, if errors occur while loading file “input1”, vwload creates:

• path/input1_reject with rows that were not loaded (were rejected)

• path/input1_errors with errors that caused those rows not to be loaded

Note: VectorH does not support multiple instances of vwload loading data into a single table.

vwload from Remote System

The vwload command can be used on a client to load data into a remote database over a direct network connection. It is not necessary to first copy the input data files to the machine where the database resides.

Note: Remote vwload requires a 64-bit client.

The network connection used for the data loading is unencrypted so it must be suitably secure.

Your system administrator may need to allow the direct connection between the vwload client and the Vector Server through a firewall. The Vector port number for a specific database must be statically configured.

For example, the following command issued on a remote client connects to the vnode “production” and loads data into the mydb database on the Vector Server:

vwload --table t1 production::mydb loadfile.txt

vwload Examples

Note: Using vwload to load data into external tables is not supported.

1. Load the contents of a CSV file named t1.txt into table t1 of the mydb database. In the t1 table, delimit fields with a comma (,) and quoted strings with quotation marks ("). Skip one record in the t1.txt file and log errors to the t1.log file.

vwload -f "," -q "\"" -s 1 -l t1.log -t t1 mydb t1.txt

2. Load into table t1 of the mydb database only the first and third columns from loadfile.txt, which has three columns. (The empty attribute name tells vwload to ignore the second column in loadfile.txt.) The columns in the table are named a and b.

vwload ‑‑table t1 ‑‑attributes a,,b mydb loadfile.txt

3. Load into table t1 of the mydb database a date column named “dt” using the US date format:

vwload ‑‑table t1 ‑‑dateformat dt=US mydb loadfile.txt

4. Load the contents from file data into table info in database db, convert from character set Windows code page 1252, and substitute an inverted question mark (Unicode code point 00bf) for any invalid characters in the input file:

vwload ‑C windows-1252 ‑S '\u00bf' ‑t info db data

5. Load data into the lineitem table in parallel mode:

vwload --table lineitem --fdelim "|" --cluster dbtest lineitem1.tbl lineitem2.tbl lineitem3.tbl lineitem4.tbl

vwload --table lineitem --fdelim "|" --cluster dbtest 'lineitem*.tbl'

6. Load data from all files in the data directory into the lineitem table in parallel mode:

vwload --table lineitem --fdelim "|" --cluster dbtest data

7. Load the region table into the mydb database from the region.dat file and build histograms on all columns after the table is loaded:

vwload -t -z region mydb region.dat

8. Load the mytable table into the mydb database on HDFS in parallel mode. Columns in the data file are delimited with a vertical bar. Records are delimited by a new line:

vwload --cluster --fdelim "|" --rdelim "\n" --table mytable mydb hdfs://namenode:8020/path/to/data/mytable_1.txt hdfs://namenode:8020/path/to/data/mytable_2.txt hdfs://namenode:8020/path/to/data/mytable_3.txt hdfs://namenode:8020/path/to/data/mytable_4.txt

9. Load the contents of three CSV files that reside in Amazon S3 cloud storage into the ontime table of the ontime database in parallel mode. Ignore the first line (the header) (-H), use a comma as the delimiter (-f), use the double quote character for quoted fields (-q), ignore the last field (-I)

vwload -c -H -f , -q """" -I -t ontime ontime s3a://mys3bucket/path/to/data/On_Time_On_Time_Performance_2014_1.csv s3a://mys3bucket/path/to/data/On_Time_On_Time_Performance_2015_1.csv s3a://mys3bucket/path/to/data/On_Time_On_Time_Performance_2016_1.csv

10. Load data from multiple files residing in Azure Data Lake Storage into the lineitem table of the mydb database:

vwload --table lineitem --fdelim "|" mydb
azure_client_endpoint 'https://login.microsoftonline.com/TENANT_ID/oauth2/token',
azure_client_id 'CLIENT_ID',
azure_client_secret 'CLIENT_SECRET'
'abfs://loadtest@avalanchetest.dfs.core.windows.net/firstfolder/lineitem10.tbl',
'abfs://loadtest@avalanchetest.dfs.core.windows.net/firstfolder/lineitem10_2.tbl',
'abfs://loadtest@avalanchetest.dfs.core.windows.net/firstfolder/lineitem10_3.tbl',
'abfs://loadtest@avalanchetest.dfs.core.windows.net/firstfolder/lineitem10_4.tbl',
'abfs://loadtest@avalanchetest.dfs.core.windows.net/firstfolder/lineitem10_5.tbl',
'abfs://loadtest@avalanchetest.dfs.core.windows.net/firstfolder/lineitem10_6.tbl'

Last modified date: 01/26/2023