Column Formats for COPY
The following sections describe how to specify the data file format for table columns. The format specifies how each is written and delimited in the data file.
Note: When copying to or from a table that includes long varchar or long byte columns, specify the columns in the order they appear in the table.
Character (Text) Formats
The character formats are the ones most commonly used to read and write ordinary text (human-readable) data files.
The basic character formats are BYTE, C, CHAR, and TEXT. TEXT(0) is a variable-length form. BYTE(n), Cn, CHAR(n), and TEXT(n) are fixed-length forms. An optional delim may follow to specify a delimiter.
The subtle differences between the various character formats are described in
COPY Format Details.
Counted Character Formats
The BYTE VARYING, LONG BYTE, LONG VARCHAR, LONG NVARCHAR, NVARCHAR(n), and VARCHAR formats are "counted" formats: each data file value is preceded by a character count. The character count defines the length of the data value; the actual field length as defined by a fixed-length specifier or a delimiter may be larger. On input (COPY FROM), extra field characters beyond those included by the embedded character count are ignored. On output (COPY INTO), any extra field length after the actual value is filled with padding, as defined by the specific format.
The fixed-length forms are BYTE VARYING(n), NVARCHAR(n), and VARCHAR(n). The variable-length forms are BYTE VARYING(0), LONG BYTE(0), LONG VARCHAR(0), LONG NVARCHAR(0), and VARCHAR(0). An optional delim may follow to specify a delimiter.
Note: NVARCHAR(0) is not a counted format.
For all fixed-length counted formats: the field length N does not include the preceding length specifier. For example, a VARCHAR(1) field takes 6 bytes. When reading data (COPY FROM), if the character count found in the data is larger than the defined length, a runtime conversion warning is issued and the row is not loaded.
These counted formats are the only ones that can be used with LONG table columns. The two LONG formats can only be used with LONG BYTE or LONG VARCHAR table columns. See
COPY Format Details.
Dummy Format
The D (dummy) format describes a data file column that does not map to any table column. On input (COPY FROM), a D format column describes file data to be skipped and discarded. On output (COPY INTO), a D format column describes constant data to be sent to the data file.
The column name given for any dummy column is not matched to any table column. The Dn form for COPY INTO uses the column name as the value to output; all other uses of the dummy format ignore the column name completely.
Unicode Formats
The Unicode formats are NCHAR, NVARCHAR, and LONG NVARCHAR and they can only be used with NCHAR or NVARCHAR table columns. Fixed-length forms are NCHAR(n) and NVARCHAR(n). Variable-length forms are NCHAR(0), NVARCHAR(0), and LONG NVARCHAR(0).
Fixed-length NCHAR(n) and NVARCHAR(n) formats read and write using the two-byte UCS-2 encoding. The variable-length NCHAR(0), NVARCHAR(0), and LONG NVARCHAR(0) forms read and write using the variable-length UTF8 encoding.
The field length n for NCHAR(n) and NVARCHAR(n) should be specified as character lengths, not byte (octet) lengths. However, the embedded length specifier used by the NCHAR(0) and NVARCHAR(0) formats should give the number of bytes, not characters. (The reason is that NCHAR(0) and NVARCHAR(0) use the UTF8 encoding, which encodes Unicode code points into a variable number of bytes. COPY needs the byte count to know how many bytes to read and decode from UTF8.)
Binary Formats
The formatted COPY statement supports binary formats that match the binary types used to store data in tables. These are the BOOLEAN, DATE, DECIMAL, FLOAT, INTEGER, and MONEY formats (and size variants such as BIGINT, SMALLINT, REAL, and so on). Most data files are text, not binary, so these binary formats are not often needed.
COPY Format Details
This section describes specifying the format of fields in the data file. When specifying data file formats for COPY INTO, be aware of the following points:
• Data from numeric columns, when written to text fields in the data file, is right-justified and filled with blanks on the left.
• When a COPY INTO statement is issued in the Terminal Monitor, the –i and –f command line flags control the format used to convert floating-point table data into text-type file data. To avoid rounding of large floating point values, use the sql command -f flag to specify a floating point format that correctly accommodates the largest value to be copied. For information about the -i and -f flags, see the sql command description in the Command Reference Guide.
• The COPY INTO section often uses the phrase "the display length of the corresponding table column". This means the length of the table column when formatted as a character string. This will be a standard length based on the table column type, and is independent of the actual column value. For example: the display length of an INTEGER column is 13, the display length of a SMALLINT column is 6, and so on.
The following table explains the details for the various COPY list formats. Unless otherwise noted, all non-binary formats can be followed by an optional delim to specify a
delimiter (see page
Delimiters in the Data File).
Format | How Stored (COPY INTO) | How Read (COPY FROM) |
---|
BOOLEAN | Written as a single byte Boolean value (0=FALSE, 1=TRUE). (A binary format) | Same format as COPY INTO |
BYTE(0) | Same as BYTE(n) where n is the display length of the corresponding table column. No transliteration occurs with BYTE data types regardless of CHARSET. | Read as variable-length binary data terminated by the specified delimiter. If a delimiter is not specified, the first comma, tab, or newline encountered ends the value. No transliteration occurs with BYTE data types regardless of CHARSET. |
BYTE(n) where n is 1 to the maximum row size configured, not exceeding 32,000. | Written as a fixed-length byte string. Exactly n bytes are written, padded with zeros if necessary. If given, the delimiter is written after the value and padding. No transliteration occurs with BYTE data types regardless of CHARSET. | Read as a fixed-length byte string; exactly n bytes are read. If a delimiter is specified, one additional character is read and discarded. No transliteration occurs with BYTE data types regardless of CHARSET. |
BYTE VARYING(0) | Same as BYTE VARYING(n) where n is the display length of the table column. No transliteration occurs with BYTE data types regardless of CHARSET. | Read as a variable-length byte string, preceded by a 5‑character, right‑justified length specifier. If a delimiter is specified, additional input is discarded until the delimiter is found. No transliteration occurs with BYTE data types regardless of CHARSET. |
BYTE VARYING(n) where n is 1 to the maximum row size configured, not exceeding 32,000. | Written as a fixed-length byte string preceded by a 5-character, right-justified length specifier. If necessary, the field is padded with zeros to the specified length. If given, the delimiter is written after the value and padding. No transliteration occurs with BYTE data types regardless of CHARSET. | Read as a fixed-length byte string, preceded by a 5-character, right-justified length specifier. If a delimiter is specified, one additional character is read and discarded. No transliteration occurs with BYTE data types regardless of CHARSET. |
C0 | Same as Cn where n is the display length of the corresponding table column | Read as a variable-length string, terminated by the specified delimiter. If a delimiter is not specified, the first comma, tab, or newline encountered ends the value. Any control characters or tabs in the input are converted to spaces. C0 format supports \. The \ is discarded, and the next character is taken literally as part of the value (even if it would normally be the delimiter). To read a \ character, use \\. |
Cn | Written as a fixed-length string, padded with blanks if necessary. Any "non-printing" character (meaning a control character or tab) is converted to a space. If given, the delimiter is written after the value and padding. | Read as a fixed-length string. If a delimiter is specified, one additional character is read and discarded. Any control characters or tabs in the input are converted to spaces. Fixed-length Cn format does not support \. |
CHAR(0) | Same as CHAR(n) where n is the display length of the corresponding table column. | Read as a variable-length string terminated by the specified delimiter. If a delimiter is not specified, the first comma, tab, or newline encountered ends the value. Unlike C format, CHAR does not support \. CHAR also does not convert control characters or tabs. File data is read as is. |
CHAR(n) where n is 1 to the maximum row size configured, not exceeding 32,000 (16,000 in a UTF8 instance). | Written as a fixed-length string, padded with blanks if necessary. If given, the delimiter is written after the value and padding. Unlike C format, CHAR does not do any conversion of control characters or tabs. Table data is output as-is. | Read as a fixed-length string. If a delimiter is specified, one additional character is read and discarded. Unlike C format, CHAR does not convert control characters or tabs. File data is read as is. |
D0 | Instead of placing a value in the file, COPY writes the specified delimiter. (Unlike the Dn format, D0 format does not write the column name.) COPY INTO requires that a delimiter be specified; D0 with no delimiter is not allowed. | Dummy field. Characters are read and discarded until the specified delimiter is encountered. If a delimiter is not specified, the first comma, tab, or newline ends the value. Any \ found in the input means that the next character is to be taken literally, and is not a delimiter. |
Dn | Dummy column. Instead of placing a value in the file, COPY writes the name of the column n times. For example, if you specify x=D1, the column name, x, is written once; if you specify x=D3, COPY writes xxx (the column name, three times), and so on. You can specify a delimiter as a column name, for example, NL=D1. | Dummy field. N characters are read and discarded. COPY FROM does not allow a delimiter specification with a fixed-length dummy field. |
DATE | Written as a date. (A binary format) | Read as a date. (A binary format) |
DECIMAL | Written as a decimal number. (A binary format.) | Read as a decimal number. (A binary format) |
FLOAT | Written as double‑precision floating point. (A binary format) | Read as double‑precision floating point. (A binary format) |
FLOAT4 | Written as single‑precision floating point. (A binary format) | Read as single‑precision floating point. (A binary format) |
INTEGER | Written as integer of 4‑byte length. (A binary format) | Read as integer of 4‑byte length. (A binary format) |
INTEGER1 | Written as integer of 1‑byte length. (A binary format) | Read as integer of 1‑byte length. (A binary format) |
LONG BYTE(0) | Identical to LONG VARCHAR but with no transliteration | Identical to LONG VARCHAR but with no transliteration |
LONG NVARCHAR(0) | Written in segments. Each segment is composed of an integer specifying the length of the segment, followed by a space and the specified number of bytes in UTF-8 encoding. After the last data segment, a final zero-length segment is written (that is, 0 followed by a space). The maximum segment size for the LONG NVARCHAR segment is 32727 bytes. The UTF-8 encoded LONG NVARCHAR data segments are like LONG VARCHAR data segments. See the description for LONG VARCHAR(0) for an example of the encoded data segment. If a delimiter is specified, it is written immediately following the last segment. | Read under the same format as COPY INTO. If a delimiter is specified, one character is read and discarded after the data value is read. |
LONG VARCHAR(0) | Written in segments. Each segment is composed of an integer specifying the length of the segment, followed by a space and the specified number of characters. After the last data segment, a final zero-length segment is written (that is, 0 followed by a space). If a delimiter is specified, it is written immediately following the last segment. The maximum segment length is 32767. The following example shows two data segments, followed by the termination zero length segment. The first segment is 5 characters long, the second segment is 10 characters long, and the termination segment is 0 character long: 5 abcde10 abcdefghij 0 (with a space after the terminating 0 character) (In this example, the data that is in the originating table column is abcdeabcdefghij) | Read under the same format as COPY INTO. If a delimiter is specified, one character is read and discarded after the data value is read. |
MONEY | Written as a scaled floating point value (a money value). (A binary format) | Read as a scaled floating-point values (a money value). (A binary format) |
NCHAR(0) | Written as a Unicode string in UTF-8 encoding, preceded by a 5-character, right-justified byte count. The exact length of the column value is written, without padding. If a delimiter is specified, it is written after the value. | Read as a Unicode string in UTF-8 encoding, preceded by a 5-character, right-justified length specifier. (The length is a byte count, not a character count). If a delimiter is specified, additional input is discarded until the delimiter is encountered. |
NVARCHAR(0) | Same as NCHAR(0) | Same as NCHAR(0) |
NCHAR(n) | Written as a fixed-length Unicode string in UCS-2 encoding. N is the length in characters, not bytes. The value is padded to the specified length with UCS-2 blanks, if necessary. If a delimiter is specified, it is written after the value and padding. | Read using the same format as COPY INTO. If a delimiter is specified, after the value is read, one additional character is read and discarded. |
NVARCHAR(n) | Written as a 2-byte binary integer length specifier, followed by that many Unicode characters using UCS-2 encoding. The value is padded if necessary to the field length n; the padding content is undefined. If a delimiter is specified, it is written after the value and padding. The byte length of the written value excluding delimiter is 2n+2; the length n is in characters, not bytes, and does not include the initial length specifier. | Read using the same format as COPY INTO. If a delimiter is specified, after the value is read, one additional character is read and discarded. |
SMALLINT | Written as an integer of 2‑byte length. (A binary format.) | Read as integer of 2‑byte length. (A binary format) |
TEXT(0) | Written as a variable length string. If a delimiter is specified, it is written after the value. If the originating column is C, CHAR, or NCHAR, trailing blanks are trimmed. If the originating column is TEXT, VARCHAR, or NVARCHAR, the column value is output exactly as-is (no padding, no trimming). If the originating column is a non-character, writes the result of converting the value to a character string, as-is with no padding. COPY INTO using TEXT(0) format is the way to get variable width output with no padding. | Read as variable-length character string terminated by the specified delimiter. If a delimiter is not specified, the first comma, tab, or newline encountered ends the value |
TEXT(n) | Written as a fixed-length string. The value is padded with NULL bytes (zeros) if necessary. If specified, the delimiter is written after the value and padding. | Reads a fixed-length field n characters wide; however if one of those characters is a NULL byte, the value stored into the table is terminated at that NULL byte. If a delimiter is specified, one additional character is read and discarded. |
VARCHAR(0) | Same as VARCHAR(n), where n is the display length of the corresponding table column. | Read as a variable-length string, preceded by a 5‑character, right-justified length specifier. If a delimiter is specified, additional input is discarded until the delimiter is found. |
VARCHAR(n) where n is 1 to the maximum row size configured, not exceeding 32,000 (16,000 in a UTF8 instance). | Written as a fixed-length string preceded by a 5-character, right‑justified length specifier. If necessary, the value is padded with null characters to the specified length. | Read as a fixed-length string, preceded by a 5-character, right-justified length specifier. If a delimiter is specified, one additional character is read and discarded. |
Note: The dummy format (dn) behaves differently for COPY FROM and COPY INTO. When a table is copied into a file, n specifies the number of times the column name is repeated. When copying from a file to a table, n specifies the number of bytes to skip.
For user-defined data types (UDTs), use CHAR or VARCHAR.
Delimiters in the Data File
Delimiters are characters in the data file that separate fields and mark the end of records. Valid delimiters are listed in the following table:
When a single character is specified as the delimiter, enclose that character in quotes. If the data type specification is C or D, the quotes must enclose the entire format. For example, 'd0%' specifies a dummy column delimited by a percent sign (%). If the data type specification uses parentheses around the length, quote only the delimiter. For example, char(0)'%' specifies a char field delimited by a percent sign.
Be careful using the sp (space) or null delimiters, especially with COPY FROM. Spaces or nulls are used as padding characters by many of the COPY formats. If a pad character is improperly treated as a delimiter, the COPY FROM will get out of sync with the input, eventually producing an error. When designing a data file format, use delimiters that will not appear in the data or padding, or use CSV or SSV forms.
CSV and SSV Delimiters
The CSV and SSV delimiters allow COPY to read and write files that contain comma separated values (CSV).
The rules for a CSV delimited field are:
• The field is delimited by a comma, unless it is the last CSV-delimited field in the COPY list and all following fields are dummy fields; in that case, the field is delimited by a newline. (That is, for a standard CSV file, all fields on the COPY statement should be separated by the CSV delimiter and a separate NL delimiter is not needed.)
• COPY FROM: If the first non-blank character in the field is a double quote ("), the field extends until a closing double quote. Commas or newlines inside the quoted string are not delimiters and do not end the value. If a doubled double quote ("") is seen while looking for the closing quote, it is translated to one double quote and the value continues. For example, the data file value:
“There is a double quote “” here”
is translated to the table value:
There is a double quote “ here
Whitespace before the opening double quote, or between the closing double quote and the delimiter (comma or newline), is not part of the value and is discarded.
• COPY INTO: If the value to be written contains a comma, newline, or double quote, it is written enclosed in double quotes using quote doubling as described in the previous bullet item. If the value does not contain a comma, newline, or double quote, it is written as is.
The SSV delimiter works exactly the same as the CSV delimiter, with semicolon in place of comma.
CSV and SSV delimiters are only allowed with BYTE(0), C0, CHAR(0), and TEXT(0). They are not allowed with the “counted” formats (VARCHAR(0) and so on); the count defines the value exactly and there is no need for quoting. (If delimiting is desired, use the comma or nl delimiters on counted formats.)
COPY FROM: Some CSV file variants use quote escaping (\") instead of quote doubling ("") to indicate a quote inside a quoted string. The C format handles\- escaping, so use the C0CSV format and delimiter to handle this type of file. (CSV with COPY INTO always writes quote doubling—never quote escaping—when needed.)
WITH NULL Clause for COPY
The WITH NULL clause allows the data file to contain NULL data. There are two methods of indicating a NULL: an indicator byte (no value given), or a special user-defined marker value. The indicator byte method does not depend on a particular marker value, but it is a binary form not suited to human readable text files. Also, the indicator byte method is not available for variable-length data fields. The user-defined marker value method works with any file format, but depends on a user chosen value that must not otherwise appear in the data.
WITH NULL (value) Clause
When copying data from a table to a file, the WITH NULL (value) clause directs COPY to put the specified value in the file when a null is detected in the corresponding column. If a null is detected and there is no WITH NULL clause, a runtime error occurs, and aborts the COPY statement.
When copying data from a file to a table, the WITH NULL (value) clause specifies a marker value to be interpreted as a null. When COPY encounters this value in the file, it writes a null to the corresponding table column. The table column must be nullable; if it is not, a runtime error occurs, and aborts the COPY statement.
To prevent conflicts between valid data and null entries, choose a value that does not occur as part of the data in your table. The value chosen to represent nulls must be compatible with the format of the field in the file: character formats require quoted values, and binary numeric formats require unquoted numeric values. For example:
This example of a value is incorrect:
c0comma WITH NULL(0)
because the value specified for nulls (numeric zero) conflicts with the character data type of the field. However, this example is correct:
c0comma WITH NULL('0')
because the null value is character data, specified in quotes, and does not conflict with the data type of the field. Do not use the keyword null, quoted or unquoted, for a numeric format.
When copying from a table to a file, be sure that the specified field format is at least as large as the value specified for the WITH NULL clause. If the column format is too small, the DBMS Server truncates the null value written to the data file to fit the specified format.
For example, in the following statement the string, 'NULL,' is truncated to 'N' because the format is incorrectly specified as one character:
COPY TABLE t1str (col1 = CHAR(1) WITH NULL ('NULL')) INTO 't1.dat';
The correct version specifies a 4-character format for the column.
COPY TABLE t1str (col1 = CHAR(4) WITH NULL ('NULL')) INTO 't1.dat';
WITH NULL Clause Omitting Value
If WITH NULL is specified but value is omitted, COPY uses a trailing indicator byte in the file to determine whether a file value is a null. COPY INTO writes a zero trailing byte if the value is not NULL; it writes a nonzero trailing byte to indicate a NULL. (The value written prior to the indicator byte is undefined if the indicator shows NULL.) COPY FROM reads and interprets a trailing indicator byte in the same manner, zero for not null and nonzero for null. Indicator bytes are not allowed for variable-length formats such as char(0); variable-length formats must specify a value clause.