Column Formats for COPY
The following sections describe how you specify the data file format for table columns. The format specifies how each is stored and delimited in the data file.
Character (Text) Formats
The character formats are the ones most commonly used to read and write ordinary text (human-readable) data files.
The basic character formats are BYTE, C, CHAR, and TEXT. Each has a variable-length form and a fixed-length form. The variable-length forms are BYTE(0), C0, CHAR(0), and TEXT(0). The fixed-length forms are BYTE(n), Cn, CHAR(n), and TEXT(n). An optional delim may follow to specify a delimiter.
The subtle differences between the various character formats are described in COPY Format Details.
Counted Character Formats
The BYTE VARYING and VARCHAR formats are "counted" formats: each data file value is preceded by a character count. The character count defines the length of the data value; the actual field length as defined by a fixed-length specifier or a delimiter may be larger. On input (COPY FROM), extra field characters beyond those included by the embedded character count are ignored. On output (COPY INTO), any extra field length after the actual value is filled with padding, as defined by the specific format.
The fixed-length forms are BYTE VARYING(n) and VARCHAR(n). The variable-length forms are BYTE VARYING(0) and VARCHAR(0). An optional delim may follow to specify a delimiter.
For all fixed-length counted formats: the field length N does not include the preceding length specifier. For example, a VARCHAR(1) field takes 6 bytes. When reading data (COPY FROM), if the character count found in the data is larger than the defined length, a runtime conversion warning is issued and the row is not loaded.
Dummy Format
The D (dummy) format describes a data file column that does not map to any table column. On input (COPY FROM), a D format column describes file data to be skipped and discarded. On output (COPY INTO), a D format column describes constant data to be sent to the data file.
The column name given for any dummy column is not matched to any table column. The Dn form for COPY INTO uses the column name as the value to output; all other uses of the dummy format ignore the column name completely.
Binary Formats
The formatted COPY statement supports binary formats that match the binary types used to store data in tables. These are the DATE, DECIMAL, FLOAT, INTEGER, and MONEY formats (and size variants such as BIGINT, SMALLINT, REAL, and so on). Most data files are text, not binary, so these binary formats are not often needed.
COPY Format Details
This section describes specifying the format of fields in the data file. When specifying data file formats for COPY INTO, be aware of the following points:
• Data from numeric columns, when written to text fields in the data file, is right-justified and filled with blanks on the left.
• When a COPY INTO statement is issued in the Terminal Monitor, the –i and –f command line flags control the format used to convert floating-point table data into text-type file data. To avoid rounding of large floating point values, use the sql command -f flag to specify a floating point format that correctly accommodates the largest value to be copied. For information about the -i and -f flags, see the quel command description in the Command Reference Guide.
• The COPY INTO section often uses the phrase "the display length of the corresponding table column". This means the length of the table column when formatted as a character string. This will be a standard length based on the table column type, and is independent of the actual column value. For example: the display length of an INTEGER column is 13, the display length of a SMALLINT column is 6, and so on.
The following table explains the details for the various COPY list formats. Unless otherwise noted, all non-binary formats can be followed by an optional delim to specify a delimiter.
Format | How Stored (COPY INTO) | How Read (COPY FROM) |
---|
BYTE(0) | Same as BYTE(n) where n is the display length of the corresponding table column. | Read as variable-length binary data terminated by the specified delimiter. If a delimiter is not specified, the first comma, tab, or newline encountered ends the value. |
BYTE(n) where n is 1 to the maximum row size configured, not exceeding 32,000. | Written as a fixed-length byte string. Exactly n bytes are written, padded with zeros if necessary. If given, the delimiter is written after the value and padding. | Read as a fixed-length byte string; exactly n bytes are read. If a delimiter is specified, one additional character is read and discarded. |
BYTE VARYING(0) | Same as BYTE VARYING(n) where n is the display length of the table column. | Read as a variable-length byte string, preceded by a 5‑character, right‑justified length specifier. If a delimiter is specified, additional input is discarded until the delimiter is found. |
BYTE VARYING(n) where n is 1 to the maximum row size configured, not exceeding 32,000. | Written as a fixed-length byte string preceded by a 5-character, right-justified length specifier. If necessary, the field is padded with zeros to the specified length. If given, the delimiter is written after the value and padding. | Read as a fixed-length byte string, preceded by a 5-character, right-justified length specifier. If a delimiter is specified, one additional character is read and discarded. |
C0 | Same as Cn where n is the display length of the corresponding table column | Read as a variable-length string, terminated by the specified delimiter. If a delimiter is not specified, the first comma, tab, or newline encountered ends the value. Any control characters or tabs in the input are converted to spaces. C0 format supports \. The \ is discarded, and the next character is taken literally as part of the value (even if it would normally be the delimiter). To read a \ character, use \\. |
Cn | Written as a fixed-length string, padded with blanks if necessary. Any "non-printing" character (meaning a control character or tab) is converted to a space. If given, the delimiter is written after the value and padding. | Read as a fixed-length string. If a delimiter is specified, one additional character is read and discarded. Any control characters or tabs in the input are converted to spaces. Fixed-length Cn format does not support \. |
CHAR(0) | Same as CHAR(n) where n is the display length of the corresponding table column. | Read as a variable-length string terminated by the specified delimiter. If a delimiter is not specified, the first comma, tab, or newline encountered ends the value. Unlike C format, CHAR does not support \. CHAR also does not convert control characters or tabs. File data is read as is. |
CHAR(n) where n is 1 to the maximum row size configured, not exceeding 32,000 (16,000 in a UTF8 instance). | Written as a fixed-length string, padded with blanks if necessary. If given, the delimiter is written after the value and padding. Unlike C format, CHAR does not do any conversion of control characters or tabs. Table data is output as-is. | Read as a fixed-length string. If a delimiter is specified, one additional character is read and discarded. Unlike C format, CHAR does not convert control characters or tabs. File data is read as is. |
D0 | Instead of placing a value in the file, COPY writes the specified delimiter. (Unlike the Dn format, D0 format does not write the column name.) COPY INTO requires that a delimiter be specified; D0 with no delimiter is not allowed. | Dummy field. Characters are read and discarded until the specified delimiter is encountered. If a delimiter is not specified, the first comma, tab, or newline ends the value. Any \ found in the input means that the next character is to be taken literally, and is not a delimiter. |
Dn | Dummy column. Instead of placing a value in the file, COPY writes the name of the column n times. For example, if you specify x=D1, the column name, x, is written once; if you specify x=D3, COPY writes xxx (the column name, three times), and so on. You can specify a delimiter as a column name, for example, NL=D1. | Dummy field. N characters are read and discarded. COPY FROM does not allow a delimiter specification with a fixed-length dummy field. |
DATE | Written as a date. (A binary format) | Read as a date. (A binary format) |
DECIMAL | Written as a decimal number. (A binary format.) | Read as a decimal number. (A binary format) |
FLOAT | Written as double‑precision floating point. (A binary format) | Read as double‑precision floating point. (A binary format) |
FLOAT4 | Written as single‑precision floating point. (A binary format) | Read as single‑precision floating point. (A binary format) |
INTEGER | Written as integer of 4‑byte length. (A binary format) | Read as integer of 4‑byte length. (A binary format) |
INTEGER1 | Written as integer of 1‑byte length. (A binary format) | Read as integer of 1‑byte length. (A binary format) |
MONEY | Written as a scaled floating point value (a money value). (A binary format) | Read as a scaled floating-point values (a money value). (A binary format) |
SMALLINT | Written as an integer of 2‑byte length. (A binary format.) | Read as integer of 2‑byte length. (A binary format) |
TEXT(0) | Written as a variable length string. If a delimiter is specified, it is written after the value. If the originating column is C, CHAR, or NCHAR, trailing blanks are trimmed. If the originating column is TEXT, VARCHAR, or NVARCHAR, the column value is output exactly as-is (no padding, no trimming). If the originating column is a non-character, writes the result of converting the value to a character string, as-is with no padding. COPY INTO using TEXT(0) format is the way to get variable width output with no padding. | Read as variable-length character string terminated by the specified delimiter. If a delimiter is not specified, the first comma, tab, or newline encountered ends the value |
TEXT(n) | Written as a fixed-length string. The value is padded with NULL bytes (zeros) if necessary. If specified, the delimiter is written after the value and padding. | Reads a fixed-length field n characters wide; however if one of those characters is a NULL byte, the value stored into the table is terminated at that NULL byte. If a delimiter is specified, one additional character is read and discarded. |
VARCHAR(0) | Same as VARCHAR(n), where n is the display length of the corresponding table column. | Read as a variable-length string, preceded by a 5‑character, right-justified length specifier. If a delimiter is specified, additional input is discarded until the delimiter is found. |
VARCHAR(n) where n is 1 to the maximum row size configured, not exceeding 32,000 (16,000 in a UTF8 instance). | Written as a fixed-length string preceded by a 5-character, right‑justified length specifier. If necessary, the value is padded with null characters to the specified length. | Read as a fixed-length string, preceded by a 5-character, right-justified length specifier. If a delimiter is specified, one additional character is read and discarded. |
Note: The dummy format (dn) behaves differently for COPY FROM and COPY INTO. When a table is copied into a file, n specifies the number of times the column name is repeated. When copying from a file to a table, n specifies the number of bytes to skip.
For user-defined data types (UDTs), use CHAR or VARCHAR.
Delimiters in the Data File
Delimiters are those characters in the data file that separate fields and mark the end of records. Valid delimiters are listed in the following table:
When a single character is specified as the delimiter, enclose that character in quotes. If the data type specification is C or D, the quotes must enclose the entire format. For example, 'd0%' specifies a dummy column delimited by a percent sign (%). If the data type specification uses parentheses around the length, quote only the delimiter. For example, char(0)'%' specifies a char field delimited by a percent sign.
Be careful using the sp (space) or null delimiters, especially with COPY FROM. Spaces or nulls are used as padding characters by many of the COPY formats. If a pad character is improperly treated as a delimiter, the COPY FROM will get out of sync with the input, eventually producing an error. When designing a data file format, use delimiters that will not appear in the data or padding, or use CSV or SSV forms.
When copying from a table into a file, you can insert delimiters independently of columns. For example, to insert a newline character at the end of a line, specify nl=d1 at the end of the column list. This directs the DBMS Server to add one (d1) newline (nl) character. (Do not confuse lowercase "l" with the number "1".)
CSV and SSV Delimiters
The CSV and SSV delimiters allow COPY to read and write files that contain comma separated values (CSV).
The rules for a CSV delimited field are:
• The field is delimited by a comma, unless it is the last CSV-delimited field in the COPY list and all following fields are dummy fields; in that case, the field is delimited by a newline.
• COPY FROM: If the first non-blank character in the field is a double quote ("), the field extends until a closing double quote. Commas or newlines inside the quoted string are not delimiters and do not end the value. If a doubled double quote ("") is seen while looking for the closing quote, it is translated to one double quote and the value continues. For example, the data file value:
“There is a double quote “” here”
is translated to the table value:
There is a double quote “ here
Whitespace before the opening double quote, or between the closing double quote and the delimiter (comma or newline), is not part of the value and is discarded.
• COPY INTO: If the value to be written contains a comma, newline, or double quote, it is written enclosed in double quotes using quote doubling as described in the previous bullet item. If the value does not contain a comma, newline, or double quote, it is written as is.
The SSV delimiter works exactly the same as the CSV delimiter, with semicolon in place of comma.
CSV and SSV delimiters are only allowed with BYTE(0), C0, CHAR(0), and TEXT(0). They are not allowed with the “counted” formats (VARCHAR(0) and so on); the count defines the value exactly and there is no need for quoting. (If delimiting is desired, use the comma or nl delimiters on counted formats.)
COPY FROM: Some CSV file variants use quote escaping (\") instead of quote doubling ("") to indicate a quote inside a quoted string. The C format handles\- escaping, so use the C0CSV format and delimiter to handle this type of file. (CSV with COPY INTO always writes quote doubling (never quote escaping) when needed.)
With Null Clause for COPY
When copying data from a table to a file, the with null clause directs copy to put the specified value in the file in place of null fields. You must specify the with null clause for any column that is nullable; if you omit the with null clause, the DBMS Server returns an error when it encounters null data, and aborts the copy statement.
When copying data from a file to a table, the with null clause specifies the value in the file to be interpreted as a null. When copy encounters this value in the file, it writes a null to the corresponding table column.
To prevent conflicts between valid data and null entries, choose a value that does not occur as part of the data in your table. The value you choose to represent nulls must be compatible with the format of the field in the file. Character formats require quoted values and numeric formats require unquoted numeric values.
For example, the following example is incorrect, because the value specified for nulls (numeric zero) conflicts with the character data type of the field:
Wrong:
c0comma with null(0)
The following example, however, is correct:
Right:
c0comma with null("0")
The null value is character data, specified in quotes, and does not conflict with the data type of the field. Do not use the keyword null, quoted or unquoted, for a numeric format.
When copying from a table to a file, be sure that the field format you specify is at least as large as the value you specify for the with null clause. If the column format is too small, the DBMS Server truncates the null value written to the data file to fit the specified format. For example, in the following statement the string "NULL" is truncated to "N" because the format is incorrectly specified as one character:
Wrong:
copy table t1 (col1 = varchar(1) with null ("NULL")) into "t1.dat"
The correct version specifies a 4-character format for the column.
Right:
copy table t1 (col1 = varchar(4) with null ("NULL")) into "t1.dat"
If you specify with null but omit value, copy appends a trailing byte indicating whether the field is null. For null fields, copy inserts an undefined data value in place of the null and sets the trailing byte to indicate a null field. You must specify value for nullable char(0) and varchar(0) columns.