Unicode Formats
The Unicode formats are NCHAR, NVARCHAR, and LONG NVARCHAR, and they can be used only with NCHAR or NVARCHAR table columns. The fixed-length forms are NCHAR(n) and NVARCHAR(n). The variable-length forms are NCHAR(0), NVARCHAR(0), and LONG NVARCHAR(0).
The fixed-length NCHAR(n) and NVARCHAR(n) formats read and write data in the two-byte UCS-2 encoding. The variable-length NCHAR(0), NVARCHAR(0), and LONG NVARCHAR(0) forms read and write data in UTF-8, a variable-length encoding.
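To illustrate why the two encodings produce different byte counts, the following Python sketch encodes a value the way a fixed-length NCHAR(n) field might be laid out: n UCS-2 code units, so always 2n bytes. The function name `ucs2_fixed_field`, the blank padding, and the little-endian byte order are assumptions made for illustration only; this page does not state the byte order or padding COPY uses.

```python
def ucs2_fixed_field(value: str, n: int) -> bytes:
    """Hypothetical NCHAR(n)-style field: exactly n UCS-2 code units (2n bytes).

    Assumptions (not from the COPY documentation): values are blank-padded
    to n characters, and code units are written little-endian.
    """
    # UCS-2 covers only the Basic Multilingual Plane (code points <= U+FFFF).
    if any(ord(ch) > 0xFFFF for ch in value):
        raise ValueError("UCS-2 cannot represent characters outside the BMP")
    if len(value) > n:
        raise ValueError(f"value has {len(value)} characters; field holds {n}")
    padded = value.ljust(n)              # assumed blank padding to n characters
    return padded.encode("utf-16-le")    # BMP-only UTF-16 is identical to UCS-2


# An 8-character field always occupies 16 bytes, whatever the text is,
# while the same 5-character text takes 6 bytes in UTF-8 ("é" needs 2 bytes).
print(len(ucs2_fixed_field("héllo", 8)))  # 16
print(len("héllo".encode("utf-8")))       # 6
```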
The field length n for NCHAR(n) and NVARCHAR(n) should be specified in characters, not bytes (octets). However, the embedded length specifier used by the NCHAR(0) and NVARCHAR(0) formats should give the number of bytes, not characters. This is because NCHAR(0) and NVARCHAR(0) use UTF-8, which encodes each Unicode code point in a variable number of bytes (one to four), so COPY needs the byte count to know how many bytes to read and decode.
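The difference between a character count and a byte count can be seen in a short Python sketch of a length-prefixed UTF-8 field. It assumes a five-digit, zero-padded decimal length prefix, modeled on the length specifier of COPY's VARCHAR(0) format; the helper names `utf8_varlen_field` and `read_utf8_varlen_field` are hypothetical. The key point is that the prefix records `len(data)` after encoding, not `len(value)`.

```python
def utf8_varlen_field(value: str) -> bytes:
    """Hypothetical NVARCHAR(0)-style field: byte-count prefix + UTF-8 data.

    Assumption: the length specifier is 5 ASCII digits, zero-padded.
    """
    data = value.encode("utf-8")
    if len(data) > 99999:
        raise ValueError("value too long for a 5-digit length specifier")
    # The prefix counts BYTES of the UTF-8 encoding, not characters.
    return f"{len(data):05d}".encode("ascii") + data


def read_utf8_varlen_field(buf: bytes, pos: int = 0) -> tuple[str, int]:
    """Decode one field starting at pos; return (text, position after field)."""
    byte_count = int(buf[pos:pos + 5].decode("ascii"))
    start = pos + 5
    end = start + byte_count
    if end > len(buf):
        raise ValueError("truncated field")
    # Reading exactly byte_count bytes is only possible because the prefix
    # is a byte count; a character count would not tell us where to stop.
    return buf[start:end].decode("utf-8"), end


field = utf8_varlen_field("naïve")  # 5 characters, but "ï" takes 2 bytes
print(field[:5])                    # b'00006'
print(read_utf8_varlen_field(field))  # ('naïve', 11)
```

Had the prefix been written as the character count (5), the reader would stop one byte early and split the final character, which is why the byte count is required.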