Multilingual Database Support With Unicode UTF-8
If you choose to store text as UTF-8, you continue to use the CHAR, VARCHAR, and LONGVARCHAR relational types. You also need to consider other aspects, such as the Unicode support of the operating system on which your application runs, the string manipulation libraries available to your application, the PSQL access methods your application uses, and any columns that may need a different data type.
When to Use Unicode UTF-8
Unicode UTF-8 encoding is a good choice for the following:
Unicode UTF-8 Support in PSQL
One of the code pages supported by PSQL is UTF-8. To store text as UTF-8, set the DB code page for your PSQL database to UTF-8.
Note that with UTF-8, strings are stored as byte strings. For byte strings, PSQL provides the relational data types CHAR, VARCHAR, and LONGVARCHAR, and the Btrieve data types STRING and ZSTRING. See also Data Types in SQL Engine Reference. Columns will likely need to be wider when storing UTF-8: accented and other non-ASCII characters in European languages take two bytes each in UTF-8, whereas a legacy single-byte code page stores each of them in one byte.
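To make the column-width point concrete, the following Python sketch compares how many bytes the same text needs in a legacy single-byte code page and in UTF-8. Windows code page 1252 (`cp1252`) is an assumed example of a legacy code page; substitute whichever code page your database actually uses.

```python
# Compare the storage size of the same text in a legacy single-byte
# code page (cp1252, an assumed example) and in UTF-8.
def byte_lengths(text: str, legacy_codec: str = "cp1252") -> tuple:
    """Return (legacy byte length, UTF-8 byte length) for text."""
    return len(text.encode(legacy_codec)), len(text.encode("utf-8"))


for word in ["Muller", "Müller", "façade", "Ærøskøbing"]:
    legacy, utf8 = byte_lengths(word)
    print(f"{word!r}: {legacy} bytes in cp1252, {utf8} bytes in UTF-8")
```

Pure ASCII text such as "Muller" is the same size in both encodings, while each accented letter adds one byte in UTF-8, so "Ærøskøbing" grows from 10 to 13 bytes.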
All string data that your application inserts into existing CHAR, VARCHAR, and LONGVARCHAR columns is interpreted as UTF-8. You can configure the PSQL SQL access methods to translate application text to UTF-8 automatically (see Access Methods for Unicode UTF-8 Support).
When the database code page is UTF-8 and the client environment supports Unicode (wide character or UTF-8), SQL text supports Unicode characters in CHAR literals. With any other database code page, general Unicode characters must be in NCHAR literals.
Collation and Sorting
PSQL supports only code point order for collation and sorting with UTF-8 storage.
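Code point order is not dictionary order. The following Python sketch illustrates the rule itself, not PSQL's implementation: uppercase ASCII letters sort before lowercase ones, and accented letters sort after all unaccented ASCII letters. It also shows a useful property of UTF-8, namely that comparing encoded byte strings byte by byte gives the same result as comparing code points.

```python
# UTF-8 is designed so that comparing encoded byte strings byte by byte
# gives the same order as comparing the characters' Unicode code points.
words = ["éclair", "apple", "Zebra", "Äpfel"]

by_code_point = sorted(words)  # Python compares str values by code point
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_code_point)  # ['Zebra', 'apple', 'Äpfel', 'éclair']
assert by_code_point == by_utf8_bytes
```

If your application needs locale-aware ordering (for example, "Äpfel" next to "apple"), that ordering has to come from somewhere other than UTF-8 storage collation.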
Access Methods for Unicode UTF-8 Support
The PSQL access methods ODBC, JDBC, and ADO.NET support translation to UTF-8 storage. These access methods exchange text values with the application either as UCS-2 wide-character strings or, for the ANSI ODBC drivers, as legacy byte strings. When properly configured, the access methods translate the application's text values to UTF-8 before transmitting them to the storage engine.
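The two translation paths described above can be sketched conceptually in Python. This is not the drivers' code and does not call any PSQL API; it only shows the transcoding each path implies. Two assumptions are made: wide-character strings are little-endian UCS-2 (decoded here with the `utf-16-le` codec, which matches UCS-2 for characters in the Basic Multilingual Plane), and the ANSI client uses code page 1252.

```python
# Conceptual sketch of the translation an access method performs before
# sending text to the storage engine. Not the driver's code; the client
# code page "cp1252" is an assumed example.

def wide_to_utf8(ucs2_bytes: bytes) -> bytes:
    """Wide-character path: UCS-2 (little-endian) text -> UTF-8 bytes."""
    return ucs2_bytes.decode("utf-16-le").encode("utf-8")


def ansi_to_utf8(legacy_bytes: bytes, client_code_page: str = "cp1252") -> bytes:
    """ANSI path: byte string in the client code page -> UTF-8 bytes."""
    return legacy_bytes.decode(client_code_page).encode("utf-8")


name = "Zoë"
from_wide = wide_to_utf8(name.encode("utf-16-le"))
from_ansi = ansi_to_utf8(name.encode("cp1252"))
print(from_wide)  # b'Zo\xc3\xab'
assert from_wide == from_ansi
```

Both paths end in the same UTF-8 bytes, which is why correct configuration matters: if the ANSI path assumed the wrong client code page, the decode step would produce different characters before re-encoding.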
Migrating an Existing Database to Unicode UTF-8
All text data must be converted from any legacy code page to UTF-8. Columns will likely need to be widened to accommodate the longer UTF-8 byte strings. Any non-ASCII metadata, such as table names, must be converted from the legacy code page to UTF-8. Given these combined changes, it is reasonable to migrate the database by copying from the old schema, using the legacy code page, to the new schema with UTF-8 as the database code page.
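The data-conversion part of such a migration can be sketched in Python: decode each legacy value, re-encode it as UTF-8, and record the widest value per column in both encodings so the new schema's columns can be sized. The rows, the column names `city` and `street`, and the `cp1252` source code page are all hypothetical, and the sketch covers data values only, not metadata such as table names.

```python
# Sketch of the data-conversion step of a migration. The sample rows,
# column names, and cp1252 source code page are hypothetical.

def convert_rows(rows, columns, legacy_codec="cp1252"):
    """Return (converted rows, max legacy width, max UTF-8 width) per column."""
    legacy_width = {col: 0 for col in columns}
    utf8_width = {col: 0 for col in columns}
    converted = []
    for row in rows:
        new_row = {}
        for col in columns:
            raw = row[col]
            utf8_value = raw.decode(legacy_codec).encode("utf-8")
            legacy_width[col] = max(legacy_width[col], len(raw))
            utf8_width[col] = max(utf8_width[col], len(utf8_value))
            new_row[col] = utf8_value
        converted.append(new_row)
    return converted, legacy_width, utf8_width


legacy_rows = [
    {"city": "Köln".encode("cp1252"), "street": "Hauptstraße".encode("cp1252")},
    {"city": "Bonn".encode("cp1252"), "street": "Marktplatz".encode("cp1252")},
]
converted, legacy_w, utf8_w = convert_rows(legacy_rows, ["city", "street"])
print(legacy_w)  # {'city': 4, 'street': 11}
print(utf8_w)    # {'city': 5, 'street': 12}
```

In practice you would size the new columns from the UTF-8 widths (plus whatever headroom your data needs) rather than reusing the legacy widths.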
Note: In the special case where all existing data and metadata are pure ASCII, you can simply change the database code page to UTF-8, because all existing (7-bit) ASCII byte strings are also valid UTF-8 byte strings.
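Before relying on this shortcut, you need to confirm that every stored value really is 7-bit ASCII. A minimal Python sketch of that check follows; how you collect the values (table names, column names, and string column contents) is left to your application, and the sample values here are hypothetical.

```python
# Check whether every stored byte string is 7-bit ASCII. Only if all data
# and metadata pass is it safe to switch the code page without converting.

def all_ascii(values) -> bool:
    """True if every byte string in values contains only bytes 0x00-0x7F."""
    return all(v.isascii() for v in values)


print(all_ascii([b"Customers", b"Smith", b"42 Main St"]))  # True
print(all_ascii(["Müller".encode("cp1252")]))              # False

# Any ASCII byte string decodes to the same text under UTF-8.
assert b"Smith".decode("ascii") == b"Smith".decode("utf-8")
```

`bytes.isascii()` requires Python 3.7 or later. A single non-ASCII byte anywhere means the full migration described above is needed instead.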