UTF8 Character Set

Ingres supports the UTF8 character set, which lets you store multi-byte UTF-8 encoded Unicode characters in char, varchar, and long varchar columns. You can select the UTF8 character set during installation.

Support for the UTF8 character set provides compatibility and portability with other database architectures.

A client installed with the UTF8 character set can connect only to a DBMS Server that also uses the UTF8 character set. Conversely, if the server uses the UTF8 character set, every client that connects to it must use the UTF8 character set as well.

If the server character set is UTF8, any database created on that server is Unicode-enabled by default, using Normalization Form C (NFC) and the default UNICODE collation, even if these options are not explicitly specified. As a result, char, varchar, and long varchar columns (as well as nchar, nvarchar, and long nvarchar columns) use the UNICODE collation by default.
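To illustrate what Normalization Form C means in general terms, the following Python sketch composes a decomposed character sequence into its precomposed form using the standard `unicodedata` module. It shows the Unicode concept only and is not a description of how Ingres normalizes data internally; the helper name `to_nfc` is hypothetical.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points.
decomposed = "e\u0301"

# NFC combines the pair into the single code point U+00E9.
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))  # 2 1
print(composed == "\u00e9")            # True
```

Both forms render identically, which is why a consistent normalization form matters when strings are compared or sorted.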

If the database you are connecting to is Unicode-enabled, the UNICODE collation is loaded.

The collation sequence UNICODE_FRENCH is added to support French Unicode collation.

Only char, varchar, and long varchar columns support UTF-8. Ingres character-based tools (such as the terminal monitor and ABF) display the data in UTF-8.

Note: When creating a table in an installation set to the UTF8 character set, the column specification for char and varchar columns is in number of bytes (not number of characters).
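Because column sizes are given in bytes, a value with non-ASCII characters can need more room than its character count suggests. The following Python sketch shows one way to estimate the UTF-8 byte length of a value before choosing a column size. The function names `utf8_byte_length` and `fits_in_column` are hypothetical helpers, not part of Ingres, and the check ignores any storage overhead the DBMS may add.

```python
def utf8_byte_length(text: str) -> int:
    """Return the number of bytes needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))


def fits_in_column(text: str, column_bytes: int) -> bool:
    """Return True if text's UTF-8 encoding fits in column_bytes bytes."""
    return utf8_byte_length(text) <= column_bytes


samples = [
    "cafe",                # ASCII only: 4 characters, 4 bytes
    "caf\u00e9",           # e with acute accent: 4 characters, 5 bytes
    "\u65e5\u672c\u8a9e",  # three CJK characters: 3 characters, 9 bytes
]

for sample in samples:
    # Print code point count next to the encoded byte count.
    print(len(sample), utf8_byte_length(sample))
```

Under these assumptions, a three-character CJK string does not fit in a column declared with a size of 3, because its UTF-8 encoding occupies 9 bytes.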

String functions such as length(), substring(), and position() operate on UTF-8 strings in a similar way to strings in other character sets.

Coercion is supported between different string types and between string types and other Ingres data types (numeric, datetime, binary, and so on) in a UTF-8 database.

More information can be found in the Installation Guide, in the SQL Reference Guide under storage formats, and in the Command Reference Guide under the createdb and unloaddb command descriptions.