Database Code Page and Client Encoding

As discussed in Concepts and Definitions, above, encoding specifies how character data is translated between the PSQL database engine and a PSQL client application. PSQL handles much of the complexity of encoding between client and server across the various combinations of operating systems, languages, and access methods. The encoding enhancements are divided into database code page and client encoding. The two types of encoding are separate but interrelated.

Database code page and client encoding apply only to the Relational Engine. The MicroKernel Engine is not affected.

Database Code Page

The database code page is a database property that specifies the encoding used for character data stored in the database. The default database code page is “server default,” meaning the operating system code page on the server where the database engine is running. (The operating system code page is generally referred to as the “OS encoding,” which is the phrase used throughout this chapter.)

Database code page is particularly handy if you need to manually copy PSQL DDFs to another platform with a different OS encoding and still have the metadata correctly interpreted by the database engine.

When you create a database with PSQL, the default is to use the active code page of the machine on which the database engine is running. For example, on a machine running Windows for English, the code page is 1252, and PSQL encoding translation is set to None. Your application must either use the same code page as the PSQL database or ensure that encoding translation is set to Automatic.
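To check which OS encoding a client application would pick up by default, a short script can report it. The following is a minimal Python sketch and not part of PSQL. The helper name os_encoding is hypothetical, and the value returned is only what the Python runtime sees for the current locale.

```python
import codecs
import locale


def os_encoding() -> str:
    """Return the normalized name of the encoding preferred by the OS locale."""
    name = locale.getpreferredencoding(False)
    # codecs.lookup() resolves aliases to one canonical codec name.
    return codecs.lookup(name).name


if __name__ == "__main__":
    print("OS encoding:", os_encoding())
```

If the reported encoding differs from the database code page, the application needs encoding translation set to Automatic, as described above.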

Supported Code Pages

PSQL supports the following code pages. All of the pages listed use byte string storage.

• CP437, CP737, CP775, CP850, CP852, CP855, CP857, CP858, CP862, CP866, CP932
• CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, CP1258

Client Encoding

Client encoding is the data encoding used by an application on a PSQL client. An application can manage text in any encoding it chooses. A compatible encoding must be established between the database engine and the client application.

PSQL can automatically translate between different encodings used by the database engine and clients provided the characters are present in both the code page on the server machine and the code page on the client machine.

Data translation, if required, occurs at the client. Translation is not always required – for example, when client and server OS encoding match.
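The condition that characters must be present in both code pages can be checked before data is sent. The sketch below uses Python's built-in codecs as stand-ins for the server and client code pages. The function name representable_in_both is hypothetical, and the check only approximates what PSQL does during automatic translation.

```python
def representable_in_both(text: str, server_cp: str, client_cp: str) -> bool:
    """Return True if every character in text can be encoded in both code pages."""
    for code_page in (server_cp, client_cp):
        try:
            text.encode(code_page)
        except UnicodeEncodeError:
            return False
    return True


if __name__ == "__main__":
    # The letter e-acute exists in both CP1252 and CP850.
    print(representable_in_both("café", "cp1252", "cp850"))
    # The euro sign exists in CP1252 but not in CP850.
    print(representable_in_both("5 €", "cp1252", "cp850"))
```

Text that fails this check cannot survive translation between the two code pages without loss.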

Encoding Support in PCC

You can use PCC to set the database code page when you create a database or to modify the code page setting for an existing database.

Note: Changing the database code page property does not change any data in the database. However, changing the database code page for an existing database will affect how existing data entries are interpreted.
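The effect described in the note can be illustrated outside PSQL. In this minimal Python sketch the stored bytes never change; only the code page used to interpret them does. The names STORED and interpret are illustrative assumptions.

```python
# Bytes as they might sit in a CHAR column. They are never modified below.
STORED = b"caf\xe9"


def interpret(raw: bytes, code_page: str) -> str:
    """Decode stored bytes according to the declared database code page."""
    return raw.decode(code_page)


if __name__ == "__main__":
    print(interpret(STORED, "cp1252"))  # interpretation under CP1252
    print(interpret(STORED, "cp437"))   # same bytes interpreted under CP437
```

The byte 0xE9 is a different character in each code page, so the same row reads differently after the property is changed.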

PSQL Control Center (PCC) is, itself, a client application to the database engine. As a client, PCC lets you specify the encoding to use for each database session when PCC reads and inserts metadata and data. The default for an existing database is to use the encoding of the machine where PCC is running. This is the legacy behavior of PCC. The default for a new database is to use automatic translation. See PCC Connection Encoding in PSQL User's Guide.

The following table explains the interaction between the settings for PCC connection encoding and database code page. PCC connection encoding applies only to PCC. It has no effect on other client applications.


PCC Connection Encoding Set to a Specific Encoding	PCC Connection Encoding Set to "Automatic Translation"
PCC ignores the database code page and uses the encoding specified to read and insert CHAR data, string literals, and metadata. NCHAR data is not affected by this setting. (This is the legacy behavior of PCC.)	PCC and the database automatically establish the encoding for CHAR data and metadata. String literals in queries are sent to the engine as Unicode. NCHAR data is not affected by this setting.

Encoding Support in Btrieve API

When using the Btrieve API, you must provide file names and paths in the local encoding used in your application. The Btrieve API handles the differences between OS encoding on the server and client.

Encoding Support in DTI

When using the Distributed Tuning Interface (DTI), you must provide file names and paths in the local encoding used in your application. DTI handles the differences between OS encoding on the server and client.

If you use the DTI API to create a database, you may specify the database code page property at the time of creation. This property may be used by SQL access methods to configure automatic translation of character data.

Encoding Support in ADO.NET

The .NET Framework and .NET applications use UTF-16 strings. These must be translated to a code page when storing text in CHAR columns.

The connection property PVTranslate=Auto sets the connection encoding to the database code page. It is also possible to set the encoding property directly.

For more information, see Adding Connections, PsqlConnectionStringBuilder Object and Character Set Conversions in Data Provider for .NET Guide.

Encoding Support in JDBC

The Java Virtual Machine and Java applications use UTF-16 strings. These must be translated to a code page when storing text in CHAR columns.

The connection property PVTranslate=Auto will set the connection encoding to the database code page. It is also possible to set the encoding property directly.

When the PvTranslate=Auto property is set, the JDBC driver sends string literals to the engine as Unicode. Without this setting, the legacy behavior is to translate string literals to the database code page. If your application uses NCHAR string literals (for example, N'ABC'), it should set the PvTranslate=Auto connection property.
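To see why NCHAR literals benefit from being sent as Unicode, consider what happens when a literal is first translated to a single-byte database code page. The following Python sketch only simulates the two behaviors; it is not the JDBC driver's implementation. The function names are hypothetical, and UTF-8 is an assumed choice of Unicode encoding for illustration.

```python
def legacy_literal_bytes(sql_literal: str, database_cp: str) -> bytes:
    """Simulate translating a SQL string literal to the database code page.

    Characters missing from the code page are replaced with '?'.
    """
    return sql_literal.encode(database_cp, errors="replace")


def unicode_literal_bytes(sql_literal: str) -> bytes:
    """Simulate sending the literal as Unicode (UTF-8 assumed for illustration)."""
    return sql_literal.encode("utf-8")


if __name__ == "__main__":
    literal = "N'日本語'"
    print(legacy_literal_bytes(literal, "cp1252"))
    print(unicode_literal_bytes(literal).decode("utf-8"))
```

In the simulated legacy path the Japanese characters are lost before they reach the engine, while the Unicode path preserves them.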

See Connection String Elements in JDBC Driver Guide.

Encoding Support in ODBC

The PSQL ODBC drivers support a number of mechanisms to control client encoding.

When configuring a DSN, it is possible to select the encoding options Automatic, OEM/ANSI, and None. The Automatic setting causes the driver to translate from the client encoding to the database code page. The OEM/ANSI setting causes the driver to translate from the client encoding to the corresponding OEM code page. The None setting prevents the driver from doing any text translation. See Encoding Translation in ODBC Guide for more details.
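The three DSN options can be compared with a small model. This Python sketch is an assumption-laden illustration rather than the driver's logic: the function translate_for_engine and its parameters are hypothetical, and Python codecs stand in for the client, database, and OEM code pages.

```python
def translate_for_engine(text: str, mode: str, client_cp: str,
                         database_cp: str, oem_cp: str) -> bytes:
    """Return the bytes a client would send under a given DSN encoding option."""
    if mode == "Automatic":
        return text.encode(database_cp)  # client encoding -> database code page
    if mode == "OEM/ANSI":
        return text.encode(oem_cp)       # client encoding -> OEM code page
    if mode == "None":
        return text.encode(client_cp)    # client bytes pass through untranslated
    raise ValueError(f"unknown encoding option: {mode!r}")


if __name__ == "__main__":
    for option in ("Automatic", "OEM/ANSI", "None"):
        sent = translate_for_engine("é", option, "cp1252", "cp850", "cp437")
        print(option, sent.hex())
```

With a CP1252 client, the None option sends byte E9 for the letter e-acute, whereas the OEM/ANSI option with CP437 sends byte 82.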

Legacy Conversion Methods for OEM-to-ANSI Data

If a database has OEM character data in it, a legacy solution is to specify OEM/ANSI conversion in the access method. This topic discusses some legacy methods for Linux clients using OEM character data.

Note: While the legacy methods are still supported, the recommendation is to specify the OEM code page for the database and have the access methods use automatic translation as discussed above.

See also OEM/ANSI Conversion in ODBC Guide.

When using ODBC, Win32 encoding is expected to be SHIFT-JIS.

Japanese versions of Linux typically have their default encoding set to EUC-JP or UTF-8.

When using Japanese versions of Linux, a client can connect to another Linux server (for example, locally) or to a Win32 SHIFT-JIS server. It is also possible to connect to a database encoded in SHIFT-JIS but located on a Linux server.

Use the following instructions for your listed configuration. In each case, it is assumed that the application itself does not do any conversion and uses the encoding that is native for the machine.

• Connecting a Linux EUC-JP Client to a Win32 SHIFT-JIS Server
• Connecting a Linux UTF-8 Client to a Win32 SHIFT-JIS Server
• Connecting a Linux EUC-JP Client to a Linux EUC-JP Server
• Connecting a Linux UTF-8 Client to a Linux UTF-8 Server
• Connecting a Linux UTF-8 Client to a Linux EUC-JP Server
• Connecting a Linux EUC-JP Client to a Linux EUC-JP Server, with SHIFT-JIS Encoding Used to Store Data on the Server

Connecting a Linux EUC-JP Client to a Win32 SHIFT-JIS Server

The server requires that everything is received as SHIFT-JIS. The client requires that the server send everything as EUC-JP.

To accomplish this, the client DSN settings in ODBC.INI (located by default in /usr/local/psql/etc) used to connect to the given database should be set up as follows:

[dbclient]
Driver=/usr/local/psql/lib/libodbcci.so
Description=PSQL ODBC Client Interface: JPN-2000SERVER:1583/dbclient
ServerDSN=DEMODATA
ServerName=JPN-2000SERVER:1583
TranslationDLL=/usr/local/psql/lib/libxlate.so.10
TranslationOption=90000932

The TranslationDLL line sets the translation library for the ODBC client interface to use.

The TranslationOption line specifies that translation is needed from 9000 (EUC-JP) to 0932 (SHIFT-JIS).

Using this example, data coming from the client is translated to SHIFT-JIS before it is sent to the server, and data coming from the server is translated to EUC-JP before it is returned to the client.
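The TranslationOption value packs two four-digit encoding identifiers into one string: the client encoding followed by the server encoding. As a minimal sketch, the following Python helper splits such a value. The function name is hypothetical, and the lookup table contains only the three identifiers that appear in this topic; other identifiers may exist.

```python
from typing import Tuple

# Encoding identifiers taken from the TranslationOption examples in this topic.
ENCODING_IDS = {"9000": "EUC-JP", "9001": "UTF-8", "0932": "SHIFT-JIS"}


def parse_translation_option(value: str) -> Tuple[str, str]:
    """Split an 8-digit TranslationOption into (client, server) encoding names."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected 8 digits, got {value!r}")
    client_id, server_id = value[:4], value[4:]
    # An identifier not listed above raises KeyError.
    return ENCODING_IDS[client_id], ENCODING_IDS[server_id]


if __name__ == "__main__":
    print(parse_translation_option("90000932"))
```

A quick check of this kind can catch a reversed or mistyped TranslationOption before the DSN is used.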

Connecting a Linux UTF-8 Client to a Win32 SHIFT-JIS Server

The server requires that everything is received as SHIFT-JIS. The client requires that the server send everything as UTF-8.

To accomplish this, the client DSN settings in ODBC.INI (by default in /usr/local/psql/etc) used to connect to the given database should be set up as follows:

[dbclient]
Driver=/usr/local/psql/lib/libodbcci.so
Description=PSQL ODBC Client Interface: JPN-2000SERVER:1583/dbclient
ServerDSN=DEMODATA
ServerName=JPN-2000SERVER:1583
TranslationDLL=/usr/local/psql/lib/libxlate.so.10
TranslationOption=90010932

The TranslationDLL line sets the translation library for the ODBC client interface to use.

The TranslationOption line specifies that translation is needed from 9001 (UTF-8) to 0932 (SHIFT-JIS).

Using this example, data coming from the client is translated to SHIFT-JIS before it is sent to the server, and data coming from the server is translated to UTF-8 before it is returned to the client.
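The two-way translation can be pictured as re-encoding the same text in each direction. The Python sketch below is a simulation using Python's own codecs, not the libxlate translation library. The function name transcode is hypothetical.

```python
def transcode(data: bytes, source_cp: str, target_cp: str) -> bytes:
    """Re-encode bytes from one encoding to another by way of Unicode."""
    return data.decode(source_cp).encode(target_cp)


if __name__ == "__main__":
    client_bytes = "日本語".encode("utf-8")  # what the Linux client holds
    to_server = transcode(client_bytes, "utf-8", "shift_jis")
    to_client = transcode(to_server, "shift_jis", "utf-8")
    print(len(client_bytes), len(to_server), to_client == client_bytes)
```

The byte length changes in each direction, which matters when an application sizes buffers for CHAR data.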

Connecting a Linux EUC-JP Client to a Linux EUC-JP Server

Using this configuration, no changes to the DSN description are needed. Use the DSN as it was created by the dsnadd utility.

Connecting a Linux UTF-8 Client to a Linux UTF-8 Server

Using this configuration, no changes to the DSN description are needed. Use the DSN as it was created by the dsnadd utility. See dsnadd in PSQL User's Guide.

Connecting a Linux UTF-8 Client to a Linux EUC-JP Server

The server requires that everything is received as EUC-JP. The client requires that the server send everything as UTF-8.

To accomplish this, the client DSN settings in ODBC.INI (by default in /usr/local/psql/etc) used to connect to the given database should be set up as follows:

[dbclient]
Driver=/usr/local/psql/lib/libodbcci.so
Description=PSQL ODBC Client Interface: JPN-2000SERVER:1583/dbclient
ServerDSN=DEMODATA
ServerName=JPN-2000SERVER:1583
TranslationDLL=/usr/local/psql/lib/libxlate.so.10
TranslationOption=90019000

The TranslationDLL line sets the translation library for the ODBC client interface to use.

The TranslationOption line specifies that translation is needed from 9001 (UTF-8) to 9000 (EUC-JP).

Using this example, data coming from the client is translated to EUC-JP before it is sent to the server, and data coming from the server is translated to UTF-8 before it is returned to the client.

Connecting a Linux EUC-JP Client to a Linux EUC-JP Server, with SHIFT-JIS Encoding Used to Store Data on the Server

This situation is possible if you have a SHIFT-JIS database on a Win32 engine and you want to move all the files to a Linux EUC-JP server. In this case, the database resides on a EUC-JP Linux machine, but all the data inside the DDF files and data files is in SHIFT-JIS.

In this case, your DSN should be set up as follows:

[dbclient]
Driver=/usr/local/psql/lib/libodbcci.so
Description=PSQL ODBC Client Interface: JPN-2000SERVER:1583/dbclient
ServerDSN=DEMODATA
ServerName=JPN-2000SERVER:1583
TranslationDLL=/usr/local/psql/lib/libxlate.so.10
TranslationOption=90000932
CodePageConvert=932

The last line specifies that even though the server uses EUC-JP encoding, it should treat the data on the server as SHIFT-JIS.
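The need for this setting becomes clear when SHIFT-JIS bytes are read as if they were EUC-JP. The Python sketch below simulates that mismatch with Python's codecs. It does not show how the driver applies CodePageConvert, and the names stored and read_as are illustrative.

```python
TEXT = "日本語"

# Bytes as originally written by the Win32 SHIFT-JIS engine.
stored = TEXT.encode("shift_jis")


def read_as(raw: bytes, code_page: str) -> str:
    """Decode raw bytes, substituting a marker for undecodable sequences."""
    return raw.decode(code_page, errors="replace")


if __name__ == "__main__":
    print(read_as(stored, "shift_jis") == TEXT)  # correct interpretation
    print(read_as(stored, "euc_jp") == TEXT)     # server OS encoding assumed
```

Only the SHIFT-JIS interpretation recovers the original text, which is why the data must be treated as SHIFT-JIS even though the server OS encoding is EUC-JP.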

Encoding Support for Wide ODBC Driver

PSQL supports UCS-2 with ODBC through a driver for wide character data and through defaults for DSN encoding translation. See Encoding Translation and ODBC Connection Strings in ODBC Guide.

ODBC Driver for Applications with Wide Character Data

PSQL provides an ODBC driver for 32-bit and 64-bit applications that use wide character data. The driver is for Windows operating systems only and is an addition to the previous set of drivers.

Table 16

PSQL ODBC Driver for Wide Character Data

Driver Name	Discussion
PSQL ODBC Unicode Interface	• Connects to a local or remote named database.
	• With the 32-bit ODBC Administrator, creates 32-bit DSNs for use by 32-bit applications that use wide character data. The 32-bit driver is installed with all PSQL editions.
	• With the 64-bit ODBC Administrator, creates 64-bit DSNs for use by 64-bit applications that use wide character data. The 64-bit driver is installed with all PSQL editions when installing on a 64-bit platform.

On Linux, the system encoding is usually UTF-8, which allows SQL text to contain any Unicode character code point. The PSQL ODBC Unicode Interface driver is not available on Linux because an application can use the PSQL ODBC Client Interface driver with UTF-8. A Linux application can handle wide character data either as UTF-16 strings (SQL_C_WCHAR) or request conversion to the system encoding (usually UTF-8) as SQL_C_CHAR. SQL text using UTF-8 is compatible with the existing PSQL ODBC Client Interface driver, so an additional ODBC driver on Linux is not required.
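The two buffer forms hold the same text in different byte layouts. The following Python sketch only illustrates the sizes involved and makes no ODBC calls. The function names are hypothetical, and little-endian byte order for the UTF-16 buffer is an assumption.

```python
def wide_buffer(text: str) -> bytes:
    """Bytes for a UTF-16 buffer, as with SQL_C_WCHAR (little-endian assumed)."""
    return text.encode("utf-16-le")


def narrow_buffer(text: str) -> bytes:
    """Bytes for a system-encoding buffer, as with SQL_C_CHAR (UTF-8 assumed)."""
    return text.encode("utf-8")


if __name__ == "__main__":
    text = "日本語"
    print(len(wide_buffer(text)), len(narrow_buffer(text)))
```

Both forms decode to the same characters, so the choice mainly affects how the application allocates and interprets its buffers.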

Default for DSN Encoding Translation

The encoding translation options for a DSN specify how character data is translated between the PSQL database engine and a PSQL client application that uses ODBC. The default for encoding translation depends on the PSQL ODBC driver that you use.

Table 17

DSN Encoding Translation Default

Driver Name	Encoding Translation Default	Remarks
PSQL ODBC Unicode Interface	Automatic	The connection string parameter Pvtranslate also defaults to “auto.”
PSQL ODBC Interface	None	Same default as the previous version of PSQL.
PSQL ODBC Client Interface	None	Same default as the previous version of PSQL.
PSQL ODBC Engine Interface	None	Same default as the previous version of PSQL.

The ODBC drivers process SQL text differently depending on the driver and the setting for the DSN encoding translation.

Table 18

PSQL ODBC Driver and DSN Encoding Translation Setting Effect on SQL Text

Setting

Processing of Incoming SQL Text

PSQL Driver

ODBC Unicode Interface

ODBC Interface and ODBC Client Interface

ODBC Engine Interface

Automatic

SQL text gets converted to UTF-8 then sent to the database engine. The code pages for Client, Server, and database are ignored.

Yes1

SQL text gets converted to the database code page then sent to the database engine.

Yes

None

SQL text is not translated between the Client and database engine.2

Yes

OEM/ANSI

SQL text in the Client code page is converted to the OEM/ANSI encoding and then sent to the database engine.

Yes3

1 With the encoding translation set to Automatic, you can use NCHAR columns and NCHAR literals with wide character data.

2 The assumption is that the Client and database engine use the same operating system encoding.

3 If the SQL text is wide character, it is first converted to the Client encoding. If the SQL text is not wide character, it is already in the Client encoding. The SQL text is then converted to the OEM encoding and sent to the database engine.