Database Code Page and Client Encoding
As discussed in Concepts and Definitions, above, encoding specifies how character data is translated between the PSQL database engine and a PSQL client application. PSQL handles much of the complexity of encoding between client and server across the various combinations of operating system, language, and access method. The encoding enhancements are divided into database code page and client encoding. The two types of encoding are separate but interrelated.
Database code page and client encoding apply only to the Relational Engine. The MicroKernel Engine is not affected.
Database Code Page
The database code page is a database property that specifies the encoding used for character data stored in the database. The default database code page is “server default,” meaning the operating system code page on the server where the database engine is running. (The operating system code page is generally referred to as the “OS encoding,” which is the phrase used throughout this chapter.)
Database code page is particularly handy if you need to manually copy PSQL DDFs to another platform with a different OS encoding and still have the metadata correctly interpreted by the database engine.
When you create a database with PSQL, the default is to use the active code page of the machine on which the database engine is running. For example, on a machine running Windows for English, the code page is set to 1252. PSQL encoding translation is set to None. Your application must either use the same code page as the PSQL database, or ensure that encoding translation is set to Automatic.
Supported Code Pages
PSQL supports the following code pages. All of the pages listed use byte string storage.
Client Encoding
Client encoding is the data encoding used by an application on a PSQL client. An application can manage text in any encoding it chooses. A compatible encoding must be established between the database engine and the client application.
PSQL can automatically translate between the different encodings used by the database engine and clients, provided the characters are present in both the code page on the server machine and the code page on the client machine.
Data translation, if required, occurs at the client. Translation is not always required. For example, no translation is needed when the client and server OS encodings match.
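The requirement that a character be present in both code pages can be checked ahead of time. The following Python sketch is illustrative only and does not call any PSQL API. The helper name translatable and the code pages chosen (1252, 437, and 932) are assumptions made for the example.

```python
def translatable(text, server_codec, client_codec):
    """Return True if every character in text exists in both code pages."""
    for codec in (server_codec, client_codec):
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            return False
    return True


# Plain ASCII text exists in both code pages.
ascii_ok = translatable("abc", "cp932", "cp1252")

# An accented e exists in both Windows-1252 and the OEM code page 437.
accent_ok = translatable("caf\u00e9", "cp1252", "cp437")

# Japanese text exists in code page 932 but not in Windows-1252.
kanji_ok = translatable("\u65e5\u672c\u8a9e", "cp932", "cp1252")

print(ascii_ok, accent_ok, kanji_ok)
```

A check like this only tells you whether translation can be lossless. It does not show how PSQL itself reports or handles a character that is missing from one of the code pages.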
Encoding Support in PCC
You can use PCC to set the database code page when you create a database or to modify the code page setting for an existing database.
Note: Changing the database code page property does not change any data in the database. However, changing the database code page for an existing database will affect how existing data entries are interpreted.
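The effect described in the note can be seen with any pair of single-byte code pages. The Python sketch below is a generic illustration, not PSQL code. The stored bytes and the two code pages (1252 and 437) are assumptions chosen for the example.

```python
# Four bytes as they might sit in a CHAR column. The bytes never change.
stored = bytes([0x63, 0x61, 0x66, 0xE9])

# Interpreting the same bytes under two different code page settings.
as_1252 = stored.decode("cp1252")  # last byte read as LATIN SMALL LETTER E WITH ACUTE
as_437 = stored.decode("cp437")    # last byte read as GREEK CAPITAL LETTER THETA

print(as_1252)
print(as_437)
```

Only the interpretation differs between the two results. This is why the code page property should match the encoding that was in effect when the data was written.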
PSQL Control Center (PCC) is, itself, a client application to the database engine. As a client, PCC lets you specify the encoding to use for each database session when PCC reads and inserts metadata and data. The default for an existing database is to use the encoding of the machine where PCC is running. This is the legacy behavior of PCC. The default for a new database is to use automatic translation. See PCC Connection Encoding in PSQL User's Guide.
The following table explains the interaction between the settings for PCC connection encoding and database code page. PCC connection encoding applies only to PCC. It has no effect on other client applications.
Encoding Support in Btrieve API
When using the Btrieve API, you must provide file names and paths in the local encoding used in your application. The Btrieve API handles the differences between OS encoding on the server and client.
Encoding Support in DTI
When using the Distributed Tuning Interface (DTI), you must provide file names and paths in the local encoding used in your application. DTI handles the differences between OS encoding on the server and client.
If you use the DTI API to create a database, you may specify the database code page property at the time of creation. This property may be used by SQL access methods to configure automatic translation of character data.
Encoding Support in ADO.NET
The .NET Framework and .NET applications use UTF-16 strings. These must be translated to a code page when storing text in CHAR columns.
The connection property PVTranslate=Auto sets the connection encoding to the database code page. It is also possible to set the encoding property directly.
For more information, see Adding Connections, PsqlConnectionStringBuilder Object and Character Set Conversions in Data Provider for .NET Guide.
Encoding Support in JDBC
The Java Virtual Machine and Java applications use UTF-16 strings. These must be translated to a code page when storing text in CHAR columns.
The connection property PVTranslate=Auto will set the connection encoding to the database code page. It is also possible to set the encoding property directly.
When the PvTranslate=Auto property is set, the JDBC driver sends string literals to the engine as Unicode. Without this setting, the legacy behavior is to translate string literals to the database code page. If your application uses NCHAR string literals (for example, N'ABC'), it should set the PvTranslate=Auto connection property.
See Connection String Elements in JDBC Driver Guide.
Encoding Support in ODBC
The PSQL ODBC drivers support a number of mechanisms to control client encoding.
When configuring a DSN, it is possible to select the encoding options Automatic, OEM/ANSI, and None. The Automatic setting causes the driver to translate from the client encoding to the database code page. The OEM/ANSI setting causes the driver to translate from the client encoding to the corresponding OEM code page. The None setting prevents the driver from doing any text translation. See Encoding Translation in ODBC Guide for more details.
Legacy Conversion Methods for OEM-to-ANSI Data
If a database has OEM character data in it, a legacy solution is to specify OEM/ANSI conversion in the access method. This topic discusses some legacy methods for Linux clients using OEM character data.
Note: While the legacy methods are still supported, the recommendation is to specify the OEM code page for the database and have the access methods use automatic translation as discussed above.
See also OEM/ANSI Conversion in ODBC Guide.
When using ODBC, Win32 encoding is expected to be SHIFT-JIS.
Japanese versions of Linux typically have their encoding set to EUC-JP or UTF-8 by default.
When using Japanese versions of Linux, a client can connect to another Linux server (for example, locally) or to a Win32 SHIFT-JIS server. It is also possible to connect to a database encoded in SHIFT-JIS but located on a Linux server.
Use the following instructions for your listed configuration. In each case, it is assumed that the application itself does not do any conversion and uses the encoding that is native for the machine.
Connecting a Linux EUC-JP Client to a Win32 SHIFT-JIS Server
The server requires that everything is received as SHIFT-JIS. The client requires that the server send everything as EUC-JP.
To accomplish this, the client DSN settings in ODBC.INI (located by default in /usr/local/psql/etc) used to connect to the given database should be set up as follows:
[dbclient]
Driver=/usr/local/psql/lib/libodbcci.so
Description=PSQL ODBC Client Interface: JPN-2000SERVER:1583/dbclient
ServerDSN=DEMODATA
ServerName=JPN-2000SERVER:1583
TranslationDLL=/usr/local/psql/lib/libxlate.so.10
TranslationOption=90000932
The TranslationDLL line sets the translation library for the ODBC client interface to use.
The TranslationOption line specifies that translation is needed from 9000 (EUC-JP) to 0932 (SHIFT-JIS).
In this example, data from the client is translated to SHIFT-JIS before it is sent to the server, and data from the server is translated to EUC-JP before it is returned to the client.
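The two halves of a TranslationOption value and the resulting round trip can be modeled in a few lines of Python. This is a sketch of the behavior described above, not the implementation in libxlate. The function names and the mapping of identifiers 9000, 9001, and 0932 to Python codec names are assumptions made for illustration.

```python
# Identifiers named in this section, mapped to Python codec names.
# The codec names are an assumption; cp932 stands in for SHIFT-JIS.
CODECS = {"9000": "euc_jp", "9001": "utf-8", "0932": "cp932"}


def split_translation_option(option):
    """Split an eight-digit value into (client, server) identifiers."""
    if len(option) != 8 or not option.isdigit():
        raise ValueError("expected eight digits, got %r" % option)
    return option[:4], option[4:]


def to_server(data, option):
    """Translate bytes from the client encoding to the server encoding."""
    client_id, server_id = split_translation_option(option)
    return data.decode(CODECS[client_id]).encode(CODECS[server_id])


def to_client(data, option):
    """Translate bytes from the server encoding back to the client encoding."""
    client_id, server_id = split_translation_option(option)
    return data.decode(CODECS[server_id]).encode(CODECS[client_id])


text = "\u65e5\u672c\u8a9e"
sent = to_server(text.encode("euc_jp"), "90000932")
returned = to_client(sent, "90000932")
print(split_translation_option("90000932"), sent.hex(), returned.hex())
```

The same helpers apply to the other values shown in this topic, because only the two four-digit identifiers change.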
Connecting a Linux UTF-8 Client to a Win32 SHIFT-JIS Server
The server requires that everything is received as SHIFT-JIS. The client requires that the server send everything as UTF-8.
To accomplish this, the client DSN settings in ODBC.INI (by default in /usr/local/psql/etc) used to connect to the given database should be set up as follows:
[dbclient]
Driver=/usr/local/psql/lib/libodbcci.so
Description=PSQL ODBC Client Interface: JPN-2000SERVER:1583/dbclient
ServerDSN=DEMODATA
ServerName=JPN-2000SERVER:1583
TranslationDLL=/usr/local/psql/lib/libxlate.so.10
TranslationOption=90010932
The TranslationDLL line sets the translation library for the ODBC client interface to use.
The TranslationOption line specifies that translation is needed from 9001 (UTF-8) to 0932 (SHIFT-JIS).
In this example, data from the client is translated to SHIFT-JIS before it is sent to the server, and data from the server is translated to UTF-8 before it is returned to the client.
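If you maintain several such DSN entries, generating them from a script can reduce typing errors. The Python sketch below writes an entry in the layout shown above using the standard configparser module. The function name dsn_entry is hypothetical, and the paths and values are simply those from the example. Verify the output against your own installation before use.

```python
import configparser
import io


def dsn_entry(name, server_dsn, server_name, translation_option):
    """Return the text of one ODBC.INI section with translation settings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve the mixed-case key names
    parser[name] = {
        "Driver": "/usr/local/psql/lib/libodbcci.so",
        "Description": "PSQL ODBC Client Interface: %s/%s" % (server_name, name),
        "ServerDSN": server_dsn,
        "ServerName": server_name,
        "TranslationDLL": "/usr/local/psql/lib/libxlate.so.10",
        "TranslationOption": translation_option,
    }
    buffer = io.StringIO()
    # Write key=value without spaces, matching the layout shown above.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


entry = dsn_entry("dbclient", "DEMODATA", "JPN-2000SERVER:1583", "90010932")
print(entry)
```

The script returns the section as text rather than editing ODBC.INI in place, so you can review the result before merging it into the file.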
Connecting a Linux EUC-JP Client to a Linux EUC-JP Server
Using this configuration, no changes to the DSN description are needed. Use the DSN as it was created by the dsnadd utility.
Connecting a Linux UTF-8 Client to a Linux UTF-8 Server
Using this configuration, no changes to the DSN description are needed. Use the DSN as it was created by the dsnadd utility. See dsnadd in PSQL User's Guide.
Connecting a Linux UTF-8 Client to a Linux EUC-JP Server
The server requires that everything is received as EUC-JP. The client requires that the server send everything as UTF-8.
To accomplish this, the client DSN settings in ODBC.INI (by default in /usr/local/psql/etc) used to connect to the given database should be set up as follows:
[dbclient]
Driver=/usr/local/psql/lib/libodbcci.so
Description=PSQL ODBC Client Interface: JPN-2000SERVER:1583/dbclient
ServerDSN=DEMODATA
ServerName=JPN-2000SERVER:1583
TranslationDLL=/usr/local/psql/lib/libxlate.so.10
TranslationOption=90019000
The TranslationDLL line sets the translation library for the ODBC client interface to use.
The TranslationOption line specifies that translation is needed from 9001 (UTF-8) to 9000 (EUC-JP).
In this example, data from the client is translated to EUC-JP before it is sent to the server, and data from the server is translated to UTF-8 before it is returned to the client.
Connecting a Linux EUC-JP Client to a Linux EUC-JP Server, with SHIFT-JIS Encoding Used to Store Data on the Server
This situation is possible if you have a SHIFT-JIS database on a Win32 engine and you want to move all the files to a Linux EUC-JP server. In this case, the database resides on an EUC-JP Linux machine, but all the data inside the DDF files and data files is in SHIFT-JIS.
In this case, your DSN should be set up as follows:
[dbclient]
Driver=/usr/local/psql/lib/libodbcci.so
Description=PSQL ODBC Client Interface: JPN-2000SERVER:1583/dbclient
ServerDSN=DEMODATA
ServerName=JPN-2000SERVER:1583
TranslationDLL=/usr/local/psql/lib/libxlate.so.10
TranslationOption=90000932
CodePageConvert=932
The last line specifies that even though the server uses EUC-JP encoding, it should treat the data on the server as SHIFT-JIS.
Encoding Support for Wide ODBC Driver
PSQL supports UCS-2 in ODBC through a driver for wide character data and through defaults for DSN encoding translation. See Encoding Translation and ODBC Connection Strings in ODBC Guide.
ODBC Driver for Applications with Wide Character Data
PSQL provides an ODBC driver for 32-bit and 64-bit applications that use wide character data. The driver is for Windows operating systems only and is an addition to the previous set of drivers.
On Linux, the system encoding is usually UTF-8, which allows SQL text to contain any Unicode code point. The PSQL ODBC Unicode Interface driver is not available on Linux because an application can use the PSQL ODBC Client Interface driver with UTF-8. A Linux application can either handle wide character data as UTF-16 strings (SQL_C_WCHAR) or request conversion to the system encoding (usually UTF-8) as SQL_C_CHAR. SQL text using UTF-8 is compatible with the existing Pervasive ODBC Client Interface driver, so an additional ODBC driver on Linux is not required.
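The reason UTF-8 is sufficient on Linux is that it can represent the same set of code points as UTF-16. The Python sketch below is a generic illustration that does not call ODBC. The sample text and the little-endian byte order used for the UTF-16 buffer are assumptions made for the example.

```python
# Three katakana characters, a space, and one character outside the
# Basic Multilingual Plane (U+20BB7).
text = "\u30c7\u30fc\u30bf \U00020BB7"

# Bytes a narrow (SQL_C_CHAR style) buffer would hold on a UTF-8 system.
as_char = text.encode("utf-8")

# Bytes a wide (SQL_C_WCHAR style) buffer would hold as UTF-16.
as_wchar = text.encode("utf-16-le")

# Both forms decode back to the identical text.
same = as_char.decode("utf-8") == as_wchar.decode("utf-16-le") == text

print(len(as_char), len(as_wchar), same)
```

The byte counts differ, and the supplementary character occupies two UTF-16 code units, but no information is lost in either form.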
Default for DSN Encoding Translation
The encoding translation options for a DSN specify how character data is translated between the PSQL database engine and a PSQL client application that uses ODBC. The default for encoding translation depends on the PSQL ODBC driver that you use.
The ODBC drivers process SQL text differently depending on the driver and the setting for the DSN encoding translation.
1 With the encoding translation set to Automatic, you can use NCHAR columns and NCHAR literals with wide character data.
2 The assumption is that the Client and database engine use the same operating system encoding.
3 If the SQL text is wide character, it is first converted to the Client encoding. If the SQL text is not wide character, it is already in the Client encoding. The SQL text is then converted to the OEM encoding and sent to the database engine.
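The two-step conversion described in note 3 can be modeled as follows. This Python sketch only mirrors the order of the steps. The function name is hypothetical, and the pairing of code page 1252 as the Client encoding with code page 437 as the OEM encoding is an assumption chosen for the example.

```python
def sql_text_for_engine(sql_text, is_wide, client_codec, oem_codec):
    """Model the conversion order: wide -> Client encoding -> OEM encoding."""
    if is_wide:
        # Step 1: wide character text is first converted to the Client encoding.
        client_bytes = sql_text.decode("utf-16-le").encode(client_codec)
    else:
        # Narrow text is taken to be in the Client encoding already.
        client_bytes = sql_text
    # Step 2: the text is converted to the OEM encoding for the engine.
    return client_bytes.decode(client_codec).encode(oem_codec)


statement = "SELECT 'caf\u00e9'"
from_wide = sql_text_for_engine(statement.encode("utf-16-le"), True, "cp1252", "cp437")
from_narrow = sql_text_for_engine(statement.encode("cp1252"), False, "cp1252", "cp437")
print(from_wide == from_narrow)
```

Because wide text passes through the Client encoding first, a character that is missing from that encoding cannot survive this path, even if the OEM code page contains it.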