Data Encoding
An encoding is a standard for representing character sets. Character data must be put in a standard format, that is, encoded, so that a computer can process it digitally. An encoding must be established between the Pervasive PSQL database engine (server) and a Pervasive PSQL client application. A compatible encoding allows the server and client to interpret data correctly.
Pervasive PSQL v11 SP3 better handles the complexity of encoding between client and server across the various combinations of operating systems, languages, and access methods. The encoding enhancements are divided into database code page and client encoding. The two types of encoding are separate but interrelated (see Table 8).
The use of the two encoding methods is intended for advanced users. In general, the default encoding settings are sufficient and do not require changing.
Database code page and client encoding apply only to the relational interface. The transactional interface is not affected.
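This section contains the following topics: Database Code Page, Client Encoding, Encoding Interaction, and Legacy Conversion Methods for OEM Data.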
Database Code Page
The database code page is a new database property that identifies the encoding to use for data and metadata. The default database code page is “server default,” meaning the operating system (OS) code page on the server where the database engine is running. (The OS code page is generally referred to as the “OS encoding,” which is the phrase used throughout the rest of this chapter.)
Database code page is particularly handy if you need to manually copy Pervasive PSQL DDFs to another platform with a different OS encoding and still have the metadata correctly interpreted by the database engine.
Client Encoding
Client encoding is the data encoding used by an application on a Pervasive PSQL client. An application can store data in any encoding it chooses. But, as mentioned earlier, a compatible encoding must be established between the database engine and the client application. Previous versions of Pervasive PSQL provided methods to ensure compatible encoding between the database engine and clients.
Those methods have been enhanced to take advantage of database code page. An application can now specify that it wants the Pervasive PSQL client to translate data automatically between the database code page and the client application. This is referred to as automatic translation. Note, however, that automatic translation can translate characters only if they are present in both code pages (the code page on the server machine and the code page on the client machine).
Automatic translation is specified when the client application connects to the database engine. See ODBC Connection Strings in SQL Engine Reference.
Data translation, if required, occurs at the client. (Translation is not always required—for example, when the client operating system (OS) encoding matches the server OS encoding.)
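The restriction that automatic translation works only for characters present in both code pages can be illustrated outside the product. The following is a minimal sketch using Python's built-in codecs, not the Pervasive PSQL client itself; the function name and the choice of cp437 and cp1252 as the two code pages are assumptions made for illustration.

```python
def translatable(text, server_code_page, client_code_page):
    """Return True if every character in text exists in both code pages."""
    for code_page in (server_code_page, client_code_page):
        try:
            text.encode(code_page)
        except UnicodeEncodeError:
            # At least one character has no representation in this code page.
            return False
    return True


# Hypothetical pairing: an OEM code page on one side, an ANSI code page on the other.
print(translatable("café", "cp437", "cp1252"))  # é exists in both code pages
print(translatable("€5", "cp437", "cp1252"))    # cp437 has no euro sign
```

A check of this kind can help identify data that would not survive translation before a client is switched to automatic translation.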
Encoding Interaction
The following table explains the interaction between database code page and client encoding.
When a database has OEM character data in it, the legacy solution was for the access method, such as ODBC using a DSN, to specify OEM/ANSI conversion. Now it is possible to set the OEM code page for the database and have the access method specify automatic translation. See also Encoding Translation in SQL Engine Reference.
Note: The database engine does not validate the encoding of the data and metadata that an application inserts into a database. The engine assumes that all data was entered using the database code page as explained in Table 8.
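To see why OEM data needs either conversion or a correctly declared code page, consider how one character is stored under two code pages. This sketch uses Python's codecs with cp437 standing in for an OEM code page and cp1252 for an ANSI code page; both choices are assumptions for illustration, and the actual code pages depend on the systems involved.

```python
character = "é"

oem_bytes = character.encode("cp437")    # byte used by the assumed OEM code page
ansi_bytes = character.encode("cp1252")  # byte used by the assumed ANSI code page

# The same character is stored as different byte values.
print(oem_bytes.hex(), ansi_bytes.hex())

# Bytes written as OEM but read as ANSI decode to a different character,
# because nothing in the stored byte records which code page produced it.
misread = oem_bytes.decode("cp1252")
print(repr(misread))
```

Because the stored bytes carry no record of their encoding, the reader must be told which code page applies, which is the role of the database code page.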
Legacy Conversion Methods for OEM Data
If a database has OEM character data in it, a legacy solution is to specify OEM/ANSI conversion in the access method. This topic discusses some legacy methods for Linux clients using OEM character data.
Note: While the legacy methods are still supported, the recommendation is to specify the OEM code page for the database and have the access methods use automatic translation as discussed above.
Btrieve and DTI
When using the Btrieve API or the Distributed Tuning Interface (DTI), you must provide file names and paths in the local encoding used in your application. The Btrieve API and DTI handle the differences between OS encoding on the server and client.
ODBC
See also OEM/ANSI Conversion in SQL Engine Reference.
When using ODBC, Win32 encoding is expected to be SHIFT-JIS.
By default, Japanese versions of Linux typically use EUC-JP or UTF-8 encoding.
When using Japanese versions of Linux, a client can connect to another Linux server (for example, locally), or to a Win32 SHIFT-JIS server. It is also possible to connect to a database encoded in SHIFT-JIS but located on a Linux server.
Use the instructions below that match your configuration. In each case, it is assumed that the application itself does not do any conversion and uses the encoding that is native to the machine.
Connecting a Linux EUC-JP Client to a Win32 SHIFT-JIS Server
The server requires that everything is received as SHIFT-JIS. The client requires that the server send everything as EUC-JP.
To accomplish this, the client DSN settings in ODBC.INI (located by default in /usr/local/psql/etc) used to connect to the given database should be set up as follows:
[dbclient]
Driver=/usr/local/psql/lib/libodbcci.so
Description=Pervasive ODBC Client Interface: JPN-2000SERVER:1583/dbclient
ServerDSN=DEMODATA
ServerName=JPN-2000SERVER:1583
TranslationDLL=/usr/local/psql/lib/libxlate.so.10
TranslationOption=90000932
The TranslationDLL line specifies the translation library that the ODBC client interface should use.
The TranslationOption line specifies that the translation needs to occur from 9000 (representing EUC-JP) to 0932 (representing SHIFT-JIS).
With this configuration, all data sent by the client is translated into SHIFT-JIS before it reaches the server, and all data sent by the server is translated into EUC-JP before the client receives it.
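The TranslationOption values in this section appear to be two four-digit encoding codes written one after the other, client encoding first. The sketch below decodes a value on that assumption. The function name is hypothetical, the eight-digit layout is inferred from the examples here, and the table lists only the three codes this section mentions.

```python
# Encoding codes taken from the examples in this section; other codes may exist.
ENCODING_CODES = {
    "9000": "EUC-JP",
    "9001": "UTF-8",
    "0932": "SHIFT-JIS",
}


def describe_translation_option(option):
    """Split a TranslationOption into (client encoding, server encoding) names."""
    if len(option) != 8 or not option.isdigit():
        raise ValueError(f"expected 8 digits, got {option!r}")
    client_code, server_code = option[:4], option[4:]
    for code in (client_code, server_code):
        if code not in ENCODING_CODES:
            raise ValueError(f"unknown encoding code {code!r}")
    return ENCODING_CODES[client_code], ENCODING_CODES[server_code]


print(describe_translation_option("90000932"))
```

A helper like this can serve as a quick check on a DSN entry before it is deployed.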
Connecting a Linux UTF-8 Client to a Win32 SHIFT-JIS Server
The server requires that everything is received as SHIFT-JIS. The client requires that the server send everything as UTF-8.
To accomplish this, the client DSN settings in ODBC.INI (by default in /usr/local/psql/etc) used to connect to the given database should be set up as follows:
[dbclient]
Driver=/usr/local/psql/lib/libodbcci.so
Description=Pervasive ODBC Client Interface: JPN-2000SERVER:1583/dbclient
ServerDSN=DEMODATA
ServerName=JPN-2000SERVER:1583
TranslationDLL=/usr/local/psql/lib/libxlate.so.10
TranslationOption=90010932
The TranslationDLL line specifies the translation library that the ODBC client interface should use.
The TranslationOption line specifies that the translation needs to occur from 9001 (representing UTF-8) to 0932 (representing SHIFT-JIS).
With this configuration, all data sent by the client is translated into SHIFT-JIS before it reaches the server, and all data sent by the server is translated into UTF-8 before the client receives it.
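The effect of the two-way translation can be pictured with Python's built-in codecs. This is a conceptual sketch only: it does not call the translation library named in the DSN, and the translate function is a hypothetical stand-in for whatever that library does internally.

```python
def translate(data, source, target):
    """Re-encode bytes from the source encoding to the target encoding."""
    return data.decode(source).encode(target)


text = "日本語"
client_bytes = text.encode("utf-8")  # what the UTF-8 client application holds

# Client to server: UTF-8 becomes SHIFT-JIS.
server_bytes = translate(client_bytes, "utf-8", "shift_jis")

# Server to client: SHIFT-JIS becomes UTF-8 again.
returned_bytes = translate(server_bytes, "shift_jis", "utf-8")

print(len(client_bytes), len(server_bytes))
print(returned_bytes == client_bytes)
```

The byte lengths differ between the two encodings, which is worth keeping in mind when sizing character columns for translated data.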
Connecting a Linux EUC-JP Client to a Linux EUC-JP Server
Using this configuration, no changes to the DSN description are needed. Use the DSN as it was created by the dsnadd utility.
Connecting a Linux UTF-8 Client to a Linux UTF-8 Server
Using this configuration, no changes to the DSN description are needed. Use the DSN as it was created by the dsnadd utility. See dsnadd in Pervasive PSQL User's Guide.
Connecting a Linux UTF-8 Client to a Linux EUC-JP Server
The server requires that everything is received as EUC-JP. The client requires that the server send everything as UTF-8.
To accomplish this, the client DSN settings in ODBC.INI (by default in /usr/local/psql/etc) used to connect to the given database should be set up as follows:
[dbclient]
Driver=/usr/local/psql/lib/libodbcci.so
Description=Pervasive ODBC Client Interface: JPN-2000SERVER:1583/dbclient
ServerDSN=DEMODATA
ServerName=JPN-2000SERVER:1583
TranslationDLL=/usr/local/psql/lib/libxlate.so.10
TranslationOption=90019000
The TranslationDLL line specifies the translation library that the ODBC client interface should use.
The TranslationOption line specifies that the translation needs to occur from 9001 (representing UTF-8) to 9000 (representing EUC-JP).
With this configuration, all data sent by the client is translated into EUC-JP before it reaches the server, and all data sent by the server is translated into UTF-8 before the client receives it.
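Because the DSN entries in this section differ only in a few values, they can be generated rather than typed by hand. The following sketch builds an entry with Python's standard configparser module. The function name and its parameters are hypothetical; the keys and paths are copied from the examples above, and the output would still need to be merged into ODBC.INI manually.

```python
import configparser
import io


def build_dsn_entry(dsn, server_dsn, server_name, translation_option):
    """Return the text of one ODBC.INI section for a translating client DSN."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key capitalization instead of lowercasing
    parser[dsn] = {
        "Driver": "/usr/local/psql/lib/libodbcci.so",
        "Description": f"Pervasive ODBC Client Interface: {server_name}/{dsn}",
        "ServerDSN": server_dsn,
        "ServerName": server_name,
        "TranslationDLL": "/usr/local/psql/lib/libxlate.so.10",
        "TranslationOption": translation_option,
    }
    buffer = io.StringIO()
    # Write key=value without spaces, matching the entries shown in this section.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


entry = build_dsn_entry("dbclient", "DEMODATA", "JPN-2000SERVER:1583", "90019000")
print(entry)
```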
Connecting a Linux EUC-JP Client to a Linux EUC-JP Server, with SHIFT-JIS Encoding Used to Store Data on the Server
This situation can arise if you have a SHIFT-JIS database on a Win32 engine and move all of its files to a Linux EUC-JP server. In this case, the database resides on an EUC-JP Linux machine, but all the data inside the DDF files and data files is in SHIFT-JIS.
In this case, your DSN should be set up as follows:
[dbclient]
Driver=/usr/local/psql/lib/libodbcci.so
Description=Pervasive ODBC Client Interface: JPN-2000SERVER:1583/dbclient
ServerDSN=DEMODATA
ServerName=JPN-2000SERVER:1583
TranslationDLL=/usr/local/psql/lib/libxlate.so.10
TranslationOption=90000932
CodePageConvert=932
The CodePageConvert line specifies that even though the server uses EUC-JP encoding, it should treat the data on the server as SHIFT-JIS.
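The need to declare the stored encoding can be demonstrated with Python's codecs. This sketch shows what happens when SHIFT-JIS bytes are read as though they were EUC-JP; it illustrates the underlying problem only and makes no claim about how the database engine handles the setting internally.

```python
original = "日本語"

# Bytes as they would sit in files copied from a SHIFT-JIS system.
stored_bytes = original.encode("shift_jis")

# Reading them with the server's native EUC-JP encoding garbles the text.
# errors="replace" substitutes a marker for byte sequences that are invalid.
misread = stored_bytes.decode("euc_jp", errors="replace")

# Reading them with the encoding they were written in recovers the text.
correct = stored_bytes.decode("shift_jis")

print(repr(misread))
print(correct == original)
```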