Support for Collation and Sorting

What is Collation and Sorting?

Collation refers to the ordering of the characters in a character set. For example, one collation might put digits before letters and another might put them after. Sorting is the rearrangement of a set of data so that the text is in collation order.
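To make the distinction concrete, here is a minimal sketch that treats a collation as a sort key and sorting as applying that key. The two collations are hypothetical: one ranks digits before letters, the other after. This is plain Python for illustration, not how PSQL implements collation.

```python
# Two hypothetical collations over the same characters, expressed as sort keys.
def digits_first(s: str) -> tuple:
    # Each character becomes (rank, char): digits rank 0, everything else rank 1.
    return tuple((0 if c.isdigit() else 1, c) for c in s)

def digits_last(s: str) -> tuple:
    # Same idea with the ranks swapped, so digits sort after letters.
    return tuple((1 if c.isdigit() else 0, c) for c in s)

data = ["b2", "a", "7", "10", "c"]
print(sorted(data, key=digits_first))  # sorting = rearranging into collation order
print(sorted(data, key=digits_last))
```

The same data yields `['10', '7', 'a', 'b2', 'c']` under the first collation and `['a', 'b2', 'c', '10', '7']` under the second.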

PSQL supports specifying a named collation for byte-string text segments, and indexes sort record keys according to the specified collation.

Sort Order with No Collation Sequence Specified

When no collation is specified, PSQL uses a default collation that orders characters by their encoded values, so the relative order of two characters depends on which code page is in use. The default direction is ascending, from the lowest code page value to the highest; you can optionally set it to descending. See Sort Order in PSQL Programmer’s Guide for more information.
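The code page dependence can be illustrated by comparing strings by their encoded bytes. This is a sketch of the idea only, assuming byte-wise comparison as a model of the default ordering; the choice of code pages 1252 and 437 is purely illustrative.

```python
# Model of the default (no ACS) collation: compare strings by their encoded
# byte values. The bytes a character gets depend on the code page.
def code_page_key(code_page: str):
    return lambda s: s.encode(code_page)

words = ["é", "ü"]
asc_1252 = sorted(words, key=code_page_key("cp1252"))   # é=0xE9, ü=0xFC
asc_437 = sorted(words, key=code_page_key("cp437"))     # é=0x82, ü=0x81
desc_1252 = sorted(words, key=code_page_key("cp1252"), reverse=True)  # descending
print(asc_1252, asc_437, desc_1252)
```

Under code page 1252, é sorts before ü; under code page 437 the same two characters sort the other way.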

Collation Support in Wide Character Columns

By default, PSQL collates Unicode data by code point value. This applies both to UTF-16 data and to multibyte UTF-8 text.
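Code point order is easy to observe in Python, whose string comparison is by code point. The sketch below also shows a property of UTF-8 that makes this ordering practical for multibyte text: comparing the raw UTF-8 bytes gives the same order as comparing code points. This illustrates the ordering itself, not PSQL internals.

```python
# Python compares str values code point by code point.
words = ["😀", "€", "é", "z", "A"]
by_code_point = sorted(words)
# UTF-8 preserves code point order under plain byte-wise comparison.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))
print(by_code_point)                   # ['A', 'z', 'é', '€', '😀']
print(by_code_point == by_utf8_bytes)  # True
```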

Collation Support Using an Alternate Collating Sequence (ACS)

You can specify an alternative to the default code page collation order. This user-defined alternate collating sequence (ACS) is a mapping from the code page collation order to the desired collation order. You can define one or more alternate sequences for determining the collation of string keys of type STRING, LSTRING, and ZSTRING. For example, a user-defined ACS can place numbers after letters or change the relative ordering of uppercase and lowercase letters. PSQL includes an ACS named UPPER.ALT that maps lowercase letters so that they sort as equivalent to uppercase letters. (Setting case insensitivity achieves the same result, but UPPER.ALT shows what an ACS can do.)
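The effect of an ACS like UPPER.ALT can be modeled as a 256-entry table that gives each code page byte a new sort position. In this sketch the table is an in-memory list, not the on-disk ACS file format, and the helper `acs_key` is hypothetical.

```python
# Hypothetical in-memory model of an UPPER.ALT-like ACS: start from the identity
# (default code page order) and move lowercase a-z onto the positions of A-Z.
upper_acs = list(range(256))
for b in range(ord("a"), ord("z") + 1):
    upper_acs[b] = b - (ord("a") - ord("A"))  # 'a' (0x61) -> 0x41, and so on

def acs_key(table, code_page="cp1252"):
    # Translate each encoded byte through the table before comparing.
    return lambda s: bytes(table[b] for b in s.encode(code_page))

names = ["baker", "Adams", "adams", "Baker", "Zed"]
print(sorted(names))                          # uppercase sorts before lowercase
print(sorted(names, key=acs_key(upper_acs)))  # case no longer affects the order
```

With the table applied, "Adams" and "adams" compare as equal, so Python's stable sort keeps them in their input order.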

Essentially, a user-defined ACS is a table that associates each character's position in the code page sequence with its desired position in the alternate sequence. Creating an ACS is described, with examples, in Alternate Collating Sequences in PSQL Programmer’s Guide. You specify the ACS for key value fields in the definition of the data file layout (see Specifying a Key’s Alternate Collating Sequence in this guide and Data Layout in PSQL Programmer’s Guide).
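A table of this kind can be derived from a desired ordering. The sketch below builds one that places digits after letters, one of the examples mentioned earlier. It only models the position mapping; the function names are hypothetical, and the actual file format is described in the Programmer’s Guide.

```python
DIGITS = range(ord("0"), ord("9") + 1)

def digits_after_letters() -> list[int]:
    # Start from code page order, remove '0'-'9', and re-insert them after 'z'.
    sequence = [b for b in range(256) if b not in DIGITS]
    cut = sequence.index(ord("z")) + 1
    sequence[cut:cut] = list(DIGITS)
    # table[byte] = that byte's position in the alternate sequence
    table = [0] * 256
    for position, b in enumerate(sequence):
        table[b] = position
    return table

def acs_key(table, code_page="cp1252"):
    # Compare strings by the translated positions of their encoded bytes.
    return lambda s: bytes(table[b] for b in s.encode(code_page))

items = ["item10", "itemB", "item2", "itema"]
print(sorted(items))                                     # digits first
print(sorted(items, key=acs_key(digits_after_letters())))  # digits after letters
```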

For additional information about setting an ACS, see Create (14), Create Index (31) and Get Next Extended (36) in Btrieve API Guide, Alternate Collating Sequence (ACS) Files in DDF Builder User’s Guide and SET DEFAULTCOLLATE in SQL Engine Reference.

Collation Support Using an International Sort Rule (ISR)

Another type of ACS is an international sort rule (ISR), a predefined alternate collating sequence for a language-specific sort order. You can use an ISR to correctly sort languages such as German, in which ä, ö, and ü sort as ae, oe, and ue, and ß sorts as ss. PSQL provides a number of ISR tables in the COLLATE.CFG file in your PSQL installation. Examples of their use can be found in Sample Collations Using International Sorting Rules in PSQL Programmer’s Guide. For more information, see the references for alternate collating sequences listed above.
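The German rule above can be approximated by expanding the special letters before comparing. This is an illustrative sketch, not one of the ISR tables shipped in COLLATE.CFG; the mapping covers only the letters named in the text plus their uppercase forms.

```python
# Illustrative German-style expansion (not PSQL's actual ISR data).
GERMAN_EXPANSIONS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})

def german_key(s: str) -> str:
    # "Äpfel" is compared as "Aepfel", "Öl" as "Oel".
    return s.translate(GERMAN_EXPANSIONS)

words = ["Zebra", "Äpfel", "Apfel", "Ofen", "Öl"]
print(sorted(words))                  # code point order: Ä and Ö land after Z
print(sorted(words, key=german_key))  # ['Äpfel', 'Apfel', 'Öl', 'Ofen', 'Zebra']
```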

Collation Support Using an ICU Unicode Collation

At the Btrieve level, PSQL supports two Unicode collations for use with UTF-8 or UTF-16 data if you need sorting other than the default binary collation. These alternate Unicode collations are based on the International Components for Unicode (ICU) libraries, release version 54. The following table summarizes the collations.


ICU Collation Name	Installed File	Description
u54-msft_enus_0	u54-msft_enus_0.txt	Emulates the ISR collation MSFT_ENUS01252_0. The emulation applies only to the code page 1252 subset of Unicode. Characters outside this subset are sorted according to the ICU root collation.
root	icudt54l.dat	Defines default ICU collation and other configuration data.
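Whether u54-msft_enus_0 emulates MSFT_ENUS01252_0 for a given character, or falls back to root, depends on whether the character belongs to the code page 1252 repertoire. The sketch below tests that membership with Python's cp1252 codec. The helper name is hypothetical, and the codec check is an approximation of the subset the text describes.

```python
# Hypothetical helper: is this character in the Windows-1252 repertoire?
def in_cp1252_subset(ch: str) -> bool:
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False

for ch in ["A", "é", "€", "Ω", "中"]:
    rule = "MSFT_ENUS01252_0 emulation" if in_cp1252_subset(ch) else "ICU root"
    print(ch, rule)
```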

ICU collations are used like ISRs, except that their names must either be root or begin with the prefix u54-, instead of starting with PVSW_ or MSFT_. In addition, they can be applied only to the following Unicode data types:

• STRING (assumed to be UTF-8)
• ZSTRING (assumed to be UTF-8)
• WSTRING (UTF-16)
• WZSTRING (UTF-16)

When the Btrieve file is created, the application specifies one of the two ICU collation names in the same manner as an ISR name.
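An application might check these rules before creating the file. The following is a hypothetical pre-flight check, not part of any PSQL API. It encodes the naming rule (root or the u54- prefix) and the key type restriction, and it assumes WSTRING and WZSTRING as the UTF-16 key types alongside STRING and ZSTRING.

```python
# Hypothetical pre-flight check; not a PSQL API.
ICU_KEY_TYPES = {
    "STRING": "UTF-8",
    "ZSTRING": "UTF-8",
    "WSTRING": "UTF-16",   # assumed UTF-16 key type
    "WZSTRING": "UTF-16",  # assumed UTF-16 key type
}

def is_icu_collation_name(name: str) -> bool:
    # ICU names are "root" or start with "u54-"; ISR names use PVSW_ or MSFT_.
    return name == "root" or name.startswith("u54-")

def icu_collation_allowed(name: str, key_type: str) -> bool:
    return is_icu_collation_name(name) and key_type.upper() in ICU_KEY_TYPES

print(icu_collation_allowed("u54-msft_enus_0", "STRING"))   # True
print(icu_collation_allowed("MSFT_ENUS01252_0", "STRING"))  # False: an ISR name
print(icu_collation_allowed("root", "LSTRING"))             # False: not a listed type
```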

To use ICU, PSQL requires configuration data for generic sorting and supplemental data for locale-specific collations. PSQL installs this data as two files in the same location as the collate.cfg file. The configuration data, which includes the default ICU collation called root, resides in icudt54l.dat. The second collation is provided in u54-msft_enus_0.txt.
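Because both ICU files are installed next to collate.cfg, a deployment script could confirm that they are present. This sketch takes the collate.cfg path as input because the installation directory varies; the function name and the placeholder path are hypothetical.

```python
from pathlib import Path

# ICU data files that the text says are installed beside collate.cfg.
ICU_FILES = {
    "root": "icudt54l.dat",                    # configuration data, default collation
    "u54-msft_enus_0": "u54-msft_enus_0.txt",  # second collation
}

def missing_icu_files(collate_cfg: Path) -> list[str]:
    # Look in the directory that contains collate.cfg and report absent files.
    folder = collate_cfg.parent
    return [f for f in ICU_FILES.values() if not (folder / f).is_file()]

cfg = Path("collate.cfg")  # placeholder: point this at your installation's collate.cfg
print(missing_icu_files(cfg))
```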

For more information about ICU collations, see the ICU Project website.