Custom Collation Sequence

If you have special needs that are not met by the available collation sequences, you can write your own. Ingres allows you to write a collation sequence that has any of the following characteristics:

• Character skipping—one or more specified characters are ignored for collation

• One-to-one mapping—a character can be substituted for another or weighted differently for collation

• Many-to-one mapping—groups of characters can be substituted for a single character or weight value for collation

• Many-to-many mapping—groups of characters can be substituted for a sequence of characters or weight values for collation
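The four characteristics above can be pictured with a small Python model. This is only an illustrative sketch, not how Ingres stores or applies a collation sequence: the names `collation_key`, `RULES`, `SKIP`, and `sort_key`, the sample mappings, and the weight scheme (code point times 10) are all assumptions made for the example.

```python
# Hypothetical model of a collation table; not the Ingres internal format.
def w(ch):
    """Default weight: code point times 10, leaving gaps for inserted weights."""
    return ord(ch) * 10

SKIP = {"-"}                               # character skipping: "-" is ignored
RULES = {
    "\u00e9": [w("e")],                    # one-to-one: e-acute sorts as e
    "ch": [w("c") + 1],                    # many-to-one: "ch" is one unit after c
    "mc": [w("m"), w("a"), w("c")],        # many-to-many: "mc" sorts as "mac"
}

def collation_key(text, rules, skip=frozenset()):
    """Return the list of integer weights used to compare `text`."""
    key = []
    i = 0
    # Try longer source strings first so that "ch" wins over a plain "c".
    sources = sorted(rules, key=len, reverse=True)
    while i < len(text):
        if text[i] in skip:
            i += 1
            continue
        for src in sources:
            if text.startswith(src, i):
                key.extend(rules[src])
                i += len(src)
                break
        else:
            key.append(w(text[i]))
            i += 1
    return key

def sort_key(text):
    return collation_key(text, RULES, SKIP)

words = ["dama", "chico", "cuna"]
print(sorted(words))                # plain code-point order
print(sorted(words, key=sort_key))  # "ch" treated as a unit after "c"
```

In this model, skipped characters contribute no weight at all, while mapped strings contribute one or more weights in place of their own characters.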

Keep the following points in mind as you design and test your custom collation file:

• Never create a production database with an untested collation sequence. Always test your collation file on a sample database. Each time you modify the collation sequence to fix a problem, you must unload the database, destroy the old database, install the new sequence, create a database with the new sequence, and reload the data.

• Some collation sequences allow two strings that are different to compare as equal. These sequences are called information loss sequences. An example of this type of sequence is a sequence that ignores case.
Problems that can result from such a sequence are:

• If duplicates are not allowed, the DBMS drops all but one string.

• If duplicates are not allowed, the DBMS does not allow you to add a row to a table if it appears to match an existing row.

• The hash storage structure cannot detect when two equal but different strings are placed in a hashed relation.

• In a query on a hash table, the “=” operator can only fetch one of the ‘equal’ strings that matches the expression.

Because of these problems, we suggest that you do not use information loss sequences and the hash storage structure together.
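The following Python sketch illustrates why the combination is troublesome. It assumes, purely for illustration, that a hashed table picks a bucket from the raw characters of a string while equality is decided by the collation sequence; the source does not describe the actual Ingres mechanism. The names `fold`, `raw_hash`, and `HashedTable` are hypothetical.

```python
NUM_BUCKETS = 7  # arbitrary size for the toy table

def fold(s):
    """Information loss sequence: case is ignored when comparing."""
    return s.lower()

def raw_hash(s):
    """Assumed behavior: the bucket is chosen from the raw characters."""
    return sum(ord(ch) for ch in s) % NUM_BUCKETS

class HashedTable:
    """Toy hashed relation with a no-duplicates rule."""

    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert_unique(self, s):
        # Only the target bucket is searched for an existing "equal" string.
        bucket = self.buckets[raw_hash(s)]
        if any(fold(s) == fold(t) for t in bucket):
            return False
        bucket.append(s)
        return True

    def lookup(self, s):
        # Models "=": only strings in the probed bucket can be returned.
        return [t for t in self.buckets[raw_hash(s)] if fold(t) == fold(s)]

table = HashedTable()
print(table.insert_unique("Smith"))  # accepted
print(table.insert_unique("SMITH"))  # also accepted: lands in another bucket
print(table.lookup("Smith"))         # only one of the two equal strings
```

Under these assumptions the two strings compare as equal, yet the duplicate goes unnoticed and an equality lookup returns only the string that happens to share the probed bucket.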

To define a custom collation sequence you must create a description file, which consists of a list of “instructions” that, taken as a whole, describe the collation sequence. Each instruction must appear on a separate line in the file.

The value part of an instruction determines the numerical weight assigned to string. (The internal numerical weight of each character determines where a character appears in the sort order.)

Instructs sorting of the specified string after the specified character and before the next higher-weighted character in the character set. For example, the instruction H+1:string1 maps string1 as a single character that is ordered immediately after the letter H and before I in a sorted sequence.

In the following instruction, string2 sorts after string1 and before the letter I.

You can specify H+1:string or Hz+1:string, and both sort in the same manner, that is, after H and before I. However, the two forms do not behave the same when pattern matching is applied. To illustrate with an example from Spanish, the instruction C+1:CH maps CH as a single character that sorts between C and D.

If you ask for a pattern match using the format C%, instances of CH are not returned. The alternative, Cz+1:CH, maps CH into two characters, C and a virtual character just after z. This causes CH to match as two characters. A pattern match using the format C% finds the instances of CH.
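The difference between the two forms can be sketched in Python. This is a simplified model, not the Ingres pattern-matching code: each string is turned into a list of collation units written as (base character, offset) pairs, and a pattern such as C% is treated as a comparison of leading units. The function names are hypothetical, and the single-character form is assumed to be written C+1:CH by analogy with H+1:string.

```python
def to_units(text, ch_as_single_unit):
    """Split text into collation units, each a (base_char, offset) pair."""
    units = []
    i = 0
    while i < len(text):
        if text.startswith("CH", i):
            if ch_as_single_unit:
                units.append(("C", 1))               # models C+1:CH
            else:
                units.extend([("C", 0), ("z", 1)])   # models Cz+1:CH
            i += 2
        else:
            units.append((text[i], 0))
            i += 1
    return units

def matches_prefix(text, prefix, ch_as_single_unit):
    """Model of a 'prefix%' pattern: compare the leading collation units."""
    want = to_units(prefix, ch_as_single_unit)
    have = to_units(text, ch_as_single_unit)
    return have[:len(want)] == want

for single in (True, False):
    words = ["DAMA", "CHICO", "CUNA"]
    ordered = sorted(words, key=lambda s: to_units(s, single))
    found = [s for s in words if matches_prefix(s, "C", single)]
    print(single, ordered, found)
```

Both settings produce the same sort order in this model, but only the two-unit form lets a C prefix match words that begin with CH.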

Sorts the specified string as the equivalent of the specified charstring. For example, in the following instruction, the word tax sorts as if it were the word revenue:

Gives the specified string the internal numerical weight specified by the given number. The number must be between 0 and 32766. Weighting a character in this manner is less portable than giving the character a relative weight.

Causes the specified string to be ignored when collation is performed. For example, in the following instruction, the “?” is ignored whenever collation takes place.

When no value is specified (the instruction takes the form :string), the collation compiler ignores the instruction. Use this format to insert comments into your collation sequence. For example:

Is any character or character string. An empty string causes a syntax error.

The aducompile utility compiles the description file for your collation sequence into a binary file and installs that file as a collation sequence that can be used. You must be the installation owner to use this utility. Be sure to give your resulting collation file a unique name so that you do not overwrite any existing collation files.

Your new collation sequence is located at $II_SYSTEM/ingres/files/collation/collation_name.

Note: On UNIX, all system users must have read permission on the new collation file.