String Functions

Note: Using + should be avoided. It is an overloaded operator and can result in ambiguity when string columns and literals that contain only numeric values are mixed with numeric columns or literals.

Result type: Any character type

Returns the character equivalent of the value v1, which is an expression of any type.

Returns the nth byte of c1. If n is larger than the length of the string, the result is a byte of ASCII 0.

Note: This function does not support LONG VARCHAR, LONG NVARCHAR, or LONG BYTE arguments.

Result type: INTEGER

Returns the number of characters in c1 without trimming blanks, as is done by the LENGTH() function.

Result type: VARCHAR or NCHAR

Returns the nth character or code point of c1. If n is larger than the length of the string, the result is a blank character.

Note: For Unicode (USC2) strings the value returned is the single USC2 character. For non-Unicode strings the values returned may be 1,2,3 or 4 bytes long depending on the character at offset n characters of c1.

SELECT CHAREXTRACT('company',4) returns 'p'

Converts integer into corresponding ASCII code. If n is greater than 255, the conversion is performed on n mod 256.

SELECT CHR(65) returns 'A'.

SELECT CHR(345) returns 'Y'.

Result type: VARBYTE

Returns the collation weight of any char, c, varchar, text, nchar, or nvarchar value c1. n1 is an optional collation ID when the collation weight is desired relative to a specific collation.

Result type: Any character or Unicode type, BYTE

Concatenates two or more strings.

SELECT CONCAT('1', '2', '3', '4') returns '1234'.

In earlier releases, to achieve the same result, you must nest the CONCAT function: SELECT CONCAT(CONCAT(CONCAT('1', '2'), '3'), '4').

The result size is the sum of the sizes of the arguments. If the result is a c or char string, it is padded with blanks to achieve the proper length. To determine the data type results of concatenating strings, see the table regarding results of string concatenation.

Note: The concatenation of the BYTE data type cannot be used to create a table column implicitly in Ingres tables; the result of such concatenation, however, can be inserted into a column with a character data type.

CREATE TABLE CONCAT2 AS SELECT CONCAT(BYTE('1'), 0x42, X'43', CAST('4' AS BYTE));

INSERT INTO test_concat SELECT CONCAT(BYTE('1'), 0x42, X'43', CAST('4' AS BYTE));

Result type: VARCHAR(2)

Generates the check digit for the string c1 using the specified encoding scheme.

Supported schemes are described under VALIDATE_DIGIT (see VALIDATE_DIGIT).

The check digit is computed from the other characters in the string. Although the majority of check digits are a single character, some encoding schemes may return a two character check digit. Typically the check digit is appended to the string for validation, but this is not the case in all encoding schemes.

The check digit can be used to help determine if a string has been entered correctly. A check digit may be able to detect several types of errors, including:

• Phonetic errors--the mixing of similar sounding numbers (sixty '60' not sixteen '16')

A check digit improves the chance of detecting errors in keying, but is not an infallible error detector. Furthermore, a check digit does not provide error correction capability.

There are many check digit schemes, each with its own purpose, capabilities, and complexity. For example, the Luhn algorithm is widely used in credit card numbers and is programmatically simple to implement. The Verhoeff algorithm, in contrast, is better at detecting certain error types such as jump transpositions, but is more complicated.

The scheme name is case insensitive. When specifying the LUHN or VERHOEFF scheme, the function can take as input a character string of any length. The other schemes are fixed length, so the input string must be of the correct length.

Note: This function does not support NCHAR and NVARCHAR strings as arguments.

Result type: VARBYTE

Converts an IPv4 (dotted-decimal) or IPv6 (coloned-hex) address to a varbyte data type of length 4 or 16 respectively. Each number between the periods of an IPv4 address is converted into a single byte in the hex result. For IPv6, each hex number between the colons is converted to two bytes in the hex result. Double-colons and IPv4-mapped IPv6 addresses are supported as well.

Views and database procedures created prior to Ingres 9.2 will continue to exhibit the old behavior until they are recreated. Previously, only IPv4 addresses were supported and the returned value was type=byte with length=4.

Result type: Any character or Unicode type

Converts all initial characters in c1 to upper case.

SELECT INITCAP('This is the final version (VERSION:5.a;6) of Leonard''s will')

returns:'This Is The Final Version (Version:5.A;6) Of Leonard's Will'

Result type: Any byte, character, or Unicode type

Returns the leftmost len characters of c1. If the result is a fixed-length c or char string, it is the same length as c1, padded with blanks. The result format is the same as c1. len will be converted to a positive integer.

If len is not an integer value it will be rounded down equivalent to floor(len). If len is negative the result will be an empty string.

SELECT LEFT ('Company',4) returns 'Comp'

Result type: SMALLINT (for LONG VARCHAR, returns INTEGER4)

If c1 is a fixed-length c or char string, returns the length of c1 without trailing blanks. If c1 is a variable-length string, returns the number of characters actually in c1.

Result type: SMALLINT

Returns the location of the first occurrence of c2 within c1, including trailing blanks from c2. The location is in the range 1 to size(c1). If c2 is not found, the function returns size(c1) + 1. The function size() is described below, in this table.

If c1 and c2 are different string data types, c2 is coerced into the c1 data type.

SELECT LOCATE ('Company', 'p') returns 4

Result type: Any character or Unicode type

Converts all upper case characters in c1 to lower case.

SELECT LOWER ('Company') returns 'company'

Result type: Any character type

Returns character expression of length n in which expr1 is prepended by n-m blanks (where m is length(expr1)) or, if expr2 is coded, enough copies of expr2 to fill n-m positions at the start of the result string.

SELECT LPAD ('Company',20, '-') returns '-------------Company'

Result type: Any character type

Returns character expression with leading blanks removed.

SELECT LTRIM (' Company') returns 'Company'

Result type: CHAR(32)

Returns the calculated MD5 message digest (also known as md5sum) of c1. The computed sum generated is a lowercase hexadecimal string.

SELECT MD5SUM('a') returns '0cc175b9c0f1b6a831c399e269772661'.

Result type: Any character string variable

Retains trailing blanks when placing a value in a varchar column. This function can be used only in an embedded SQL program. For more information, see the Embedded SQL Companion Guide.

Result type: INTEGER

Returns the number of 8-bit octets (bytes) in c1 without trimming blanks, as is done by the LENGTH() function.

CREATE TABLE octet(col1 VARCHAR(10), col2 CHAR(10));

INSERT INTO octet VALUES ('Company', 'Company')

SELECT OCTET_LENGTH (col1) returns 7

SELECT LENGTH (col1) returns 7

SELECT OCTET_LENGTH (col2) returns 10

SELECT LENGTH (col2) returns 7

Result type: TEXT, VARCHAR, or NVARCHAR

Returns c1 with trailing blanks appended. For example: if c1 is a varchar string that can hold fifty characters but only has two characters, pad(c1) appends 48 trailing blanks to c1 to form the result.

Note: This function does not support LONG VARCHAR or LONG NVARCHAR arguments.

Result type: SMALLINT

ANSI compliant version of LOCATE function. If a match exists, POSITION(c1 IN c2) is equal to LOCATE(c2, c1). If a match does not exist, POSITION returns 0, unlike LOCATE.

SELECT POSITION('p', 'Company') returns 4.

SELECT POSITION('z', 'Company') returns 0.

SELECT LOCATE('Company', 'p') returns 4 (same result as POSITION).

SELECT LOCATE('Company', 'z') returns 8 (one more than LENGTH(‘company’).

Result type: Any character or Unicode type except the long forms

Returns c1 (a character string) repeated n times. REPLICATE is a synonym for REPEAT.

Note: This function does not support LONG VARCHAR and LONG NVARCHAR arguments.

REPEAT ('-',10) returns '----------'

REPEAT ('str',3) returns 'strstrstr'

Result type: Any byte or character type

Returns character expression derived from expr1 in which all instances of expr2 have been replaced by expr3.

SELECT REPLACE('The priory was in the next town','priory','church')

returns: 'The church was in the next town'

Result type: Any byte, character, or Unicode type

Returns the rightmost len characters of c1. Trailing blanks are not removed first. If c1 is a fixed-length character string, the result is padded to the same length as c1. If c1 is a variable-length character string, no padding occurs. The result format is the same as c1. len must be a positive integer.

SELECT RIGHT('The priory was in the next town',9)

Result type: Any character type

Returns character expression of length n in which expr1 is appended by n-m blanks (where m is length(expr1)) or, if expr2 is coded, enough copies of expr2 to fill n-m positions at the end of the result string.

SELECT RPAD('Company',12, '-') returns 'Company-----'

SELECT RPAD('Company',12, '-x') returns 'Company-x-x-'

Result type: Any character type

Returns character expression with trailing blanks removed.

SELECT LENGTH(RTRIM('Company ')) returns 7

Result type: Any character or Unicode type

Shifts the string nshift places to the right if nshift > 0 and to the left if nshift < 0. If c1 is a fixed-length character string, the result is padded with blanks to the length of c1. If c1 is a variable-length character string, no padding occurs. The result format is the same as c1.

SELECT SHIFT('Company',4) returns ' Com'

SELECT SHIFT('Company',‑4) returns 'any '

Result type: smallint

Returns the declared size of c1 without removal of trailing blanks.

SELECT SIZE('Company') returns 7

Result type: Any character type

Returns a c1 four-character field that can be used to find similar sounding strings. For example, SMITH and SMYTHE produce the same SOUNDEX code. If there are less than three characters, the result is padded by trailing zero(s). If there are more than three characters, the result is achieved by dropping the rightmost digits.

This standard (Russell) soundex function is useful for finding like-sounding strings quickly. A list of similar sounding strings can be shown in a search list rather than just the next strings in the index.

SELECT SOUNDEX('Company') returns 'C515'

SELECT SOUNDEX ('Company2012') returns 'C515'

Result type: Any character type

Returns one or more six-character codes in a comma-separated varchar string (up to a maximum of 16 codes) that can be used to find similar sounding names.

This Daitch-Mokotoff soundex function returns one or more six-character codes in a comma-separated varchar string (up to a maximum of 16 codes) that can be used to find similar sounding names. For example: the codes 739460,734600 may be returned for the name Peterson.

Leading and embedded spaces are ignored in the input. The input strings are terminated by non-alphabetic characters other than period (.), apostrophe ('), or hyphen (-). The function ignores the first occurrence of any of these characters, but terminates the string on a subsequent occurrence. This allows the input of names or place names with standard punctuation marks. For example:

O'Brien is treated as 'OBRIEN'
St.Kilda is treated as 'STKILDA'
St. Hilda is treated as 'STHILDA'
'Smyth-Brown' is treated as 'SMYTHBROWN'

An empty name, or a name with characters unable to be encoded, returns a code of 000000 and no error is generated.

For more information, see SOUNDEX_DM vs. SOUNDEX (see SOUNDEX_DM vs. SOUNDEX).

Result type: TEXT or VARCHAR

Compresses white space. White space is defined as any sequence of blanks, null characters, newlines (line feeds), carriage returns, horizontal tabs and form feeds (vertical tabs). Trims white space from the beginning and end of the string, and replaces all other white space with single blanks.

This function is useful for comparisons. The value for c1 must be a string of variable‑length character string data type (not fixed-length character data type). The result is the same length as the argument.

SELECT SQUEEZE (' Company 2012 ') returns 'Company 2012'

Result type: VARBYTE, VARCHAR, NVARCHAR, and LONG variants

Returns part of c1 starting at the loc position and either extending to the end of the string or for the number of characters/code points in the len operand.

If len is specified and is less than 1, SUBSTR returns NULL.

The loc parameter determines the start of the substring to be extracted. If loc is less than 0 the position is counted backwards from the end of c1; if loc is greater than 0 the position is counted from the beginning. If loc is 0 the start position is the first character.

After the start of the substring is determined, len characters are extracted. If len is not supplied, the rest of c1 is implied.

If c1 is VARCHAR, CHAR, C, or TEXT, the result is a VARCHAR. If c1 is NCHAR or NVARCHAR, the result is an NVARCHAR. If c1 is a BYTE or VARBYTE, the result is a VARBYTE. If c1 is passed as a long data type, the result is of the same format.

SELECT SUBSTR('Company 2012',9,2) returns '20'

SELECT SUBSTR('Company 2012',9) returns '2012'

SELECT SUBSTR('Company 2012',-9,4) returns 'pany'

Result type: VARBYTE, VARCHAR, NVARCHAR, and LONG variants

Returns part of c1 starting at the loc position and either extending to the end of the string or for the number of characters/code points in the len operand. The result format is a varchar or nvarchar the size of c1.

SELECT SUBSTRING('Company 2012',9,2) returns '20'

SELECT SUBSTRING('Company 2012',9) returns '2012'

SELECT SUBSTRING('Company 2012',-9,4) returns an empty string.

Result type: TEXT or VARCHAR

Returns c1 without trailing blanks. The result has the same length as c1.

Result type: Any character string variable

Returns c2 with all occurrences of c1--which can be only one character--removed from the beginning, end, or both, as specified. BOTH is the default. In the absence of c1, the space is assumed.

SELECT 'INGRES' , '['||TRIM(' Company ' ) ||']'

SELECT 'BOTH' , '['||TRIM(BOTH ' ' FROM ' Company ' ) ||']'

SELECT 'LEADING' , '['||TRIM(LEADING ' ' FROM ' Company ' ) ||']'

SELECT 'TRAILING' , '['||TRIM(TRAILING ' ' FROM ' Company ' ) ||']'

Result type: Any character type

Converts all lower case characters in c1 to upper case.

Result type: INTEGER

Validates that the check digit in the string c1 is mathematically correct for the specified scheme. The function returns 1 for valid and 0 for invalid.

Restrictions on scheme names and strings are the same as for GENERATE_DIGIT (see GENERATE_DIGIT).

• You must not include separators, such as dashes or whitespace, in the string.

• The VALIDATE_DIGIT function does not enforce any internal formatting rules of the scheme. For example, an EAN is split into a GS1 Prefix, the Company Number, an Item reference, and the check digit. This function only calculates the check digit.

The scheme for GENERATE_DIGIT and VALIDATE_DIGIT functions can be one of the following:

The EAN is a bar-coding standard defined by the standards organization GS1. It is used worldwide for marking retail goods. It was developed as a superset of the original 12-digit Universal Product Code (UPC) system. It is also called a Japanese Article Number (JAN) in Japan. UPC, EAN, and JAN numbers are collectively called Global Trade Item Numbers (GTIN).

An EAN string is a 13-character string composed entirely of digits. The check digit is the rightmost character in the string.

The EAN_8 is an 8-character number derived from the longer EAN_13 and is intended for use on small packages where the longer number would be awkward.

EAN_8 can also be used internally by companies to identify restricted or "own-brand" products to be sold within their stores only. These are often referred to with the synonym RCN_8.

An EAN_8 string is an 8-character string composed entirely of digits. The check digit is the rightmost character in the string.

The standard ISBN is a 10-character alphanumeric identifier for printed material. The check digit is the rightmost character and can be either a digit or the alpha 'X'. The other characters in the string must be digits.

ISBN-13 is the 13-character version of the ISBN. The string is composed entirely of digits. The check digit is the rightmost character in the string.

The ISSN is an 8-character alphanumeric identifier for electronic or print periodicals. The check digit is the rightmost character and can be either a digit or the alpha 'X'. The other characters in the string must be digits.

Most credit cards and many government identification numbers are based on this simple algorithm created by Hans Peter Luhn. For example, the Canadian Social Security number is a nine digit number based on the Luhn algorithm.

The string can be of any permitted length, but must be composed entirely of digits. The check digit is the rightmost character in the string.

LUHN_A is an extension of the Luhn algorithm that allows for alphabetic characters by assigning numbers 10 to 'A', 11 to 'B', ...35 to 'Z'. It is case insensitive.

The Committee on Uniform Security Identification Procedures (CUSIP) provides a 9-character alphanumeric string based on this algorithm. Also the International Securities Identification Number (ISIN) is a 12-character version of this algorithm.

The string can be of any permitted length, but must be composed entirely of alphanumeric characters. The check digit is the rightmost character in the string.

The UPC is composed of 12 digits with the check digit being the rightmost character.

The UPC_E is composed of six digits and is intended for smaller packages where a full size UPC_A would be cumbersome. Unlike the UPC_A, the check digit is not appended to the number, but is used to determine the "odd/even" parity assigned to the numbers for when they are encoded as barcode lines.

Developed by Jacobus Verhoeff, this algorithm detects all transposition errors and other types of errors that are not detected by the Luhn algorithm.

Like the Luhn algorithm, it can handle strings of any permitted length. The string must be composed entirely of digits and the check digit will be the rightmost character.

A variation on the Verhoeff algorithm published in Numerical Recipes in C by Press, Teukolsky, Vetterling, and Flannery.

The advantages of the Daitch-Mokotoff soundex over the standard (Russell) soundex function are:

• The extra length of the codes returned avoids false positives on long names with the same base.

Name	SOUNDEX Code	SOUNDEX_DM Code
Nichols	N242	658400,648400
Nicholson	N242	658460,648460

In this example the soundex() function fails to differentiate between the names, whereas the SOUNDEX_DM() function returns several codes for each name, none of which match any elements in the other.

• The ability to return multiple codes allow for the different sounds certain character combinations make.

For example: Does the "ch" in Cherkassy sound like the "ch" in cheese or in Christmas?

• The ability to return multiple codes allow a greater chance to match names that the standard soundex will miss.

Name	SOUNDEX Code	SOUNDEX_DM Code
Schwartsenegger	S632	479465
Shwarzenegger	S625	479465,474659
Schwarzenegger	S625	479465,474659

In this example, the soundex() function fails to match the misspelling 'Schwartsenegger', whereas the SOUNDEX_DM() function generates a string with a matching element (479465) for the other two cases listed.

The disadvantage of the Daitch-Mokotoff soundex over the standard soundex function is the computational overhead required to process a multi-element string.

Some string functions do not accept LONG VARCHAR, LONG NVARCHAR, or LONG BYTE columns.

To apply any of these functions to a column with a long data type, first coerce the column to an acceptable data type. For example:

If a coercion function is applied to a LONG VARCHAR, LONG NVARCHAR, or LONG BYTE value that is longer than 32,000 characters or bytes, the result is truncated to 32,000 characters or bytes.

Note: For the UTF8 character set, the character data is multi-byte string and the actual number of bytes for the data could be more than the number of characters. If the output buffer for any string operation is not sufficiently large to hold the multi-byte string, the result will be truncated at a character boundary.