4. Understanding the Elements of SQL Statements : SQL Functions : Scalar Functions : String Functions
 
Share this page                  
String Functions
String functions perform a variety of operations on character data.
String functions can be nested. For example:
LEFT(RIGHT(x.name, SIZE(x.name) - 1), 3)
returns the substring of x.name from character positions 2 through 4.
The || or + operator can also be used to concatenate strings:
x.lastname || ', ' || x.firstname
Note:  Using + should be avoided. It is an overloaded operator and can result in ambiguity when string columns and literals that contain only numeric values are mixed with numeric columns or literals.
SELECT '1' + '40' + '50' returns 14050
SELECT 1 + '40' + '50' returns 91
SELECT '1' + 40 + '50' returns 91
SELECT '1' + 40 || '50' returns 4150
ASCII
ASCII(v1)
Result type: Any character type
Returns the character equivalent of the value v1, which is an expression of any type.
BYTEEXTRACT
BYTEEXTRACT(c1,n)
Result type: BYTE
Returns the nth byte of c1. If n is larger than the length of the string, the result is a byte of ASCII 0.
Note:  This function does not support LONG VARCHAR, LONG NVARCHAR, or LONG BYTE arguments.
CHARACTER_LENGTH
CHARACTER_LENGTH(c1)
Result type: INTEGER
Returns the number of characters in c1 without trimming blanks, as is done by the LENGTH() function.
Note:  This function does not support NCHAR and NVARCHAR arguments.
CHAREXTRACT
CHAREXTRACT(c1,n)
Result type: VARCHAR or NCHAR
Returns the nth character or code point of c1. If n is larger than the length of the string, the result is a blank character.
Note:  For Unicode (USC2) strings the value returned is the single USC2 character. For non-Unicode strings the values returned may be 1,2,3 or 4 bytes long depending on the character at offset n characters of c1.
SELECT CHAREXTRACT('company',4) returns 'p'
CHR
CHR(n)
Result type: CHAR
Converts integer into corresponding ASCII code. If n is greater than 255, the conversion is performed on n mod 256.
SELECT CHR(65) returns 'A'.
SELECT CHR(345) returns 'Y'.
COLLATION_WEIGHT
COLLATION_WEIGHT(c1 [,n1])
Result type: VARBYTE
Returns the collation weight of any char, c, varchar, text, nchar, or nvarchar value c1. n1 is an optional collation ID when the collation weight is desired relative to a specific collation.
CONCAT
CONCAT(c1,c2...)
Result type: Any character or Unicode type, BYTE
Concatenates two or more strings.
SELECT CONCAT('1', '2', '3', '4') returns '1234'.
In earlier releases, to achieve the same result, you must nest the CONCAT function: SELECT CONCAT(CONCAT(CONCAT('1', '2'), '3'), '4').
The result size is the sum of the sizes of the arguments. If the result is a c or char string, it is padded with blanks to achieve the proper length. To determine the data type results of concatenating strings, see the table regarding results of string concatenation.
Note:  The concatenation of the BYTE data type cannot be used to create a table column implicitly in Ingres tables; the result of such concatenation, however, can be inserted into a column with a character data type.
Wrong:
CREATE TABLE CONCAT2 AS SELECT CONCAT(BYTE('1'), 0x42, X'43', CAST('4' AS BYTE));
Correct:
CREATE TABLE test_concat (col1 VARCHAR(4);
INSERT INTO test_concat SELECT CONCAT(BYTE('1'), 0x42, X'43', CAST('4' AS BYTE));
GENERATE_DIGIT
GENERATE_DIGIT('scheme', 'c1')
Result type: VARCHAR(2)
Generates the check digit for the string c1 using the specified encoding scheme.
Supported schemes are described under VALIDATE_DIGIT (see VALIDATE_DIGIT).
The check digit is computed from the other characters in the string. Although the majority of check digits are a single character, some encoding schemes may return a two character check digit. Typically the check digit is appended to the string for validation, but this is not the case in all encoding schemes.
Note:  This function does not append the check digit to the string.
The check digit can be used to help determine if a string has been entered correctly. A check digit may be able to detect several types of errors, including:
Mistyped characters ('0123' not '0124')
Omissions or additions of characters
Adjacent transposition errors ('01234' not '01243')
Twin errors ('11' becomes '22')
Jump transpositions (123 becomes 321)
Jump twin errors (121 becomes 323)
Phonetic errors--the mixing of similar sounding numbers (sixty '60' not sixteen '16')
A check digit improves the chance of detecting errors in keying, but is not an infallible error detector. Furthermore, a check digit does not provide error correction capability.
There are many check digit schemes, each with its own purpose, capabilities, and complexity. For example, the Luhn algorithm is widely used in credit card numbers and is programmatically simple to implement. The Verhoeff algorithm, in contrast, is better at detecting certain error types such as jump transpositions, but is more complicated.
The scheme name is case insensitive. When specifying the LUHN or VERHOEFF scheme, the function can take as input a character string of any length. The other schemes are fixed length, so the input string must be of the correct length.
Note:  This function does not support NCHAR and NVARCHAR strings as arguments.
II_IPADDR
II_IPADDR(n)
Result type: VARBYTE
Converts an IPv4 (dotted-decimal) or IPv6 (coloned-hex) address to a varbyte data type of length 4 or 16 respectively. Each number between the periods of an IPv4 address is converted into a single byte in the hex result. For IPv6, each hex number between the colons is converted to two bytes in the hex result. Double-colons and IPv4-mapped IPv6 addresses are supported as well.
Views and database procedures created prior to Ingres 9.2 will continue to exhibit the old behavior until they are recreated. Previously, only IPv4 addresses were supported and the returned value was type=byte with length=4.
For example:
255.255.0.0 produces ‘ffff0000’;
10.20.30.40 produces ‘0a141e28’
2.1 produces “02000001”
fe80::208:74ff:fef0:42b3 produces “FE80000000000000020874FFFEF042B3”
::1 produces “00000000000000000000000000000001”
::FFFF:141.202.36.10 produces “00000000000000000000FFFF8DCA240A”
INITCAP
INITCAP(c1)
Result type: Any character or Unicode type
Converts all initial characters in c1 to upper case.
SELECT INITCAP('This is the final version (VERSION:5.a;6) of Leonard''s will')
returns:'This Is The Final Version (Version:5.A;6) Of Leonard's Will'
LEFT
LEFT(c1,len)
Result type: Any byte, character, or Unicode type
Returns the leftmost len characters of c1. If the result is a fixed-length c or char string, it is the same length as c1, padded with blanks. The result format is the same as c1. len will be converted to a positive integer.
If len is not an integer value it will be rounded down equivalent to floor(len). If len is negative the result will be an empty string.
SELECT LEFT ('Company',4) returns 'Comp'
LENGTH
LENGTH(c1)
Result type: SMALLINT (for LONG VARCHAR, returns INTEGER4)
If c1 is a fixed-length c or char string, returns the length of c1 without trailing blanks. If c1 is a variable-length string, returns the number of characters actually in c1.
LOCATE
LOCATE(c1,c2)
Result type: SMALLINT
Returns the location of the first occurrence of c2 within c1, including trailing blanks from c2. The location is in the range 1 to size(c1). If c2 is not found, the function returns size(c1) + 1. The function size() is described below, in this table.
If c1 and c2 are different string data types, c2 is coerced into the c1 data type.
SELECT LOCATE ('Company', 'p') returns 4
LOWERCASE or LOWER
LOWERCASE(c1)
or
LOWER(c1)
Result type: Any character or Unicode type
Converts all upper case characters in c1 to lower case.
SELECT LOWER ('Company') returns 'company'
LPAD
LPAD(expr1, n [, expr2])
Result type: Any character type
Returns character expression of length n in which expr1 is prepended by n-m blanks (where m is length(expr1)) or, if expr2 is coded, enough copies of expr2 to fill n-m positions at the start of the result string.
SELECT LPAD ('Company',20, '-') returns '-------------Company'
LTRIM
LTRIM(expr)
Result type: Any character type
Returns character expression with leading blanks removed.
SELECT LTRIM ('  Company') returns 'Company'
MD5SUM
MD5SUM(c1)
Result type: CHAR(32)
Returns the calculated MD5 message digest (also known as md5sum) of c1. The computed sum generated is a lowercase hexadecimal string.
SELECT MD5SUM('a') returns '0cc175b9c0f1b6a831c399e269772661'.
NOTRIM
NOTRIM(c1)
Result type: Any character string variable
Retains trailing blanks when placing a value in a varchar column. This function can be used only in an embedded SQL program. For more information, see the Embedded SQL Companion Guide.
OCTET_LENGTH
OCTET_LENGTH(c1)
Result type: INTEGER
Returns the number of 8-bit octets (bytes) in c1 without trimming blanks, as is done by the LENGTH() function.
CREATE TABLE octet(col1 VARCHAR(10), col2 CHAR(10));
INSERT INTO octet VALUES ('Company', 'Company')
SELECT OCTET_LENGTH (col1) returns 7
SELECT LENGTH (col1) returns 7
SELECT OCTET_LENGTH (col2) returns 10
SELECT LENGTH (col2) returns 7
PAD
PAD(c1)
Result type: TEXT, VARCHAR, or NVARCHAR
Returns c1 with trailing blanks appended. For example: if c1 is a varchar string that can hold fifty characters but only has two characters, pad(c1) appends 48 trailing blanks to c1 to form the result.
Note:  This function does not support LONG VARCHAR or LONG NVARCHAR arguments.
POSITION
POSITION(c1 IN c2)
POSITION(c1 , c2)
Result type: SMALLINT
ANSI compliant version of LOCATE function. If a match exists, POSITION(c1 IN c2) is equal to LOCATE(c2, c1). If a match does not exist, POSITION returns 0, unlike LOCATE.
SELECT POSITION('p', 'Company') returns 4.
SELECT POSITION('z', 'Company') returns 0.
SELECT LOCATE('Company', 'p') returns 4 (same result as POSITION).
SELECT LOCATE('Company', 'z') returns 8 (one more than LENGTH(‘company’).
REPEAT
REPEAT(c1, n)
Result type: Any character or Unicode type except the long forms
Returns c1 (a character string) repeated n times. REPLICATE is a synonym for REPEAT.
Note:  This function does not support LONG VARCHAR and LONG NVARCHAR arguments.
REPEAT ('-',10) returns '----------'
REPEAT ('str',3) returns 'strstrstr'
REPLACE
REPLACE(expr1, expr2, expr3)
Result type: Any byte or character type
Returns character expression derived from expr1 in which all instances of expr2 have been replaced by expr3.
SELECT REPLACE('The priory was in the next town','priory','church')
returns: 'The church was in the next town'
RIGHT
RIGHT(c1,len)
Result type: Any byte, character, or Unicode type
Returns the rightmost len characters of c1. Trailing blanks are not removed first. If c1 is a fixed-length character string, the result is padded to the same length as c1. If c1 is a variable-length character string, no padding occurs. The result format is the same as c1. len must be a positive integer.
SELECT RIGHT('The priory was in the next town',9)
returns 'next town'
RPAD
RPAD(expr1, n [, expr2])
Result type: Any character type
Returns character expression of length n in which expr1 is appended by n-m blanks (where m is length(expr1)) or, if expr2 is coded, enough copies of expr2 to fill n-m positions at the end of the result string.
SELECT RPAD('Company',12, '-') returns 'Company-----'
SELECT RPAD('Company',12, '-x') returns 'Company-x-x-'
RTRIM
RTRIM(expr)
Result type: Any character type
Returns character expression with trailing blanks removed.
SELECT LENGTH(RTRIM('Company     ')) returns 7
SHIFT
SHIFT(c1,nshift)
Result type: Any character or Unicode type
Shifts the string nshift places to the right if nshift > 0 and to the left if nshift < 0. If c1 is a fixed-length character string, the result is padded with blanks to the length of c1. If c1 is a variable-length character string, no padding occurs. The result format is the same as c1.
SELECT SHIFT('Company',4) returns '   Com'
SELECT SHIFT('Company',‑4) returns 'any    '
SIZE
SIZE(c1)
Result type: smallint
Returns the declared size of c1 without removal of trailing blanks.
SELECT SIZE('Company') returns 7
SOUNDEX
SOUNDEX(c1)
Result type: Any character type
Returns a c1 four-character field that can be used to find similar sounding strings. For example, SMITH and SMYTHE produce the same SOUNDEX code. If there are less than three characters, the result is padded by trailing zero(s). If there are more than three characters, the result is achieved by dropping the rightmost digits.
This standard (Russell) soundex function is useful for finding like-sounding strings quickly. A list of similar sounding strings can be shown in a search list rather than just the next strings in the index.
Note:  This function does not support LONG VARCHAR or any Unicode arguments.
SELECT SOUNDEX('Company') returns 'C515'
SELECT SOUNDEX ('Company2012') returns 'C515'
SOUNDEX_DM
SOUNDEX_DM(c1)
Result type: Any character type
Returns one or more six-character codes in a comma-separated varchar string (up to a maximum of 16 codes) that can be used to find similar sounding names.
This Daitch-Mokotoff soundex function returns one or more six-character codes in a comma-separated varchar string (up to a maximum of 16 codes) that can be used to find similar sounding names. For example: the codes 739460,734600 may be returned for the name Peterson.
Leading and embedded spaces are ignored in the input. The input strings are terminated by non-alphabetic characters other than period (.), apostrophe ('), or hyphen (-). The function ignores the first occurrence of any of these characters, but terminates the string on a subsequent occurrence. This allows the input of names or place names with standard punctuation marks. For example:
O'Brien is treated as 'OBRIEN'
St.Kilda is treated as 'STKILDA'
St. Hilda is treated as 'STHILDA'
'Smyth-Brown' is treated as 'SMYTHBROWN'
An empty name, or a name with characters unable to be encoded, returns a code of 000000 and no error is generated.
Note:  This function does not support LONG VARCHAR or any Unicode arguments.
For more information, see SOUNDEX_DM vs. SOUNDEX (see SOUNDEX_DM vs. SOUNDEX).
SQUEEZE
SQUEEZE(c1)
Result type: TEXT or VARCHAR
Compresses white space. White space is defined as any sequence of blanks, null characters, newlines (line feeds), carriage returns, horizontal tabs and form feeds (vertical tabs). Trims white space from the beginning and end of the string, and replaces all other white space with single blanks.
This function is useful for comparisons. The value for c1 must be a string of variable‑length character string data type (not fixed-length character data type). The result is the same length as the argument.
SELECT SQUEEZE ('  Company 2012  ') returns 'Company 2012'
SUBSTR
SUBSTR(c1, loc [, len])
Result type: VARBYTE, VARCHAR, NVARCHAR, and LONG variants
Returns part of c1 starting at the loc position and either extending to the end of the string or for the number of characters/code points in the len operand.
If len is specified and is less than 1, SUBSTR returns NULL.
The loc parameter determines the start of the substring to be extracted. If loc is less than 0 the position is counted backwards from the end of c1; if loc is greater than 0 the position is counted from the beginning. If loc is 0 the start position is the first character.
After the start of the substring is determined, len characters are extracted. If len is not supplied, the rest of c1 is implied.
If c1 is VARCHAR, CHAR, C, or TEXT, the result is a VARCHAR. If c1 is NCHAR or NVARCHAR, the result is an NVARCHAR. If c1 is a BYTE or VARBYTE, the result is a VARBYTE. If c1 is passed as a long data type, the result is of the same format.
SELECT SUBSTR('Company 2012',9,2) returns '20'
SELECT SUBSTR('Company 2012',9) returns '2012'
SELECT SUBSTR('Company 2012',-9,4) returns 'pany'
SUBSTRING
SUBSTRING(c1 FROM loc [FOR len])
Result type: VARBYTE, VARCHAR, NVARCHAR, and LONG variants
Returns part of c1 starting at the loc position and either extending to the end of the string or for the number of characters/code points in the len operand. The result format is a varchar or nvarchar the size of c1.
SELECT SUBSTRING('Company 2012',9,2) returns '20'
SELECT SUBSTRING('Company 2012',9) returns '2012'
SELECT SUBSTRING('Company 2012',-9,4) returns an empty string.
TRIM
TRIM(c1)
Result type: TEXT or VARCHAR
Returns c1 without trailing blanks. The result has the same length as c1.
ANSI Compliant Version of TRIM:
TRIM([ [BOTH | LEADING | TRAILING] [c1] FROM] c2 )
Result type: Any character string variable
Returns c2 with all occurrences of c1--which can be only one character--removed from the beginning, end, or both, as specified. BOTH is the default. In the absence of c1, the space is assumed.
SELECT 'INGRES' , '['||TRIM('  Company  ' ) ||']'
returns [  Company]
SELECT 'BOTH' , '['||TRIM(BOTH ' ' FROM '  Company  ' ) ||']'
returns [Company]
SELECT 'LEADING' , '['||TRIM(LEADING ' ' FROM '  Company  ' ) ||']'
returns [Company  ]
SELECT 'TRAILING' , '['||TRIM(TRAILING ' ' FROM '  Company  ' ) ||']'
returns [ Company]
UPPERCASE or UPPER
UPPERCASE(c1)
or
UPPER(c1)
Result type: Any character type
Converts all lower case characters in c1 to upper case.
VALIDATE_DIGIT
VALIDATE_DIGIT('scheme', c1)
Result type: INTEGER
Validates that the check digit in the string c1 is mathematically correct for the specified scheme. The function returns 1 for valid and 0 for invalid.
Restrictions on scheme names and strings are the same as for GENERATE_DIGIT (see GENERATE_DIGIT).
Notes:
The scheme name is case insensitive (can be in mixed case).
You must not include separators, such as dashes or whitespace, in the string.
The VALIDATE_DIGIT function does not enforce any internal formatting rules of the scheme. For example, an EAN is split into a GS1 Prefix, the Company Number, an Item reference, and the check digit. This function only calculates the check digit.
The scheme for GENERATE_DIGIT and VALIDATE_DIGIT functions can be one of the following:
EAN
European Article Number
Synonyms: EAN_13, GTIN, GTIN_13, JAN
The EAN is a bar-coding standard defined by the standards organization GS1. It is used worldwide for marking retail goods. It was developed as a superset of the original 12-digit Universal Product Code (UPC) system. It is also called a Japanese Article Number (JAN) in Japan. UPC, EAN, and JAN numbers are collectively called Global Trade Item Numbers (GTIN).
An EAN string is a 13-character string composed entirely of digits. The check digit is the rightmost character in the string.
EAN_8
European Article Number
Synonyms: GTIN_8, RCN_8
The EAN_8 is an 8-character number derived from the longer EAN_13 and is intended for use on small packages where the longer number would be awkward.
EAN_8 can also be used internally by companies to identify restricted or "own-brand" products to be sold within their stores only. These are often referred to with the synonym RCN_8.
An EAN_8 string is an 8-character string composed entirely of digits. The check digit is the rightmost character in the string.
ISBN
International Standard Book Number
The standard ISBN is a 10-character alphanumeric identifier for printed material. The check digit is the rightmost character and can be either a digit or the alpha 'X'. The other characters in the string must be digits.
ISBN_13
International Standard Book Number
ISBN-13 is the 13-character version of the ISBN. The string is composed entirely of digits. The check digit is the rightmost character in the string.
ISSN
International Standard Serial Number
The ISSN is an 8-character alphanumeric identifier for electronic or print periodicals. The check digit is the rightmost character and can be either a digit or the alpha 'X'. The other characters in the string must be digits.
LUHN
The LUHN algorithm
Most credit cards and many government identification numbers are based on this simple algorithm created by Hans Peter Luhn. For example, the Canadian Social Security number is a nine digit number based on the Luhn algorithm.
The string can be of any permitted length, but must be composed entirely of digits. The check digit is the rightmost character in the string.
LUHN_A
The LUHN algorithm (alphanumeric)
LUHN_A is an extension of the Luhn algorithm that allows for alphabetic characters by assigning numbers 10 to 'A', 11 to 'B', ...35 to 'Z'. It is case insensitive.
The Committee on Uniform Security Identification Procedures (CUSIP) provides a 9-character alphanumeric string based on this algorithm. Also the International Securities Identification Number (ISIN) is a 12-character version of this algorithm.
The string can be of any permitted length, but must be composed entirely of alphanumeric characters. The check digit is the rightmost character in the string.
UPC
Universal Product Code
Synonyms: UPC_A, EAN_12
Used to mark retail goods as per the EAN (see above).
The UPC is composed of 12 digits with the check digit being the rightmost character.
UPC_E
Universal Product Code
The UPC_E is composed of six digits and is intended for smaller packages where a full size UPC_A would be cumbersome. Unlike the UPC_A, the check digit is not appended to the number, but is used to determine the "odd/even" parity assigned to the numbers for when they are encoded as barcode lines.
VERHOEFF
The Verhoeff algorithm
Developed by Jacobus Verhoeff, this algorithm detects all transposition errors and other types of errors that are not detected by the Luhn algorithm.
Like the Luhn algorithm, it can handle strings of any permitted length. The string must be composed entirely of digits and the check digit will be the rightmost character.
VERHOEFFNR
Verhoeff algorithm as described in Numerical Recipes in C.
A variation on the Verhoeff algorithm published in Numerical Recipes in C by Press, Teukolsky, Vetterling, and Flannery.
SOUNDEX_DM vs. SOUNDEX
The advantages of the Daitch-Mokotoff soundex over the standard (Russell) soundex function are:
The extra length of the codes returned avoids false positives on long names with the same base.
For example:
Name
SOUNDEX Code
SOUNDEX_DM Code
Nichols
N242
658400,648400
Nicholson
N242
658460,648460
In this example the soundex() function fails to differentiate between the names, whereas the SOUNDEX_DM() function returns several codes for each name, none of which match any elements in the other.
The ability to return multiple codes allow for the different sounds certain character combinations make.
For example: Does the "ch" in Cherkassy sound like the "ch" in cheese or in Christmas?
The ability to return multiple codes allow a greater chance to match names that the standard soundex will miss.
For example:
Name
SOUNDEX Code
SOUNDEX_DM Code
Schwartsenegger
S632
479465
Shwarzenegger
S625
479465,474659
Schwarzenegger
S625
479465,474659
In this example, the soundex() function fails to match the misspelling 'Schwartsenegger', whereas the SOUNDEX_DM() function generates a string with a matching element (479465) for the other two cases listed.
The disadvantage of the Daitch-Mokotoff soundex over the standard soundex function is the computational overhead required to process a multi-element string.
String Functions That Do Not Accept Long Data Types
Some string functions do not accept LONG VARCHAR, LONG NVARCHAR, or LONG BYTE columns.
Such functions include:
BYTEEXTRACT
CHAREXTRACT
CONCAT
INITCAP
LEFT
LOCATE
LPAD
NOTRIM
REPEAT
REPLACE
RIGHT
RPAD
SHIFT
SQUEEZE
TRIM
To apply any of these functions to a column with a long data type, first coerce the column to an acceptable data type. For example:
SQUEEZE(VARCHAR(long_varchar_column))
If a coercion function is applied to a LONG VARCHAR, LONG NVARCHAR, or LONG BYTE value that is longer than 32,000 characters or bytes, the result is truncated to 32,000 characters or bytes.
String Functions and the UTF8 Character Set
Note:  For the UTF8 character set, the character data is multi-byte string and the actual number of bytes for the data could be more than the number of characters. If the output buffer for any string operation is not sufficiently large to hold the multi-byte string, the result will be truncated at a character boundary.