String Functions
String functions perform a variety of operations on character data.
String functions can be nested. For example:
LEFT(RIGHT(x.name, SIZE(x.name) - 1), 3)
returns the substring of x.name from character positions 2 through 4.
The || or + operator can also be used to concatenate strings:
x.lastname || ', ' || x.firstname
Note: Avoid using the + operator. It is an overloaded operator and can result in ambiguity when string columns and literals that contain only numeric values are mixed with numeric columns or literals.
SELECT '1' + '40' + '50' returns 14050
SELECT 1 + '40' + '50' returns 91
SELECT '1' + 40 + '50' returns 91
SELECT '1' + 40 || '50' returns 4150
ASCII
ASCII(v1)
Result type: Any character type
Returns the character equivalent of the value v1, which is an expression of any type.
CHARACTER_LENGTH
CHARACTER_LENGTH(c1)
Result type: INTEGER
Returns the number of characters in c1 without trimming blanks, as is done by the LENGTH() function.
Note: This function does not support NCHAR and NVARCHAR arguments.
CHAREXTRACT
CHAREXTRACT(c1,n)
Result type: VARCHAR or NCHAR
Returns the nth character or code point of c1. If n is larger than the length of the string, the result is a blank character.
Note: For Unicode (USC2) strings the value returned is the single USC2 character. For non-Unicode strings the values returned may be 1,2,3 or 4 bytes long depending on the character at offset n characters of c1.
SELECT CHAREXTRACT('company',4) returns 'p'
CHR
CHR(n)
Result type: CHAR
Converts integer into corresponding ASCII code. If n is greater than 255, the conversion is performed on n mod 256.
SELECT CHR(65) returns 'A'.
SELECT CHR(345) returns 'Y'.
CONCAT
CONCAT(c1,c2...)
Result type: Any character or Unicode type, BYTE
Concatenates two or more strings. CONCAT takes a minimum of one operand.
SELECT CONCAT('1', '2', '3', '4') returns '1234'.
In earlier releases, to achieve the same result, you must nest the CONCAT function: SELECT CONCAT(CONCAT(CONCAT('1', '2'), '3'), '4').
The result size is the sum of the sizes of the arguments. If the result is a c or char string, it is padded with blanks to achieve the proper length. To determine the data type results of concatenating strings, see the table regarding results of string concatenation.
Note: The concatenation of the BYTE data type cannot be used to create a table column implicitly in Vector tables; the result of such concatenation, however, can be inserted into a column with a character data type.
Wrong:
CREATE TABLE CONCAT2 AS SELECT CONCAT(BYTE('1'), 0x42, X'43', CAST('4' AS BYTE));
Correct:
CREATE TABLE test_concat (col1 VARCHAR(4);
INSERT INTO test_concat SELECT CONCAT(BYTE('1'), 0x42, X'43', CAST('4' AS BYTE));
INITCAP
INITCAP(c1)
Result type: Any character or Unicode type
Converts all initial characters in c1 to upper case.
SELECT INITCAP('This is the final version (VERSION:5.a;6) of Leonard''s will')
returns:'This Is The Final Version (Version:5.A;6) Of Leonard's Will'
JARO_WINKLER
JARO_WINKLER(c1,c2)
Result type: FLOAT4
Calculates the Jaro-Winkler similarity between two VARCHAR strings. This is returned as a float value between 0 and 1, where 0.0 means no similarities and 1.0 means the strings are identical.
SELECT JARO_WINKLER('same','same') returns 1.0
LEFT
LEFT(c1,len)
Result type: Any byte, character, or Unicode type
Returns the leftmost len characters of c1. If the result is a fixed-length c or char string, it is the same length as c1, padded with blanks. The result format is the same as c1. len will be converted to a positive integer.
If len is not an integer value it will be rounded down equivalent to floor(len). If len is negative the result will be an empty string.
SELECT LEFT ('Company',4) returns 'Comp'
LENGTH
LENGTH(c1)
Result type: SMALLINT
If c1 is a fixed-length char string, returns the length of c1 without trailing blanks. If c1 is a variable-length string, returns the number of characters actually in c1.
SELECT LENGTH ('Company') returns 7
LEVENSHTEIN
LEVENSHTEIN(c1,c1)
Result type: INTEGER4
Calculates the Levenshtein distance between two VARCHAR strings. The Levenshtein distance between two strings is the minimum number of changes that need to be made to convert the source string into the target string.
Alias: LEVENSHTEIN_DISTANCE
SELECT LEVENSHTEIN('foo','fou') returns 1
LOCATE
LOCATE(c1,c2)
Result type: SMALLINT
Returns the location of the first occurrence of c2 within c1, including trailing blanks from c2. The location is in the range 1 to size(c1). If c2 is not found, the function returns size(c1) + 1. The function size() is described below, in this table.
If c1 and c2 are different string data types, c2 is coerced into the c1 data type.
SELECT LOCATE ('Company', 'p') returns 4
LOWERCASE or LOWER
LOWERCASE(c1)
or
LOWER(c1)
Result type: Any character or Unicode type
Converts all upper case characters in c1 to lower case.
SELECT LOWER ('Company') returns 'company'
LPAD
LPAD(expr1, n [, expr2])
Result type: Any character type
Returns character expression of length n in which expr1 is prepended by n-m blanks (where m is length(expr1)) or, if expr2 is coded, enough copies of expr2 to fill n-m positions at the start of the result string.
SELECT LPAD ('Company',20, '-') returns '-------------Company'
LTRIM
LTRIM(expr)
Result type: Any character type
Returns character expression with leading blanks removed.
SELECT LTRIM (' Company') returns 'Company'
NOTRIM
NOTRIM(c1)
Result type: Any character string variable
Retains trailing blanks when placing a value in a varchar column. This function can be used only in an embedded SQL program.
OCTET_LENGTH or LENGTHB
OCTET_LENGTH(c1[,charset])
or
LENGTHB(c1[,charset])
Result type: INTEGER4
Returns the number of 8-bit octets (bytes) in c1 without trimming blanks, as is done by the LENGTH() function.
The result when multi-character datasets are used will be longer as each byte of a multi-bye character is counted.
The second parameter is any valid characterset supported by the database.
CREATE TABLE octet(col1 VARCHAR(10), col2 CHAR(10));
INSERT INTO octet VALUES ('Company', 'Company')
SELECT OCTET_LENGTH (col1) returns 7
SELECT LENGTHB (col1) returns 7
SELECT OCTET_LENGTH (col2) returns 10
SELECT LENGTHB (col2) returns 10
SELECT LENGTH (col2) from octet \g returns 7
POSITION
POSITION(c1 IN c2)
POSITION(c1 , c2)
Result type: SMALLINT
ANSI compliant version of LOCATE function. If a match exists, POSITION(c1 IN c2) is equal to LOCATE(c2, c1). If a match does not exist, POSITION returns 0, unlike LOCATE.
SELECT POSITION('p', 'Company') returns 4.
SELECT POSITION('z', 'Company') returns 0.
SELECT LOCATE('Company', 'p') returns 4 (same result as POSITION).
SELECT LOCATE('Company', 'z') returns 8 (one more than LENGTH(‘company’).
REPEAT
REPEAT(c1, n)
Result type: Any character type
Returns c1 (a character string) repeated n times.
SELECT REPEAT ('-',10) returns '----------'
SELECT REPEAT ('str',3) returns 'strstrstr'
REPLACE
REPLACE(expr1, expr2, expr3)
Result type: Any character type
Returns character expression derived from expr1 in which all instances of expr2 have been replaced by expr3.
SELECT REPLACE('The priory was in the next town','priory','church')
returns: 'The church was in the next town'
REVERSE
REVERSE(c1)
Result type: CHAR, VARCHAR, NCHAR, NVARCHAR
Returns the string c1 with the in-memory order of the characters reversed, based on code points. For CHAR and NCHAR, the trailing space is also reversed.
Reversal of code points is supported for all string types in both UTF8 and non-UTF8 installations.
In the following examples, each underline character (_) represents a space.
SELECT REVERSE('ecnalubma') returns ambulance.
Example for Unicode input:
SELECT REVERSE('Adán') returns nádA. The string 'Adán' is 4 code points:
• U+0041 Latin Capital Letter A
• U+0064 Latin Small Letter D
• U+00E1 Latin Small Letter A with Acute
• U+006E Latin Small Letter N
REVERSE(c1, 'egc')
Result type: CHAR, VARCHAR, NCHAR, NVARCHAR
Returns the string c1 with the in-memory order of the characters reversed, based on characters consisting of multiple code points (extended grapheme cluster). For CHAR and NCHAR, the trailing space is also reversed.
Reversal of extended grapheme clusters is supported for types CHAR and VARCHAR in case the data is interpreted as UTF8 (due to the installed character set or collation). For types NCHAR and NVARCHAR the reversal of extended grapheme clusters is supported independently of the installed character set.
Example for Unicode input:
SELECT REVERSE('Adán', 'egc') returns nádA. The string 'Adán' is 5 code points:
• U+0041 Latin Capital Letter A
• U+0064 Latin Small Letter D
• U+0061 Latin Small Letter A
• U+0301 Combining Acute Accent
• U+006E Latin Small Letter N
The a gets acute because U+0061 and U+0301 build a grapheme cluster.
To determine the number of code points in a string, use the LENGTH() function.
RIGHT
RIGHT(c1,len)
Result type: Any character or Unicode type
Returns the rightmost len characters of c1. Trailing blanks are not removed first. If c1 is a fixed-length character string, the result is padded to the same length as c1. If c1 is a variable-length character string, no padding occurs. The result format is the same as c1. len must be a positive integer.
SELECT RIGHT('The priory was in the next town',9)
returns 'next town'
RPAD
RPAD(expr1, n [, expr2])
Result type: Any character type
Returns character expression of length n in which expr1 is appended by n-m blanks (where m is length(expr1)) or, if expr2 is coded, enough copies of expr2 to fill n-m positions at the end of the result string.
SELECT RPAD('Company',12, '-') returns 'Company-----'
SELECT RPAD('Company',12, '-x') returns 'Company-x-x-'
RTRIM
RTRIM(expr)
Result type: Any character type
Returns character expression with trailing blanks removed.
SELECT LENGTH(RTRIM('Company ')) returns 7
SHIFT
SHIFT(c1,nshift)
Result type: Any character or Unicode type
Shifts the string nshift places to the right if nshift > 0 and to the left if nshift < 0. If c1 is a fixed-length character string, the result is padded with blanks to the length of c1. If c1 is a variable-length character string, no padding occurs. The result format is the same as c1.
SELECT SHIFT('Company',4) returns ' Com'
SELECT SHIFT('Company',‑4) returns 'any '
SIZE
SIZE(c1)
Result type: smallint
Returns the declared size of c1 without removal of trailing blanks.
SELECT SIZE('Company') returns 7
SOUNDEX
SOUNDEX(c1)
Result type: Any character type
Returns a c1 four-character field that can be used to find similar sounding strings. For example, SMITH and SMYTHE produce the same SOUNDEX code. If there are less than three characters, the result is padded by trailing zero(s). If there are more than three characters, the result is achieved by dropping the rightmost digits.
This standard (Russell) soundex function is useful for finding like-sounding strings quickly. A list of similar sounding strings can be shown in a search list rather than only the next strings in the index.
SELECT SOUNDEX('Company') returns 'C515'
SELECT SOUNDEX ('Company2012') returns 'C515'
SQUEEZE
SQUEEZE(c1)
Result type: VARCHAR
Compresses white space. White space is defined as any sequence of blanks, null characters, newlines (line feeds), carriage returns, horizontal tabs and form feeds (vertical tabs). Trims white space from the beginning and end of the string, and replaces all other white space with single blanks.
This function is useful for comparisons. The value for c1 must be a string of variable‑length character string data type (not fixed-length character data type). The result is the same length as the argument.
SELECT SQUEEZE (' Company 2012 ') returns 'Company 2012'
SUBSTR
SUBSTR(c1, loc [, len])
Result type: VARCHAR, NVARCHAR
Returns part of c1 starting at the loc position and either extending to the end of the string or for the number of characters/code points in the len operand.
If len is specified and is less than 1, SUBSTR returns NULL.
The loc parameter determines the start of the substring to be extracted. If loc is less than 0 the position is counted backwards from the end of c1; if loc is greater than 0 the position is counted from the beginning. If loc is 0 the start position is the first character.
After the start of the substring is determined, len characters are extracted. If len is not supplied, the rest of c1 is implied.
SELECT SUBSTR('Company 2012',9,2) returns '20'
SELECT SUBSTR('Company 2012',9) returns '2012'
SELECT SUBSTR('Company 2012',-9,4) returns 'pany'
SUBSTRING
Note: The SUBSTRING function applies to Heap tables only.
SUBSTRING(s BEGINNING pattern [ESCAPE esc_character] [modifier])
SUBSTRING(s ENDING pattern [ESCAPE esc_character] [modifier])
SUBSTRING(s LIKE pattern ESCAPE esc_character [modifier])
SUBSTRING(s SIMILAR TO pattern ESCAPE esc_character [modifier])
Result type: VARCHAR and NVARCHAR
Returns part of string s that matches the pattern specified:
BEGINNING
Returns the substring from s that starts with the first occurrence of the pattern.
ENDING
Returns the substring from s that ends with the last occurrence of the pattern.
LIKE and SIMILAR TO
Returns the substring from s that best matches the pattern.
These use a pattern syntax based on LIKE and SIMILAR TO predicates. In addition to the respective pattern syntax, both use a pair of quotation marks (") to divide the pattern into three sub-patterns.
An attempt is made to match strings s with the complete pattern and to return the portion of string s that corresponds to the middle sub-pattern. If part of string s can be associated with either the middle or an outer sub-pattern then the middle wins. If the second quotation mark is missing, it will default to the end of pattern, and thus the third pattern would be empty. If no quotation marks are present NULL is returned regardless of the matching.
If the alternation operator (|) is not used, the arrangement with the longest middle pattern match is returned. If an alternation operator is used, these are sequentially used, left to right so the first alternative that generates a total pattern match is taken, and then longest middle match returned.
ESCAPE esc_character
Specifies an escape character, which suppresses any special meaning for the character following it, allowing the character to be entered literally. For details, see
LIKE Predicate.
modifier
Controls the matching using the modifier WITHOUT DIACRITICAL or WITHOUT CASE or their positive equivalents WITH DIACRITICAL and WITH CASE, as described in LIKE Predicate.
Default: If no modifier is specified, WITH DIACRITICAL and WITH CASE are implied.
If no match is found, the NULL value is returned.
The result data type will be a VARCHAR of NVARCHAR the size of s.
SELECT SUBSTRING('Company 2016' LIKE 'Company \"%\"' ESCAPE '\') returns '2016'
The whole pattern has to match but think of the patterns as three sub-patterns: a prefix pattern, the contained pattern, and the trailing pattern. The prefix and trailing are matched but the substring result is of the text that corresponds to the pattern between the pattern markers.
SELECT SUBSTRING('Company 2016' SIMILAR TO '% \"[0-9]{4}\"'ESCAPE'\') returns '2016'
The part of the full pattern that is between the double quote markers—‘[0-9]{4}’—delimit the result.
SELECT SUBSTRING('217 Bath Road, Slough, Berkshire, SL1 4AA, UK'
SIMILAR TO '%\"[A-Z]{1,2}[0-9]{1,2} [0-9][A-Z]{2}\"%' ESCAPE '\') returns 'SL1 4AA'
SELECT SUBSTRING('Company 2016' BEGINNING 'test') returns NULL.
SELECT SUBSTRING('Company 2016' BEGINNING 'pan\|20'ESCAPE '\') returns 'pany 2016'.
SELECT SUBSTRING('Company 2016' BEGINNING '20\|pan'ESCAPE '\') returns '2016'.
SELECT SUBSTRING('Company 2016' BEGINNING '20\|pan'ESCAPE '\') returns '2016'.
The alternation operator rules are applied left to right (the same as is used for LIKE and SIMILAR TO).
SELECT SUBSTRING('?????????eE?' SIMILAR TO '%\"E+\"%' escape '\'
WITH CASE WITH DIACRITICAL) returns 'E'
SELECT SUBSTRING('?????????eE?' SIMILAR TO '%\"E+\"%' escape '\'
WITH CASE WITHOUT DIACRITICAL) returns '???'
SELECT SUBSTRING('?????????eE?' SIMILAR TO '%\"E+\"%' escape '\'
WITHOUT CASE WITH DIACRITICAL) returns 'eE'
SELECT SUBSTRING('?????????eE?' SIMILAR TO '%\"E+\"%' escape '\'
WITHOUT CASE WITHOUT DIACRITICAL) returns '?????????eE?'
SUBSTRING_INDEX
SUBSTRING_INDEX(str,delim,count)
Result type: VARCHAR
Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. This function performs a case-sensitive match when searching for delim.
CREATE TABLE test(str VARCHAR(20), cnt INT);
INSERT INTO test VALUES('www.actian.com', 2);
INSERT INTO test VALUES('www.actian.com', -2);
SELECT SUBSTRING_INDEX(str, '.', 2) FROM test;
+--------------------+
|col1 |
+--------------------+
|www.actian |
|www.actian |
+--------------------+
SELECT SUBSTRING_INDEX(str, '.', cnt) FROM test;
+--------------------+
|col1 |
+--------------------+
|www.actian |
|actian.com |
+--------------------+
Note: This function works only for data stored in tables created with the X100 or X100_ROW storage structure.
TRIM
TRIM(c1)
Result type: VARCHAR
Returns c1 without trailing blanks. The result has the same length as c1.
ANSI Compliant Version of TRIM:
TRIM([ [BOTH | LEADING | TRAILING] [c1] FROM] c2 )
Result type: Any character string variable
Returns c2 with all occurrences of c1—which can be only one character—removed from the beginning, end, or both, as specified. BOTH is the default. In the absence of c1, the space is assumed.
SELECT 'DEFAULT' , '['||TRIM(' Company ' ) ||']'
returns [ Company]
SELECT 'BOTH' , '['||TRIM(BOTH ' ' FROM ' Company ' ) ||']'
returns [Company]
SELECT 'LEADING' , '['||TRIM(LEADING ' ' FROM ' Company ' ) ||']'
returns [Company ]
SELECT 'TRAILING' , '['||TRIM(TRAILING ' ' FROM ' Company ' ) ||']'
returns [ Company]
UPPERCASE or UPPER
UPPERCASE(c1)
or
UPPER(c1)
Result type: Any character type
Converts all lower case characters in c1 to upper case.
String Functions and the UTF8 Character Set
For the UTF8 character set, the character data is multi-byte string and the actual number of bytes for the data could be more than the number of characters. If the output buffer for any string operation is not sufficiently large to hold the multi-byte string, the result will be truncated at a character boundary.