Pattern-matching Predicates
SIMILAR TO pattern-matching predicates (the LIKE family of predicates) are used to search for a specified pattern in text, an expression, or column. These predicates include:
• LIKE
• BEGINNING, ENDING, CONTAINING
• SIMILAR TO
• LIKE_REGEX
The specific predicate to use depends on how complex the pattern needs to be. All predicates in the LIKE family have the same SQL syntax, operate on the same data types, and can control the case-sensitivity of the matching.
The LIKE family of predicates performs pattern matching for the character data types (char, varchar, long varchar, and text) and Unicode data types (nchar, nvarchar, and long nvarchar).
The LIKE family of predicates has the following syntax:
expression [NOT] [LIKE|BEGINNING|CONTAINING|ENDING|SIMILAR TO] pattern
[WITH CASE | WITHOUT CASE]
[ESCAPE escape_character]
where
expression
Is a column name or a string expression.
pattern
Specifies the pattern to be matched. The pattern is typically a string literal but can be an arbitrary string expression.
The patterns supported depends upon the specific predicate used.
WITH CASE | WITHOUT CASE
Indicates whether to match the case of the pattern. This option can be used before, after, or instead of the ESCAPE clause.
Default: If not specified, the collation type of the expression is used, typically WITH CASE.
ESCAPE escape_character
Specifies the character to use to escape another character or to enable a character's special meaning in a pattern.
LIKE Predicate
Use LIKE in a WHERE clause to search for a specified pattern in a column.
LIKE performs pattern matching for the character and Unicode data types.
The LIKE predicate has the following syntax:
expression [NOT] LIKE pattern [WITH CASE | WITHOUT CASE]
[WITH DIACRITICAL | WITHOUT DIACRITICAL]
[ESCAPE escape_character]
where
expression
Is a column name or an expression containing string functions.
pattern
Specifies the pattern to be matched. The pattern is typically a string literal but can be an arbitrary string expression.
The pattern-matching (wild card) characters are as follows:
% (percent sign)
Denotes 0 or more characters
_ (underscore)
Denotes a single character
\|
Denotes the alternation operator
WITH CASE | WITHOUT CASE
Indicates whether to match the case of the pattern. This option can be used before, after, or instead of the ESCAPE clause.
Default: WITH CASE
WITH DIACRITICAL | WITHOUT DIACRITICAL
Indicates whether to ignore diacritical marks. For example, Latin little "e" will match Latin little "e" with acute if the WITHOUT DIACRITICAL is specified.
Default: WITH DIACRITICAL
ESCAPE escape character
Specifies an escape character, which suppresses any special meaning for the character following it, allowing the character to be entered literally. The following characters can be escaped:
• The pattern-matching characters % and _.
• The escape character itself. To enter the escape character literally, type it twice.
• The | character, which can be used to specify alternate patterns to match.
LIKE does not ignore the trailing blanks present with a char or nchar data type. If you are matching a char value (that is padded with blanks when it is inserted) or if the value has trailing blanks that were entered by the user, include these trailing blanks in your pattern. For example, if searching a char(10) column for rows that contain the name harold, specify the trailing blanks in the pattern to be matched:
name like 'harold '
Four blanks are added to the pattern after the name to include the trailing blanks.
Because blanks are not significant when performing comparisons of c data types, the LIKE predicate returns a correct result whether or not trailing blanks are included in the pattern.
LIKE Examples
The following examples illustrate some uses of the pattern-matching capabilities of the LIKE predicate.
To match any string starting with a:
name LIKE 'a%'
To match any string starting with A through Z:
name LIKE '\[A-Z\]%' ESCAPE '\'
To match any two characters followed by 25%:
name LIKE '__25\%' ESCAPE '\'
To match a string starting with a backslash. Because there is no ESCAPE clause, the backslash is taken literally:
name LIKE '\%'
To match a string starting with a backslash and ending with a percent:
name LIKE '\\%\%' ESCAPE '\'
To match any string starting with 0 through 4, followed by an uppercase letter, then a [, any two characters and a final ]:
name LIKE '\[01234\]\[A-Z\][__]' ESCAPE '\'
To detect names that start with S and end with h, disregarding any leading or trailing spaces:
SQUEEZE(name) LIKE 'S%h'
To detect a single quote, repeat the quote:
name LIKE ''''
To search for multiple patterns, use escaped | as a delimiter. For example, the following will match if string_1 contains ABC, 123, or xyz:
string_1 LIKE '%ABC%@|%123%@|%xyz%' ESCAPE '@'
BEGINNING, CONTAINING, and ENDING Predicates
Use BEGINNING, CONTAINING, or ENDING in a WHERE clause to search for a specified pattern in a column.
These predicates have the following syntax:
expression [NOT] BEGINNING | CONTAINING | ENDING pattern
[WITH CASE | WITHOUT CASE] [ESCAPE escape_character]
where
expression
Is a column name or an expression containing string functions.
BEGINNING
Matches if the pattern string is found at the beginning of the text.
CONTAINING
Matches if the pattern string is found within the text.
ENDING
Matches if the pattern string is found at the end of the text.
pattern
Specifies the pattern to be matched. The pattern must be a string literal.
SIMILAR TO Predicate
Use SIMILAR TO in a WHERE clause to search for a regular expression pattern in a column.
The SIMILAR TO predicate has the following syntax:
expression [NOT] SIMILAR TO pattern [WITH CASE | WITHOUT CASE]
[ESCAPE escape_character]
where
expression
Is a column name or an expression containing string functions
pattern
Specifies the pattern to be matched.
The pattern operators are:
literal
Matches itself.
[ … ]
Matches a set of characters.
A character set expression can contain a list of named character classes, individual characters, or inclusive ranges specified with a hyphen (-). Character class names are enclosed with [: and :].
Supported class names are: ALNUM, ALPHA, ASCII, BLANK, CNTRL, DIGIT, GRAPH, LOWER, PRINT, PUNCT, SPACE, UPPER, WHITESPACE, XDIGIT.
ALNUM
Alphanumeric characters
ALPHA
Alphabetic characters
ASCII
Characters in the range 0x01 to 0x7F
BLANK
The space and horizontal tab characters
CNTRL
Control characters: characters in the range 0x01 to 0x1F and character 0x7F
DIGIT
Digits
GRAPH
Visible ASCII characters excluding whitespace and control characters: characters in the range 0x21 to 0x7E
LOWER
Lowercase letters
PRINT
Visible ASCII characters and the space character: characters in the range 0x20 to 0x7E
PUNCT
Punctuation marks and symbols: comma, period, double and single quote characters, and the characters !, #, $, &, <, =, >, @, /, :, ;, `, ~
SPACE
Whitespace characters
UPPER
Uppercase letters
WHITESPACE
Whitespace control characters: space, form feed, newline, carriage return, tab, vertical tab, ellipsis, next line, separator
XDIGIT
Hexadecimal digits: characters 0 to 9, A to F, and a to f
[ … ^ … ]
Matches one character in the set preceding the ^ unless de-selected by the set after it. To abbreviate a range, use a hyphen (-). For example: [[:LOWER:]^aeiou] matches the lower case consonants.
_ (Underscore)
Matches any single character.
%
Matches zero or more characters.
\ …
Escapes the character that follows if it is a meta character.
( … )
Treats pattern elements as one unit.
… | …
Matches one of the alternates.
Suffix Operators-The following operators can be appended to the end of the preceding operators for further modification:
{n}
Matches n occurrences of the preceding pattern element
{n,}
Matches n or more occurrences of the preceding pattern element
{n,m}
Matches between n and m occurrences of the preceding pattern element
?
Matches 0 or 1 occurrences of the preceding pattern element. Shorthand for {0,1}
*
Matches 0 or more occurrences of the preceding pattern element. Shorthand for {0,}
+
Matches 1 or more occurrences of the preceding pattern element. Shorthand for {1,}
SIMILAR TO Examples
The SIMILAR TO predicate returns values depending on whether the pattern matches the expression. It functions similarly to the LIKE predicate.
As with LIKE, the SIMILAR TO operator succeeds only if its pattern matches the entire string. This is unlike common regular expression practice, wherein the pattern can match any part of the string. Like LIKE, SIMILAR TO uses the wildcard characters _ (denoting a single character) and % (denoting a string).
The SQL standard SIMILAR TO operator supports sets and has a negation operator that is more powerful.
To match only consonants with LIKE it is necessary to declare:
[BCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz]
whereas with SIMILAR TO, it can be reduced to:
[A-Za-z^AEIOUaeiou]
where ^ means NOT THESE VALUES.
To match with strings like ii/II, iI, Ii you could code:
string_1 SIMILAR TO '[^iI]{2}%'
where:
The ^ inside the [ ] denotes neither i nor I. The number inside the { } denotes two of the characters inside the [ ].
WITH CASE and WITHOUT CASE controls case sensitivity.
The above expression:
string_1 SIMILAR TO '[^iI]{2}%'
can be written more concisely as:
string_1 NOT SIMILAR TO 'ii%' WITHOUT CASE
To match a more complex pattern such as a telephone number, use code like this:
string_1 SIMILAR TO '(+[0-9]{2})?([- ]*[0-9]{3,4}){2,3}'
The ‘(+[0-9]{2})?’ matches an optional country code, followed by from two to three ‘(…){2,3}’ space- or hyphen-separated sets of three to four digit numbers ‘[- ]*[0-9]{3,4}’.
LIKE_REGEX Predicate
Use LIKE_REGEX in a WHERE clause to search for a regular expression pattern in a column.
The LIKE_REGEX predicate has the following syntax:
expression [NOT] LIKE_REGEX pattern [FLAG flag]
where
expression
Is a column name or an expression containing string functions
pattern
Specifies the pattern to be matched
flag (optional)
Specifies the additional options for pattern matching
The pattern-matching (wildcard) characters are as follows:
literal
Matches itself
[ … ]
Matches a set of characters. A character set expression can contain a list of named character classes, individual characters, or inclusive ranges specified with a hyphen (-)
Note: '[' and ']' are not valid character ranges, where as '-' is a valid character range only at the beginning or end of a positive character group.
[ … ]-[ … ]
Matches one character in the first set unless de-selected by the set after it. To abbreviate a range, use a hyphen (-). For example: [a-z]-[aeiou] matches the lower case consonants.
.
Matches any single character
.*
Matches zero or more characters
\ …
Escapes the character that follows if it is a meta character
( … )
Treats pattern elements as one unit
^
Negates the character set when placed in the first position of the set
Note: It is a meta character only at the beginning of a positive character set.
Suffix Operators - The following operators can be appended to the end of the preceding operators for further modification:
{n}
Matches n occurrences of the preceding pattern element
{n,}
Matches n or more occurrences of the preceding pattern element
{n,m}
Matches between n and m occurrences of the preceding pattern element
{0,0}
The empty string
?
Matches 0 or 1 occurrence of the preceding pattern element. Shorthand for {0,1}
*
Matches 0 or more occurrences of the preceding pattern element. Shorthand for {0,}
+
Matches 1 or more occurrences of the preceding pattern element. Shorthand for {1,}
Properties - The set containing all characters that have property X, can be identified with a category escape \p{X}. Complement is \P{X}. There are seven types of properties:
• letters
• numbers
• marks
• punctuation
• separators
• symbols and
• other
Each property has a corresponding property list.
Multi-character escape
\s
Matches one of the whitespaces: ' ' (space character), '\t' (tab character), '\n' (new line character), '\r' (return character)
\S
Matches any character not matched by '\s'
\i
Matches a letter, '_' or ':'
\I
Matches any character not matched by '\i'
\c
Matches a letter, digit, '.', '-', '_' or ':'
\C
Matches any character not matched by '\c'
\d
Matches any digit character
\D
Matches any non-digit character
\w
Matches any character except a character part of "punctuation", "separator" and "other"
\W
Matches any character which is part of "punctuation", "separator" and "other"
Flags
s
If present, the match operates in "dot-all" mode. If the s flag is not specified, the meta character matches any character except a newline (#x0A) or carriage return (#x0D) character.
m
If present, the match operates in multi-line mode. By default, the meta character ^ matches the start of the entire string, while $ matches the end of the entire string. In multi-line mode, ^ matches the start of any line (i.e. the start of the entire string, and the position immediately after a newline character other than a newline that appears as the last character in the string), while $ matches the end of any line (that is, the position immediately before a newline character, and the end of the entire string if there is no newline character at the end of the string). Newline here means the character #x0A only.
i
If present, the match operates in case-insensitive mode.
x
If present, whitespace characters (#x9, #xA, #xD and #x20) in the regular expression are removed before matching with the exception that whitespace characters within character groups are not removed.
Examples:
"helloworld" LIKE_REGEX "hello world" FLAG "x" → returns true()
"helloworld" LIKE_REGEX "hello[ ]world" FLAG "x" → returns false()
"hello world" LIKE_REGEX "hello\ sworld" FLAG "x" → returns true()
"hello world" LIKE_REGEX "hello world" FLAG "x" → returns false()
q
If present, all characters in the regular expression are treated as representing themselves and not as meta-characters.
Examples:
"abcd" LIKE_REGEX ".*" FLAG "q" → returns false()
"Mr. B. Obama" LIKE_REGEX "B. OBAMA" FLAG "iq" → returns true()