4. Understanding the Elements of SQL Statements : Predicates in SQL : Pattern-matching Predicates : SIMILAR TO Predicate
 
Share this page                  
SIMILAR TO Predicate
Use SIMILAR TO in a WHERE clause to search for a regular expression pattern in a column.
The SIMILAR TO predicate has the following syntax:
expression [NOT] SIMILAR TO pattern [WITH CASE | WITHOUT CASE]
           [ESCAPE escape_character]
where
expression
Is a column name or an expression containing string functions
pattern
Specifies the pattern to be matched.
The pattern operators are:
literal
Matches itself.
[ … ]
Matches a set of characters.
A character set expression can contain a list of named character classes, individual characters, or inclusive ranges specified with a hyphen (-). Character class names are enclosed with [: and :]. For example: [[:upper:]ab] matches the uppercase and lowercase characters a and b.
Supported class names are: ALNUM, ALPHA, ASCII, BLANK, CNTRL, DIGIT, GRAPH, LOWER, PRINT, PUNCT, SPACE, UPPER, WHITESPACE, XDIGIT.
ALNUM
Alphanumeric characters
ALPHA
Alphabetic characters
ASCII
Characters in the range 0x01 to 0x7F
BLANK
The space and horizontal tab characters
CNTRL
Control characters: characters in the range 0x01 to 0x1F and character 0x7F
DIGIT
Digits
GRAPH
Visible ASCII characters excluding whitespace and control characters: characters in the range 0x21 to 0x7E
LOWER
Lowercase letters
PRINT
Visible ASCII characters and the space character: characters in the range 0x20 to 0x7E
PUNCT
Punctuation marks and symbols: comma, period, double and single quote characters, and the characters !, #, $, &, <, =, >, @, /, :, ;, `, ~
SPACE
Whitespace characters
UPPER
Uppercase letters
WHITESPACE
Whitespace control characters: space, form feed, newline, carriage return, tab, vertical tab, ellipsis, next line, separator
XDIGIT
Hexadecimal digits: characters 0 to 9, A to F, and a to f
[ … ^ … ]
Matches one character in the set preceding the ^ unless de-selected by the set after it. To abbreviate a range, use a hyphen (-). For example: [[:LOWER:]^aeiou] matches the lower case consonants.
_
Matches any single character.
%
Matches zero or more characters.
\ …
Escapes the character that follows if it is a meta character.
( … )
Treats pattern elements as one unit.
… | …
Matches one of the alternates.
Suffix Operators--The following operators can be appended to the end of the preceding operators for further modification:
{n}
Matches n occurrences of the preceding pattern element.
{n,}
Matches n or more occurrences of the preceding pattern element.
{n,m}
Matches between n and m occurrences of the preceding pattern element.
?
Matches 0 or 1 occurrences of the preceding pattern element. Shorthand for {1,0}.
*
Matches 0 or more occurrences of the preceding pattern element. Shorthand for {0,}.
+
Matches 1 or more occurrences of the preceding pattern element. Shorthand for {1,}.
SIMILAR TO Examples
The SIMILAR TO predicate returns values depending on whether the pattern matches the expression. It functions similarly to the LIKE predicate.
As with LIKE, the SIMILAR TO operator succeeds only if its pattern matches the entire string. This is unlike common regular expression practice, wherein the pattern can match any part of the string. Like LIKE, SIMILAR TO uses the wildcard characters _ (denoting a single character) and % ( denoting a string).
The SQL standard SIMILAR TO operator supports sets and has a negation operator that is more powerful.
To match only consonants with LIKE it is necessary to declare:
[BCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz]
whereas with SIMILAR TO, it can be reduced to:
[A-Za-z^AEIOUaeiou]
where ^ means NOT THESE VALUES.
To match on items that do not begin with "ii" you could code:
string_1 SIMILAR TO ‘[^iI]{2}%’
where:
The ^ inside the [ ] denotes neither i or I. The number inside the { } denotes two of the characters inside the [ ].
WITH CASE and WITHOUT CASE controls case sensitivity.
The above expression:
string_1 SIMILAR TO ‘[^iI]{2}%’
can be written more concisely as:
string_1 NOT SIMILAR TO ‘ii%’ WITHOUT CASE
To match a more complex pattern such as a telephone number, use code like this:
string_1 SIMILAR TO ‘(+[0-9]{2})?([- ]*[0-9]{3,4}){2,3}’
The ‘(+[0-9]{2})?’ matches an optional country code, followed by from two to three ‘(…){2,3}’ space- or hyphen-separated sets of three to four digit numbers ‘[- ]*[0-9]{3,4}’.