Regular Expressions
A regular expression is a string of characters that defines a set of rules for matching character strings. Regular expressions can match a whole string or a portion of a string.
Scripting allows you to use regular expressions with certain functions and operators. Regular expressions can specify more complex and flexible conditions for matching character strings than a simple test of "Is it the same string?"
Special Characters
When used in a function or with an operator that enables regular expressions, the following characters have special meanings: "|", "(", ")", "*", "+", "?", "[", "]", "-", ".", "\", "^" and "$".
When writing regular expressions, remember these special characters and their purposes.
Character
Meaning
|
Divides an expression into branches. The overall expression matches any string that matches any of the branches.
For example, "Sam|Carol" matches any field that contains "Sam", "Carol", or both. For information on how this character behaves differently when used with Like, see Like Operator.
(xxxx)
Characters enclosed in parentheses form a group. A special character that follows the group treats it the same way it treats a single character.
For example, "abc*" matches "ab", "abc", "abcc", and "abccc", whereas "(abc)*" matches "abc", "abcabc", and "abcabcabc".
*
Any character or group followed by "*" matches zero or more occurrences of the previous character or group in a string.
Thus "Sam*y" matches "Say" (zero occurrences of "m"), "Samy" (one occurrence of "m"), "Sammy" (two occurrences of "m"), and "Sammmy" (three occurrences of "m"). In the same way, "Rin(tin)*" matches "Rin", "Rintin", "Rintintin", and "Rintintintin".
+
Any character or group followed by "+" matches one or more of that character or group in a string.
"Sam+y" does not match "Say" but matches "Samy", "Sammy", "Sammmy", and so on. "Rin(tin)+" does not match "Rin" but matches "Rintin", "Rintintin", "Rintintintin", and so on.
?
Any character or group followed by "?" matches either zero or one of that character or group in a string.
So, "Sam?y" matches either "Say" or "Samy" but not "Sammy". "Rin(tin)?" matches "Rin" or "Rintin".
{n,m}
Two numbers separated by a comma and enclosed in curly brackets, "{ }", specify a range of repetitions of the previous character or group. The first number is the minimum number of repetitions and the second is the maximum. The second number is optional. If it is omitted, there is no upper limit on the number of repetitions.
For example, "Sam{0,2}y" matches "Say", "Samy", and "Sammy" but not "Sammmy". "Rin(tin){2}" does not match "Rintin" but matches "Rintintin" and "Rintintintin".
[xxxx]
A sequence of characters enclosed in square brackets, "[ ]", forms a set, which matches any single character in the sequence. Thus "Pa[tml]" matches "Pat", "Pam", or "Pal". If the sequence begins with '^', the set matches any character that is not in the sequence.
Thus "Pa[^tml]" matches any three-character sequence starting with "Pa" and ending with any character except 't', 'm', 'l'.
[x-y]
Two characters separated by a hyphen inside square brackets represent a range: the set of all ASCII characters between them, inclusive. Thus, "Pa[0-9]" matches any three-character string starting with "Pa" and ending with a digit. "Pa[^0-9]" matches any three-character string starting with "Pa" and ending with anything except a digit.
You can include a literal hyphen, caret, or square bracket in a set. The caret is special only when it is the first character after the left bracket; anywhere else it is treated literally. The hyphen is treated literally when it is the first character in the set (or the second, if a caret comes first) or the last character in the set. The right bracket is treated literally when it is the first character in the set (or the second, if a caret comes first).
- (hyphen)
This character has no special meaning outside of a set or a class.
. (period)
This character matches any single character.
Thus "S.m" matches "Sam", "Sbm", "Scm", and so on. A period can be followed by "*" to mean zero or more occurrences of any character, by "+" to mean one or more occurrences of any character, and by "?" to mean zero or one occurrence of any character.
\
A backslash followed by any character matches that literal character.
For example, "Sam\." matches "Sam.", unlike the expression "Sam.", which matches any four-character string starting with "Sam" because '.' matches any character. To match a literal backslash character, enter it twice: "\\".
^
The caret matches the beginning of a field.
Thus "^Sam" matches the string "Sam" only when it is at the beginning of a field. For example, "^am" matches "amplitude" but not "example".
$
The dollar sign matches the end of a field.
Thus "Sam$" matches the string "Sam" only when it is at the end of a field. For example, "am$" matches "slam" but not "tramp". To match a literal dollar sign, use "\$". To match a blank field, use "^$".
Note: Back references are not permitted in regular expressions.
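The examples in the table can be checked with a short script. This is a minimal sketch that uses Python's re module as a stand-in for the product's regular expression engine. That is an assumption: the two agree on the basic syntax shown here, but details may differ. The helper names whole and found are hypothetical. found mirrors the "contains" behavior of the "~" operator, while whole tests the entire field, which is how the table describes most examples. Raw strings (r"...") keep Python from processing backslashes before re sees them.

```python
import re

# Assumption: Python's re module stands in for the product's own engine.


def whole(pattern, text):
    """True if the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None


def found(pattern, text):
    """True if the pattern matches anywhere in the string, like "~"."""
    return re.search(pattern, text) is not None


# | : the overall expression matches if either branch matches
assert found("Sam|Carol", "Carol Smith")
assert not found("Sam|Carol", "Bob Jones")

# ( ) : a group is repeated as a single unit
assert whole("abc*", "abcc")
assert not whole("abc*", "abcabc")
assert whole("(abc)*", "abcabc")

# * + ? : zero or more, one or more, zero or one
assert whole("Sam*y", "Say")
assert not whole("Sam+y", "Say")
assert whole("Rin(tin)+", "Rintintin")
assert not whole("Sam?y", "Sammy")

# {n,m} : between n and m repetitions
assert whole("Sam{0,2}y", "Sammy")
assert not whole("Sam{0,2}y", "Sammmy")

# [ ], [^ ], [x-y] : sets, negated sets, and ranges
assert whole("Pa[tml]", "Pam")
assert not whole("Pa[^tml]", "Pal")
assert whole("Pa[0-9]", "Pa7")
assert whole("Pa[^0-9]", "Pax")
assert whole("[-^]", "-") and whole("[-^]", "^")  # both literal here

# . and \ : any single character versus a literal character
assert whole("S.m", "Scm")
assert whole("Sam.", "Sams")
assert not whole(r"Sam\.", "Sams")
assert whole(r"\\", "\\")  # the pattern \\ matches one backslash

# ^ and $ : start and end of the field
assert found("^am", "amplitude") and not found("^am", "example")
assert found("am$", "slam") and not found("am$", "tramp")
assert whole("^$", "")  # a blank field

print("All documented examples behave as described.")
```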
Literal Values of Special Characters
To use the literal value of a special character within a regular expression, precede the special character with a backslash: "\". For example, to enter a literal backslash, type it twice: "\\". To enter a literal dollar sign, type a backslash followed by a dollar sign: "\$".
Enclose regular expressions in quotes to set them off from the rest of the code. This also lets you use variables, such as field names, and the Chr() function inside the regular expression.
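A regular expression is an ordinary quoted string, so it can be assembled from pieces. The sketch below shows the idea in Python, which is an assumption: it is not the product's scripting language. re.escape() plays the role of typing backslashes by hand, chr() stands in for Chr(), and the helper literal_then_tab and the sample value are hypothetical.

```python
import re


def literal_then_tab(text):
    """Hypothetical helper: the literal text, then a tab (chr(9))."""
    # re.escape() adds a backslash before each special character,
    # as you would type "\$" or "\." by hand.
    return re.escape(text) + chr(9)


label = "Total $4.50"  # hypothetical field value

# Used unescaped, "$" anchors to the end, so the literal text is not found.
assert re.search(label, label) is None

# Escaped, the same text is matched literally.
assert re.search(re.escape(label), "Total $4.50 due") is not None

# Pieces can be concatenated, much like combining a variable with Chr().
assert re.search(literal_then_tab(label), "Total $4.50\tpaid") is not None
```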
It is not always obvious whether a string is being used as a regular expression or as a literal string. Context tells you which: regular expressions can be used only with the Sub() and GSub() functions and the "~", "!~", and Like operators.
Example 1
Compare:
If FieldAt("/SOURCE/R1/Field1") == "top"
with:
If FieldAt("/SOURCE/R1/Field1") ~ "top"
The "==" statement is true only if Field1 is exactly "top".
The "~" statement is true if Field1 contains the string "top" anywhere, so it is true for "top", "stop", and "topic".
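The same comparison can be modelled in Python. This sketch assumes re.search() is a reasonable analogue of the "~" operator; it is not the product's implementation.

```python
import re

values = ["top", "stop", "topic", "bottom"]

# "==": exact string comparison.
exact = [v for v in values if v == "top"]

# "~": true if the pattern occurs anywhere in the value.
contains = [v for v in values if re.search("top", v)]

print(exact)     # ['top']
print(contains)  # ['top', 'stop', 'topic']
```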
Example 2
Compare:
StrReplace("Mr.", "Mister", FieldAt("/SOURCE/R1/Field1"))
with:
GSub("Mr.", "Mister", FieldAt("/SOURCE/R1/Field1"))
These two expressions have the same syntax but call different functions. The GSub() function recognizes special characters; StrReplace() does not.
The StrReplace() expression finds the literal abbreviation "Mr." in Field1 and replaces it with the full word "Mister".
Because "." is a special character in regular expressions, the GSub() expression replaces every three-character sequence in Field1 that consists of "Mr" followed by any character, such as "Mr.", "Mrs", or "Mrk", with the word "Mister".
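As a rough Python analogy, assuming str.replace() behaves like StrReplace() and re.sub() like GSub(), the difference looks like this. Escaping the period, as described under Literal Values of Special Characters, restores the literal meaning.

```python
import re

text = "Mr. Smith and Mrs Jones"  # hypothetical field value

# StrReplace-like: literal replacement, no special characters.
literal = text.replace("Mr.", "Mister")

# GSub-like: "." matches any character, so "Mrs" is replaced too.
regex = re.sub("Mr.", "Mister", text)

# A backslash makes the period literal again.
escaped = re.sub(r"Mr\.", "Mister", text)

print(literal)  # Mister Smith and Mrs Jones
print(regex)    # Mister Smith and Mister Jones
print(escaped)  # Mister Smith and Mrs Jones
```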
Functions and Operators
The following functions support the use of regular expressions:
GSub Function (globally substitute)
Sub Function (substitute)
The following operators allow the use of regular expressions:
~ Operator (contains)
!~ Operator (does not contain)
Like Operator - supports only a limited set of special characters
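The sketch below gives hypothetical Python analogues of these functions and operators. It assumes, as the names "substitute" and "globally substitute" suggest, that Sub() replaces only the first match and GSub() replaces every match. The argument order follows the GSub() call in Example 2: pattern, replacement, then the field value. Like is not modelled because it supports only a limited set of special characters.

```python
import re


def gsub(pattern, replacement, text):
    """Hypothetical GSub() analogue: replace every match."""
    return re.sub(pattern, replacement, text)


def sub(pattern, replacement, text):
    """Hypothetical Sub() analogue: replace only the first match."""
    return re.sub(pattern, replacement, text, count=1)


def op_contains(field, pattern):
    """Hypothetical analogue of field ~ pattern."""
    return re.search(pattern, field) is not None


def op_not_contains(field, pattern):
    """Hypothetical analogue of field !~ pattern."""
    return re.search(pattern, field) is None


print(gsub("a+", "-", "caaab baa"))  # c-b b-
print(sub("a+", "-", "caaab baa"))   # c-b baa
```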
Escaping Hex Values
Example 1
This example checks whether field F1 begins with the character whose hex value is FF, which is displayed as an empty square symbol.
FieldAt("/SOURCE/R1/F1") ~ "^\xFF"
The "\x" before the hex value "FF" indicates that the next two characters are a hex value and should be evaluated together as a single character.
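Python string literals use the same "\x" notation, so the idea can be sketched there. This is an analogy, not the product's engine, and the field value is hypothetical. The last line also reflects Example 3 below: hex 79 is a lowercase "y".

```python
import re

# In a Python string literal, "\xFF" is one character with code 0xFF.
marker = "\xFF"
assert len(marker) == 1 and ord(marker) == 0xFF

field_value = "\xFFPAYLOAD"  # hypothetical contents of field F1

starts_with_ff = re.search("^\xFF", field_value) is not None
ff_elsewhere = re.search("^\xFF", "PAYLOAD\xFF") is not None

print(starts_with_ff)  # True
print(ff_elsewhere)    # False: the character is present but not at the start

assert "\x79" == "y"   # hex 79 is a lowercase y
```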
Example 2
You might attempt to use this expression to look for a square bracket ( [ ):
FieldAt("/SOURCE/R1/F1") ~ "\x5b"
However, this expression generates an error because the hex escape is processed before the expression as a whole is evaluated. The hex value becomes "[", which is a special character that must be followed by a matching closing bracket. Because there is no closing bracket, an error occurs.
Caution! You cannot use the hex value for a character that is already a special character. To match a special character literally, precede it with a backslash instead.
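Python shows a similar ordering problem, which makes it a convenient illustration, though it is only an analogy to the product. In a non-raw Python literal, "\x5b" becomes "[" before the re module parses the pattern, and a lone "[" is rejected. The helper name compiles is hypothetical.

```python
import re


def compiles(pattern):
    """True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# The literal "\x5b" is turned into "[" before re ever sees it.
pattern = "\x5b"
assert pattern == "["
assert not compiles(pattern)  # "[" opens a set that is never closed

# A backslash before the bracket matches it literally instead.
assert compiles(r"\[")
assert re.search(r"\[", "a[b") is not None
```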
Example 3
This expression looks for a lowercase y.
FieldAt("/SOURCE/R1/F1") ~ "\x79"
To learn more about regular expressions, see http://www.regular-expressions.info/.
Last modified date: 12/03/2024