Was this helpful?
Profiling Rules
Profiling rules can help you identify problems in source data. These rules are of the following types:
Some rules can be used to generate aggregate statistics, which help identify inaccuracies by examining aggregated values over large datasets.
Other rules are test rules that generate pass and fail statistics. Some of these rules also provide a method to invert the results into pass and fail files.
See Add New Rule for information about adding rules.
The following is a list of profiling rules:
Rule Name
Description
Provides users the ability to evaluate multiple fields and conditions within a single rule and it ensures that a specific relationship or calculation between fields is met.
Evaluates a field by comparing its value to a specified constant using the selected operator, and returns true or false based on the result.
Compares the value of one field with another field using a specified operator, and returns true or false based on the result.
Checks whether a field contains a value and returns true if a value is present.
Checks whether the field value is unique within the dataset and returns true if no duplicate value exists, otherwise false.
Checks whether a field value is not null and returns true or false based on the result.
Checks whether a field value falls within a specified range.
Checks whether a field matches a regex expression. Returns true if the value matches the specified regex expression.
Checks whether a field contains a valid currency value. Returns true if the value matches any of the selected/specified identifiers/patterns.
Checks whether a field contains a valid date value. Returns true if the value matches any of the selected/specified formats/patterns.
Checks whether a field contains a valid email address. Returns true if the value matches any of the selected/specified formats/patterns.
Checks whether a field contains a valid phone number. Returns true if the value matches any of the selected/specified formats/patterns.
Checks whether a field contains a PO Box address. Returns true if the value matches any of the selected/specified formats/patterns.
Checks whether a field contains a valid postal/ZIP code. Returns true if the value matches any of the selected/specified formats/patterns.
Checks whether a field contains a valid US State.
Checks whether a field contains a valid time value. Returns true if the value matches any of the selected/specified formats/patterns.
Assert
This rule provides users the ability to evaluate multiple fields and conditions within a single rule and it ensures that a specific relationship or calculation between fields is met. Essentially, Assert compares the left side (actual value) with the right side (expected value) to see if they match.
Rule Properties
Rule Name
A default rule name (Assert or Assert_n, where “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
This is shown as empty as this rule can be used to evaluate multiple fields.
Rule Type
The type of rule that is applied to the field. That is Assert (Test rule).
Tip...  A Red Cross in a rule icon (for example ) indicates that the rule has a parameter that has not been defined.
Rule Parameters
Invert Test
Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test.
Script (Left Expression)
The left expression is always a script and can be one or more valid expressions supported by the ExecuteExpression rule.
Click Build to open the Expression Builder dialog to assist you in building the SQL script.
For example, to add multiple regular expressions use the RLIKE operator:
CASE
  WHEN field RLIKE 'pattern1' THEN true
  WHEN field RLIKE 'pattern2' THEN true
  WHEN field RLIKE 'pattern3' THEN true
  ELSE false
END
Operator
The operator that compares the left side (actual value) with the right side (expected value).
Select from one of the following comparison operators:
Equal - Checks if the left side value is equal to the right side value.
Greater - Checks if the left side value is greater than the right side value.
Greater or Equal - Checks if the left side value is greater than or equal to the right side value.
Lesser - Checks if the left side value is lesser than the right side value.
Lesser or Equal - Checks if the left side value is lesser than or equal to the right side value.
Not Equal - Checks if the left side value is not equal to the right side value.
StartsWith - Checks if the left side value starts with the right side value.
EndsWith - Checks whether the left side value ends with the right side value.
Matches - Checks whether the left side value matches the right side value (uses regex match).
Expression Type (Right Expression)
This drop-down displays options for the type of right hand side expression. The available choices depend on the selection made in the Operator field.
The following options are available:
Expression - Compare the left expression against a right expression. The right expression can be specified in the Script box. This option is available for the Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal, StartsWith, and EndsWith operators.
Constant - Compare the left expression against a list of constants. You can type one or more constant values to compare and then press Enter or click Add to apply it. This option is available for the Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal, StartsWith, and EndsWith operators.
Field - Compare the left expression against a specific field. You can specify only one field using the Compare Field drop-down. This option is available for the Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal, StartsWith, and EndsWith operators.
Lookup - Compare the left side value with a looked up value. You can specify the Lookup value using the Lookup widget (see LookupValue rule). This option is available for the Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal, StartsWith, and EndsWith operators.
NotBlank - Check if the left side value is not blank. This option is available only when the Operator is specified as Equal.
NotNull - Check if the left expression value is not null. This option is available only when the Operator is specified as Equal.
Script (Right Expression)
The right expression in some cases is a script and can be any valid expression supported by the ExecuteExpression rule.
Click Build to open the Expression Builder dialog which helps you to construct SQL scripts. This dialog provides a list of available fields, operators, and functions you can use to build expressions. See Using the Expression Builder.
Regular Expression (Right Expression)
The right expression when the Operator (that compares the left side of the expression with the right side) is specified as Matches. In this case the Expression Type is set to Regular Expression and cannot be changed.
You can specify a regular expression (Regex) pattern here. See Java Regular Expressions.
Compare Field (Right Expression)
Used to specify a field for comparison, when the Expression Type is Field. This drop-down is displayed only when the Expression Type is Field.
Constant box (Right Expression)
Used to specify a list of constants for comparison, when the Expression Type is Constant. This text box is displayed only when the Expression Type is Constant.
You can type one or more constant values to compare and then press Enter or click Add to apply it.
Lookup widget (Right Expression)
Used to specify the Lookup value for comparison, when the Expression Type is Lookup. The Lookup Widget is displayed only when the Expression Type is Lookup.
To learn how to use the lookup widget, see LookupValue rule.
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Supported Data Types
String, Date, Time, Timestamp, Boolean, Numeric, Double, Float, Long, Integer
Remarks
For Date, Time, or Timestamp values, use the ISO 8601 format for constants. For boolean values, use true or false.
 
CompareToConstant
This rule evaluates a field by comparing its value to a specified constant using the selected operator, and returns true or false based on the result.
Rule Properties
Rule Name
A default rule name (<FieldName>_CompareToConstant) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String).
Rule Type
The type of rule that is applied to the field. That is CompareToConstant (Test rule).
Tip...  A Red Cross in a rule icon (for example ) indicates that the rule has a parameter that has not been defined.
Rule Parameters
Invert Test
Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test.
Operator
Select an operator that determines how the field value is compared against the constant (e.g., Equal, Greater, Not Equal).
Select from one of the following comparison operators:
Equal
Greater
Greater or Equal
Lesser
Lesser or Equal
Not Equal
Constant
Type the constant value that is used as the reference in the comparison against the field value, and then press Enter or click Add to apply it. For equal or not equal you can specify one or more constants.
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Supported Data Types
String, Date, Timestamp, Boolean, Numeric, Double, Float, Long, Integer
Remarks
The constant must be of the same data type as the profiled field. The equal operator is not recommended to use on double or float field comparisons. Do not use quotation marks with string data.
The following formats are supported for date and timestamp:
yyyy-mm-dd
yyyy-mm-dd hh:mm:ss (military time)
Note:  If source has GMT timestamp data, the constant value should end with 'Z'.
CompareToField
This rule compares the value of one field with another field using a specified operator, and returns true or false based on the result.
Rule Properties
Rule Name
A default rule name (<FieldName>_CompareToField) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String).
Rule Type
The type of rule that is applied to the field. That is CompareToField (Test rule).
Rule Parameters
Compare Field
The second field whose value is used as the reference for comparison against the primary field (the field to which the rule applies).
Use Fuzzy Match
Enable to configure Fuzzy Matching which returns true if two strings match based on the specified Fuzzy Match Algorithm and match score filter.
Note:  This option supports String data types.
Operator
Select from one of the following comparison operators:
Equal
Greater
Greater or Equal
Lesser
Lesser or Equal
Not Equal
Fuzzy Match Algorithm
CONTAINS: Checks the small string in the long string.
DAMERAU–LEVENSHTEIN: This distance between two strings is the minimum number of operations (insertions, deletions or substitutions of a single character, or transposition of two adjacent characters) required to change one string into another. For more information, see Damerau–Levenshtein distance.
EXACT MATCH: Checks two strings for exact match.
JARO: A measure of characters in common, being no more than half the length of the longer string in distance with consideration for transpositions.
JARO-WINKLER: This algorithm extends Jaro algorithm and uses a prefix scale “p” which gives more favorable ratings to strings that match from the beginning for a set prefix length “l”. The prefix length is set to 4 characters (the maximum possible prefix length). For more information, see Jaro–Winkler distance.
LEVENSHTEIN: This distance between two strings is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one string into other. For more information, see Levenshtein distance.
POSITIONAL QGRAM: Positional q-grams is an extension to q-grams with an additional property called maxDistance used in calculating the string similarity. For example, ‘peter’ contains the positional bigrams (‘pe’,0), (‘et’,1), (‘te’,2) and (‘er’,3). If maxDistance is set to 1, then (‘et’,1) will only be matched to bi-grams in the other string with positions 0 to 2
QGRAM: q-grams (also called as n-grams) is the process of breaking up a string into multiple strings each with length n (q). For example, apply q-grams on abcde with q = 3 would yield abc, bcd, cde. For example, the word ‘peter’ with q=2 would yield following bigrams - ‘pe’, ‘et’, ‘te’ and ‘er’. A q-gram similarity measure between two strings is calculated by counting the number of q-grams in common.
SHORTHAND: Determine if the shorter string is a “shorthand” of the longer. That is, if it can be produced only by deletions. If the first character of a word in the longer string is deleted, the entire word is considered deleted. For e.g. IBM is a shorthand of International Business Machines.
Fuzzy Score Filter
Add a decimal values between 0.01 to 1. Default value is 0.7. Comparison score is between 0 to 1. Records with comparison score less than this value are not considered a match for probable duplicates and are discarded. The higher the value you select, more strict is the matching.
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Invert Test
Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test.
Supported Data Types
String, Date, Timestamp, Boolean, Numeric, Double, Float, Long, Integer
Remarks
The two fields must be of the same data type. The equal operator is not recommended for use on double or float field comparisons.
IsNotBlank
This rule checks whether a field contains a value and returns true if a value is present, otherwise false.
Rule Properties
Rule Name
A default rule name (<FieldName>_IsNotBlank) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String).
Rule Type
The type of rule that is applied to the field. That is IsNotBlank (Test rule).
Rule Parameter
Invert Test
Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test.
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Supported Data Types
String
Remarks
Counts null, white space, and empty strings as blank.
 
IsNotDuplicate
This rule checks whether the field value is unique within the dataset and returns true if no duplicate value exists, otherwise false.
Rule Properties
Rule Name
A default rule name (<FieldName>_IsNotDuplicate) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String).
Rule Type
The type of rule that is applied to the field. That is IsNotDuplicate (Test rule).
Rule Parameter
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Supported Data Types
String, Date, Timestamp, Boolean, Numeric, Double, Float, Long, Integer
Remarks
None.
 
IsNotNull
This rule checks whether a field value is not null and returns true or false based on the result.
Rule Properties
Rule Name
A default rule name (<FieldName>_IsNotNull) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String).
Rule Type
The type of rule that is applied to the field. That is IsNotNull (Test rule).
Rule Parameter
Invert Test
Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test.
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Supported Data Types
String, Date, Timestamp, Boolean, Numeric, Double, Float, Long, Integer
Remarks
None
 
InRange
This rule checks whether a field value falls within a specified range (inclusive or exclusive, based on configuration) and returns true if it does, otherwise false.
Rule Properties
Rule Name
A default rule name (<FieldName>_InRange) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Count (Integer).
Rule Type
The type of rule that is applied to the field. That is InRange (Test rule).
Tip...  A Red Cross in a rule icon (for example ) indicates that the rule has a parameter that has not been defined.
Rule Parameters
Lower Bound
The minimum value of the range. Select Inclusive to include the lower bound number in the range.
Upper Bound
The maximum value of the range. Select Inclusive to include the upper bound number in the range.
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Supported Data Types
Date, Timestamp, Numeric, Double, Float, Long, Integer
Remarks
The following formats are supported for Date and Timestamp:
yyyy-mm-dd
yyyy-mm-dd hh:mm:ss (military time).
Note:  If source has GMT timestamp data, the lower bound and upper bound values should end with 'Z'.
 
MatchesRegex
This rule returns true if a field matches a specified regular expression or pattern, returns false otherwise.
Rule Properties
Rule Name
A default rule name (<FieldName>_MatchesRegex, <FieldName>_MatchesRegex_n, where “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String).
Rule Type
The type of rule that is applied to the field. That is MatchesRegex (Test rule).
Rule Parameters
Invert Test
Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test.
Regular Expression
Specify one or more regular expressions by typing or selecting them from the drop-down list and then clicking Add. Click to delete an expression. For information about the regular expressions available in the drop-down list, see Regular expressions available in the drop-down list.
You can also:
Copy/paste a regular expression or pattern detected by data discovery. See Field Data Discovery (also see steps, below).
Paste a macro by right-clicking in the text box and selecting the macro from the Paste Macro dialog. See Pasting Macro in Map, Profile or Process.
Create a macro by right-clicking in the text box. See Creating a Macro by Selecting a Value.
To copy/paste values detected by data discovery:
1. In the Field/Rule pane, select a field to open the Field Data Discovery pane.
2. In the Field Data Discovery pane:
Click to execute data discovery.
Select the String Patterns tab.
Right-click on the cell that contains the desired regular expression or pattern.
Do one of the following:
Select Copy Regex Pattern to copy the regular expression.
Select Copy Input Value to copy the input value.
3. In the Field/Rule pane, select the rule to open the Rule Definition pane.
4. Paste into the Regular Expression box and click Add.
5. To add another regular expression or pattern provided by data discovery, select the field again.
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Regular expressions available in the drop-down list
AlphaNumeric Strings -->^[a-zA-Z0-9]+$
AnyString-->.*
ASCII characters only-->^([a-zA-Z0-9!\"#$%&',/:;<=>@_`|~ \\(\\)\\*\\+\\-\\.\\?\\[\\]\\\\\\^\\{\\}])*$")
Canada Postal Code-->^([A-Za-z]\d[A-Za-z][\s-]?\d[A-Zaz]\d)
Credit Card Number-->[1-9][0-9]{3} [0-9]{4} [0-9]{4} [0-9]{4}
Date - mm/dd/yyyy format-->^(3[0-1]|2[0-9]|1[0-9]|0[1-9])[\s{1}|\/|-](Jan|JAN|Feb|FEB|Mar|MAR|Apr|APR|May|MAY|Jun|JUN|Jul|JUL|Aug|AUG|Sep|SEP|Oct|OCT|Nov|NOV|Dec|DEC)[\s{1}|\/|-]\d{4}$
Date - YYMMDD format-->^((\d{2}((0[13578]|1[02])(0[1-9]|[12]\d|3[01])|(0[13456789]|1[012])(0[1-9] |[12]\d|30)|02(0[1-9]|1\d|2[0-8])))|([02468][048]|[13579][26])0229)$
Days of Week-->^(Sun|Mon|(T(ues|hurs))|Fri)(day|\.)?$|Wed(\.|nesday)?$|Sat(\.|urday)?$|T ((ue?)|(hu?r?))\.?$
Digits Only-->\d+
Email Address-->^(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\\-+)|([A-Za-z0-9]+\\.+)|([A-Za-z0-9]+\\++))*[A-Za-z0-9]+@((\\w+\\-+)|(\\w+\\.))*\\w{1,63}\\.[a-zAZ]{2,6}$
ISBN_10 Number-->ISBN\x20(?=.{13}$)\d{1,5}([-])\d{1,7}\1\d{1,6}\1(\d|X)$
Mr or Mrs-->Mr|Mrs
Printable Characters-->^([a-zA-Z0-9!@#$%^&amp;amp;*()-_=+;:'&amp;quot;|~`&amp;lt;&amp;gt;?/{}] {1,5})$
Single Alphabetic character-->^[a-zA-Z]$
Social Security Number-->^[0-9]{3}-?[0-9]{2}-?[0-9]{4}$
Time - hh:mm AMPM format-->^ *(1[0-2]|[1-9]):[0-5][0-9]*(a|p|A|P)(m|M) *$
Time - hh:mm:ss format-->(([0-1][0-9])|([2][0-3])):([0-5][0-9]):([0-5][0-9])
UK Postal Code-->^([A-Z]{1,2}[0-9]{1,2}|[A-Z]{3}|[A-Z]{1,2}[0-9][A-Z])( |-)[0-9][A-Z]{2}
US Currency-->^\$?( )*\d*(.\d{1,2})?$
US Phone Number-->^[01]?[- .]?\(?[2-9]\d{2}\)?[- .]?\d{3}[- .]?\d{4}$
US Zipcode or Zip+4-->[0-9]{5}(-[0-9]{4})?
Vehicle Identification Number (VIN)-->^(([a-h,A-H,j-n,J-N,p-z,P-Z,0-9]{9})([a-h,A-H,j-n,J-N,p,P,r-t,R-T,v-z,V-Z,0-9])([a -h,A-H,j-n,J-N,p-z,P-Z,0-9])(\d{6}))$
AnyString is the default regular expression. For a description of Regular Expression options, see Remarks (below).
Supported Data Types
String
Remarks
Regex pattern rule is displayed in the Parameters column on the Rules tab and the Description column is displayed on the Result summary. Actian DataConnect stores and uses regular expressions and you can edit the expressions if required.
Description of Regular Expressions:
AlphaNumeric - Matches strings that contain character and/or numeric data.
AnyString - Matches all strings.
ASCII characters only Postal- Matches ASCII characters only postal.
Canada Postal Code - Matches postal code for Canada.
Credit Card Number - Matches a credit card number.
Date - mm/dd/yyyy format - Flexible date validator that matches most date formats.
Date_YYMMDD - Date expression validator with format YYMMDD.
Days of Week - Matches days of the week or their abbreviations.
DigitsOnly - Expression for matching one or more digits.
EmailAddress - Matches an email address.
ISBN_10 Number - Matches format of an ISBN number.
Mr or Mrs - Matches Mr or Mrs characters.
Printable Characters - Matches all printable characters.
Single Alphabetic character - Matches a single alphabetic Character.
Social Security Number - Matches U.S. social security number. Dashes (-) are optional.
Time - hh:mm AMPM format - Matches time format hh/mm AM.
Time - hh:mm:ss format - Matches time format hh:mm:ss.
US Zipcode or Zip+4 - Matches a US zip code.
Vehicle Identification Number (VIN) - Matches US Vehicle Identification Numbers (VINs).
ValidateCurrency
This rule checks whether a field contains a valid currency value. You can validate against currency symbols (e.g., $), ISO codes (e.g., USD), or define custom regular expressions. Returns true if the value includes any of the selected identifiers.
Note:  Data Profiler uses the International Components for Unicode for Java (ICU4J), an open-source library from the Unicode Consortium which provides internationalization and localization support, including Unicode text processing and locale-aware formatting. For more information, see ICU4J documentation at https://unicode-org.github.io/icu/userguide/icu4j/.
Rule Properties
Rule Name
A default rule name (<FieldName>ValidateCurrency) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Currency (String).
Rule Type
The type of rule that is applied to the field. That is ValidateCurrency (Test rule).
Rule Parameters
Use Custom Regex
When enabled, hides the Currency Identifiers parameter and displays the Regex Patterns option—allowing you to either select Currency Identifiers or specify the custom Regex Patterns, but not both.
Currency Identifiers
Allows to specify one or more Currency Identifiers that your source data uses. Values matching any selected identifier will be converted. This option is hidden when Use Custom Regex is selected.
 
A default value may be displayed. This default value is coming from the Data Profile preference settings and can be customized (see Setting Data Profile Preferences).
 
If you do not wish to use the default, type a Currency Identifier or select one from the available list, and then press Enter or click Add to apply it. Multiple identifiers can be added. To remove an added identifier click the icon that is displayed next to the identifier.
 
Note:  A currency identifier is a standardized code, symbol used to recognize, classify, and verify money, commonly known as ISO 4217 codes (e.g., USD, EUR, INR) for international exchange.
Regex Patterns
If you do not want to use Currency Identifiers, select “Use Custom Regex”. This will show the Regex Patterns option, where you can enter your own pattern to match your requirement or pick one from the available list. Press Enter or click Add to apply it. Multiple patterns can be added. To remove an added pattern click the icon that is displayed next to the pattern.
 
Select the pattern to apply:
USD dollar sign | $1,234.56, $100, $0.99
EUR euro sign | €1.234,56, €100, €0,99
GBP pound sign | £1,234.56, £100, £0.99
JPY/CNY yen/yuan sign | ¥1,234, ¥ 100
INR rupee sign | 1,23,456.78, 100
ISO code prefix | USD 1,234.56, EUR 100
Any major currency symbol prefix | $100, €50, £75
Flexible with negatives | -$100, $1,234.56, ($50.00)
 
Custom Regular Expression Examples
^\$?\d{1,3}(,\d{3})*(\.\d{2})?$
Accepts a numeric monetary value with an optional leading dollar sign ($), an integer part that may include comma thousands separators, and an optional fractional part consisting of exactly two decimal places, using a period as the decimal separator.
 
^(USD|EUR|GBP)\s?\d+(\.\d{2})?$
Accepts a currency code (USD, EUR, or GBP), followed by an optional single space, then a numeric amount consisting of one or more digits, with an optional fractional part of exactly two decimal places (preceded by a decimal point).
Tip...  You can copy/paste Regex Patterns from the Field Data Discovery String Patterns tab.
Invert Test
Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test.
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Supported Data Types
String
Remarks
None
ValidateDateFormat
This rule checks whether a field contains a valid date value. Choose from standard formats (like MM/dd/yyyy or ISO) or define your own using a custom regex. Returns true if the value matches any of the selected formats.
Note:  Data Profiler uses the International Components for Unicode for Java (ICU4J), an open-source library from the Unicode Consortium which provides internationalization and localization support, including Unicode text processing and locale-aware formatting. For more information, see ICU4J documentation at https://unicode-org.github.io/icu/userguide/icu4j/.
Rule Properties
Rule Name
A default rule name (<FieldName>ValidateDateFormat) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Date (String).
Rule Type
The type of rule that is applied to the field. That is ValidateDateFormat (Test rule).
Rule Parameters
Use Custom Regex
When enabled, hides the Date Format Patterns parameter and displays the Regex Patterns option—allowing you to either select Date Format Patterns or specify the custom Regex Patterns, but not both.
Date Format Patterns
A default value may be displayed. This default value is coming from the Data Profile preference settings and can be customized (see Setting Data Profile Preferences).
 
If you do not wish to use the default, type a Date Format Pattern or select one from the available list, and then press Enter or click Add to apply it. Multiple formats can be added. To remove an added format click the icon that is displayed next to the pattern.
 
Select a format to apply:
MM/dd/yyyy | strict (e.g., 01/15/2025)
M/d/yyyy | lenient (e.g., 1/5/2025 or 01/15/2025)
dd/MM/yyyy | strict (e.g., 15/01/2025)
d/M/yyyy | lenient (e.g., 5/1/2025 or 15/01/2025)
yyyy-MM-dd | ISO strict (e.g., 2025-01-15)
dd-MMM-yyyy | strict day (e.g., 15-Jan-2025)
d-MMM-yyyy | lenient day (e.g., 5-Jan-2025 or 15-Jan-2025)
MMM dd, yyyy | strict day (e.g., Jan 15, 2025)
MMM d, yyyy | lenient day (e.g., Jan 5, 2025 or Jan 15, 2025)
MMMM dd, yyyy | strict day (e.g., January 15, 2025)
MMMM d, yyyy | lenient day (e.g., January 5, 2025)
dd.MM.yyyy | strict (e.g., 15.01.2025)
yyyy/MM/dd | strict (e.g., 2025/01/15)
dd-MM-yyyy | strict (e.g., 15-01-2025)
d-M-yyyy | lenient (e.g., 5-1-2025 or 15-01-2025)
 
Note:  y=year, M=month, d=day
Regex Patterns
If you do not want to use Date Format Patterns, select “Use Custom Regex”. This will show the Regex Patterns option, where you can enter your own pattern to match your requirement or pick one from the available list. Press Enter or click Add to apply a custom pattern. Multiple patterns can be added. To remove an added pattern click the icon that is displayed next to the pattern.
 
Select the regex pattern to apply:
MM/dd/yyyy | strict
yyyy-MM-dd | ISO strict
dd/MM/yyyy | strict
Any date-like (flexible)
 
Note:  y=year, M=month, d=day
 
Custom Regular Expression Example
^(0[1-9]|1[0-2])[\/\-](0[1-9]|[12][0-9]|3[01])[\/\-](19|20)\d{2}$
Accepts dates in MM/dd/yyyy or MM-dd-yyyy format, ensuring valid numeric ranges for months (01–12), days (01–31), and years in the 1900–2099 range, but does not verify calendar correctness (for example, February 30 or April 31 may still pass).
Tip...  You can copy/paste Regex Patterns from the Field Data Discovery String Patterns tab.
Invert Test
Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test.
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
 
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Supported Data Types
String
Remarks
None
 
ValidateEmail
This rule checks whether a field contains a valid email address. You can use regex patterns ranging from strict (RFC 5322) to simple formats. Returns true if the value matches any of the selected patterns.
Note:  Data Profiler uses the International Components for Unicode for Java (ICU4J), an open-source library from the Unicode Consortium which provides internationalization and localization support, including Unicode text processing and locale-aware formatting. For more information, see ICU4J documentation at https://unicode-org.github.io/icu/userguide/icu4j/.
Rule Properties
Rule Name
A default rule name (<FieldName>ValidateEmail) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Email (String).
Rule Type
The type of rule that is applied to the field. That is ValidateEmail (Test rule).
Rule Parameters
Email Patterns
Default is RFC 5322 standard (e.g., user@example.com)
 
If you do not wish to use the default, type an email regex pattern or select one from the available list, and then press Enter or click Add to apply it. Multiple patterns can be added. To remove an added pattern click the icon that is displayed next to the pattern.
 
Select the format to apply:
RFC 5322 standard (e.g., user@example.com)
Simple email (basic '@' check)
Corporate only (.com, .org, .net)
No plus addressing (no + in username)
 
Custom Regex Expression Examples
"^[A-Za-z0-9.]+@MyCompany\\.com$"
Only accepts valid emails for the “MyCompany” domain.
 
"^.+@.+\\..+$"
Accepts valid emails for all domains.
Tip...  You can copy/paste Regex Patterns from the Field Data Discovery String Patterns tab.
Invert Test
Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test.
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Supported Data Types
String
Remarks
None
 
ValidatePhoneNumberFormat
This rule checks whether a field contains a valid phone number. Select region-specific formats (such as US Mobile or UK Fixed Line), or define your own using a custom regex pattern. Returns true if the value matches any selected format.
Rule Properties
Rule Name
A default rule name (<FieldName>ValidatePhoneNumber) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Phone (String).
Rule Type
The type of rule that is applied to the field. That is ValidatePhoneNumber (Test rule).
Rule Parameters
Use Custom Regex
When enabled, hides the Phone Format Patterns parameter and displays the Regex Patterns option—allowing you to either select Phone Format Patterns or specify the custom Regex Patterns, but not both.
Phone Format Patterns
Default is US:FIXED_LINE_OR_MOBILE
 
If you do not wish to use the default, type a Phone Format Pattern or select one from the available list, and then press Enter or click Add to apply it. Multiple formats can be added. To remove an added format click the icon that is displayed next to the format.
 
Note:  Phone number format pattern allows ‘N, ‘+’, ‘(‘, ‘)’, ‘-’, ‘ ’
Regex Patterns
If you do not want to use Phone Format Patterns, select “Use Custom Regex”. This will show the Regex Patterns option, where you can enter your own pattern to match your requirement or pick one from the available list. Press Enter or click Add to apply it. Multiple patterns can be added. To remove an added pattern click the icon that is displayed next to the pattern.
Note:  Phone number format pattern allows ‘N, ‘+’, ‘(‘, ‘)’, ‘-’, ‘ ’
 
Select the pattern to apply:
US/Canada 10-digit | (555) 123-4567, 555-123-4567, 5551234567
E.164 International | +14155552671, +442071234567
US with optional country code | 1-555-123-4567, (555) 123-4567
UK national and international | +44 20 7123 4567, 020 7123 4567
Generic international (7-15 digits with separators)
Digits only 10-digit | 5551234567
European format | +49 30 1234567, +33 1 23 45 67 89
 
Custom Regex Expression Examples
^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$
Validates US/Canada phone numbers with optional parenthesis and separators.
 
^\+[1-9]\d{1,14}$
Validates International Format (E.164) which starts with ‘+’ followed by up to 15 digits.
Tip...  You can copy/paste Regex Patterns from the Field Data Discovery String Patterns tab.
Invert Test
Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test.
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Supported Data Types
String
Remarks
None
 
ValidatePOBox
This rule checks whether a field contains a PO Box address. Includes multilingual patterns for English, German, Spanish, French, Italian, Portuguese, and Scandinavian formats. Returns true if the value matches any selected pattern.
Note:  Data Profiler uses the International Components for Unicode for Java (ICU4J), an open-source library from the Unicode Consortium which provides internationalization and localization support, including Unicode text processing and locale-aware formatting. For more information, see ICU4J documentation at https://unicode-org.github.io/icu/userguide/icu4j/.
Rule Properties
Rule Name
A default rule name (<FieldName>ValidatePOBox) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, POB (String).
Rule Type
The type of rule that is applied to the field. That is ValidatePOBox (Test rule).
Rule Parameters
PO Box Patterns
Default is - English (PO Box, P.O. Box, POB, Post Box)
 
If you do not wish to use the default, type a PO Box Pattern or select one from the available list, and then press Enter or click Add to apply it. Multiple patterns can be added. To remove an added pattern click the icon that is displayed next to the pattern. See Custom Regex Expression Example for more information.
 
Select the pattern to apply:
English (PO Box, P.O. Box, POB, Post Box)
German (Postfach, PF)
Spanish (Apartado, Apdo)
French (BP)
French-Swiss (Case Postale, CP)
Norwegian/Danish (Postboks)
Portuguese (Caixa Postal)
Italian (Casella Postale)
Invert Test
Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test.
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Custom Regex Expression Example
The following is a regex expression example which can be entered (in the Regex Pattern field) if you do not wish to use the default regex expression.
Tip...  You can copy/paste Regex Patterns from the Field Data Discovery String Patterns tab.
 
^(P\.?\s?O\.?\s?Box\s?\d+(?:\s?(?:Suite|Unit)\s?\d+)?)$
Validates PO Box mailing addresses, allowing common formatting variations (with or without periods and spaces), a required PO Box number, and an optional Suite or Unit number.
 
Accepts the following:
PO Box 123
P.O. Box 123
P O Box 123
P.O.Box 123
PO Box 123 Suite 4
PO Box 123 Unit 56
 
Enforces the following:
Address must start with PO Box
A numeric box number is required
Optional Suite or Unit with a number
No extra text before or after (entire field must match)
 
Rejects the following:
Street addresses (123 Main St)
Missing box number (PO Box)
Non-numeric box identifiers (PO Box A12)
Additional trailing text (PO Box 123 Attn John)
Supported Data Types
String
Remarks
None
 
ValidatePostalCodeFormat
This rule checks whether a field contains a valid postal/ZIP code. Select country-specific display patterns (e.g., NNNNN for US, ANA NAN for Canada) or use custom regex patterns. Returns true if the value matches any selected pattern.
Note:  Data Profiler uses the International Components for Unicode for Java (ICU4J), an open-source library from the Unicode Consortium which provides internationalization and localization support, including Unicode text processing and locale-aware formatting. For more information, see ICU4J documentation at https://unicode-org.github.io/icu/userguide/icu4j/.
Rule Properties
Rule Name
A default rule name (<FieldName>ValidatePostalCode) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Code (String).
Rule Type
The type of rule that is applied to the field. That is ValidatePostalCode (Test rule).
Rule Parameters
Use Custom Regex
When enabled, hides the Postal Format Patterns parameter and displays the Regex Patterns option—allowing you to either select Postal Format Patterns or specify the custom Regex Patterns, but not both.
Postal Format Patterns
Default is - NNNNN
 
If you do not wish to use the default, type a Postal Format Pattern or select one from the available list, and then press Enter or click Add to apply it. Multiple formats can be added. To remove an added format click the icon that is displayed next to the pattern.
Regex Pattern
If you do not want to use Postal Format Patterns, select “Use Custom Regex”. This will show the Regex Patterns option, where you can enter your own pattern to match your requirement or pick one from the available list. Press Enter or click Add to apply it. Multiple expressions can be added. To remove an added expression click the icon that is displayed next to the expression.
 
Select the pattern to apply:
US ZIP 5-digit | 12345, 90210
US ZIP+4 | 12345-6789
US ZIP or ZIP+4 | 12345, 12345-6789
Canada | K1A 0B1, M5V 2T6
UK | SW1A 1AA, EC1A 1BB, W1A 0AX
Europe 4-digit | 1234, 8001 (NL, CH, AT, DK, etc.)
Europe 5-digit | 12345, 75001 (DE, FR, IT, ES, etc.)
Brazil CEP | 01001-000, 12345-678
Japan | 123-4567
Generic alphanumeric (3-10 chars)
 
Custom Regex Expression Examples
^\d{5}(-\d{4})?
Validates US ZIP codes, allowing either the standard 5-digit ZIP or the ZIP+4 format with an optional hyphen and four additional digits.
Accepts: 12345, 12345-6789
Rejects: 1234, 123456, 12345 6789, ABCDE
 
^(GIR\s?0AA|(?:[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}))$
Validates UK postcodes, including the special case GIR 0AA, and allows standard postcode formats with an optional space between the outward and inward components.
Accepts: SW1A 1AA, SW1A1AA, EC1A 1BB, GIR0AA
Rejects: Invalid letter/digit combinations, lowercase values, extra character
Tip...  You can copy/paste Regex Patterns from the Field Data Discovery String Patterns tab.
Invert Test
Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test.
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Supported Data Types
String
Remarks
None
 
ValidateUSStates
This rule checks whether a field contains a valid US State. Accepts full state names (e.g., California, New York), standard two letter abbreviations (e.g., CA, NY), and territories (e.g, PR, GU). The match is case-sensitive. No configuration is required as this rule has no parameters.
With state abbreviations, two uppercase letters are supported. For example, NY and CA. With complete state names, spaces or hyphens are supported. For example, New York and North-Carolina. Special characters and numbers are not supported.
Note:  Data Profiler uses the International Components for Unicode for Java (ICU4J), an open-source library from the Unicode Consortium which provides internationalization and localization support, including Unicode text processing and locale-aware formatting. For more information, see ICU4J documentation at https://unicode-org.github.io/icu/userguide/icu4j/.
Rule Properties
Rule Name
A default rule name (<FieldName>ValidateUSStates) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, State (String).
Rule Type
The type of rule that is applied to the field. That is ValidateUSStates (Test rule).
 
Rule Parameters
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Supported Data Types
String
Remarks
None
 
ValidateTimeFormat
This rule checks whether a field contains a valid time value. Select from standard time formats (such as 24-hour, 12-hour, or formats with milliseconds), or define your own using a custom regex pattern. Returns true if the value matches any of the selected formats.
Note:  Data Profiler uses the International Components for Unicode for Java (ICU4J), an open-source library from the Unicode Consortium which provides internationalization and localization support, including Unicode text processing and locale-aware formatting. For more information, see ICU4J documentation at https://unicode-org.github.io/icu/userguide/icu4j/.
Rule Properties
Rule Name
A default rule name (<FieldName>ValidateTimeFormat) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name.
Note:  The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name
The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Time (String).
Rule Type
The type of rule that is applied to the field. That is ValidateTimeFormat (Test rule).
Rule Parameters
Use Custom Regex
When enabled, hides the Time Format Patterns parameter and displays the Regex Patterns option—allowing you to either select Time Format Patterns or specify the custom Regex Patterns, but not both.
Time Format Patterns
A default value may be displayed. This default value is coming from the Data Profile preference settings and can be customized (see Setting Data Profile Preferences).
 
If you do not wish to use the default, type a Time Format Pattern or select one from the available list and then press Enter or click Add to apply it. Multiple formats can be added. To remove an added format click the icon that is displayed next to the format.
 
Select a format to apply.
HH:mm:ss | 24-hour strict (e.g., 14:30:45) - Validates time in 24-hour format with mandatory two-digit hours, minutes, and seconds.
H:mm:ss | 24-hour lenient (e.g., 9:30:45 or 14:30:45) - Validates time in 24-hour format allowing single- or double-digit hours, with minutes and seconds.
hh:mm:ss a | 12-hour strict (e.g., 02:30:45 PM) - Validates time in 12-hour format with mandatory two-digit hours, minutes, seconds, and AM/PM suffix.
h:mm:ss a | 12-hour lenient (e.g., 2:30:45 PM or 02:30:45 PM) - Validates time in 12-hour format allowing single- or double-digit hours, with minutes, seconds, and AM/PM suffix.
HH:mm | 24-hour short strict (e.g., 14:30) - Validates time in 24-hour format with mandatory two-digit hours and minutes (no seconds).
H:mm | 24-hour short lenient (e.g., 9:30 or 14:30) - Validates time in 24-hour format allowing single- or double-digit hours, with minutes only.
hh:mm a | 12-hour short strict (e.g., 02:30 PM) - Validates time in 12-hour format with mandatory two-digit hours and minutes, plus AM/PM.
h:mm a | 12-hour short lenient (e.g., 2:30 PM or 02:30 PM) - Validates time in 12-hour format allowing single- or double-digit hours, with minutes and AM/PM.
HH:mm:ss.SSS | with millis strict (e.g., 14:30:45.123) - Validates time in 24-hour format with mandatory two-digit hours, minutes, seconds, and milliseconds.
hh:mm:ss.SSS a | 12-hour millis strict (e.g., 02:30:45.123 PM) - Validates time in 12-hour format with mandatory two-digit hours, minutes, seconds, milliseconds, and AM/PM.
h:mm:ss.SSS a | 12-hour millis lenient (e.g., 9:30:45.123 AM) - Validates time in 12-hour format allowing single- or double-digit hours, with minutes, seconds, milliseconds, and AM/PM.
HH.mm.ss | dot-separated strict (e.g., 14.30.45) - Validates time in 24-hour format with mandatory two-digit hours, minutes, and seconds separated by dots.
 
Note:  H = hour (0-23), h = hour (1-12) m = minute, s = second, S = millisecond, a = AM/PM
Regex Patterns
If you do not want to use Time Format Patterns, select “Use Custom Regex”. This will show the Regex Patterns option, where you can enter your own pattern to match your requirement or pick one from the available list. Press Enter or click Add to apply it. Multiple expressions can be added. To remove an added expression click the icon that is displayed next to the expression.
 
Select a pattern to apply:
HH:mm:ss | 24-hr strict - Validates strict 24-hour time with mandatory hours, minutes, and seconds (00–23:00–59:00–59).
HH:mm | 24hr - Validates 24-hour time format with hours and minutes only (00–23:00–59).
12-hr with AM/PM - Validates 12-hour time with minutes, optional seconds, and a case-insensitive AM/PM indicator.
Any time-like (flexible) - Matches loosely formatted time values with : or . separators, optional seconds, and optional AM/PM.
 
Tip...  You can copy/paste Regex Patterns from the Field Data Discovery String Patterns tab.
Invert Test
Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test.
Dimension
(Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension.
A dimension represents a characteristic of data quality:
Accuracy - The data is correct.
Completeness - The data is present.
Consistency - The data uses the same format or pattern across different sources.
Timeliness - The data is recent and available.
Uniqueness - The data is not duplicated.
Validity - The data conforms to business rules and is within an acceptable range.
Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics.
For more information about dimensions, see Add New Rule.
For information about managing dimensions, see Managing Data Quality Dimensions.
Weight
(Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension).
For more information, see Add New Rule.
Supported Data Types
String
Remarks
None
Last modified date: 04/17/2026