Rule Name | Description |
|---|---|
Provides users the ability to evaluate multiple fields and conditions within a single rule and it ensures that a specific relationship or calculation between fields is met. | |
Evaluates a field by comparing its value to a specified constant using the selected operator, and returns true or false based on the result. | |
Compares the value of one field with another field using a specified operator, and returns true or false based on the result. | |
Checks whether a field contains a value and returns true if a value is present. | |
Checks whether the field value is unique within the dataset and returns true if no duplicate value exists, otherwise false. | |
Checks whether a field value is not null and returns true or false based on the result. | |
Checks whether a field value falls within a specified range. | |
Checks whether a field matches a regex expression. Returns true if the value matches the specified regex expression. | |
Checks whether a field contains a valid currency value. Returns true if the value matches any of the selected/specified identifiers/patterns. | |
Checks whether a field contains a valid date value. Returns true if the value matches any of the selected/specified formats/patterns. | |
Checks whether a field contains a valid email address. Returns true if the value matches any of the selected/specified formats/patterns. | |
Checks whether a field contains a valid phone number. Returns true if the value matches any of the selected/specified formats/patterns. | |
Checks whether a field contains a PO Box address. Returns true if the value matches any of the selected/specified formats/patterns. | |
Checks whether a field contains a valid postal/ZIP code. Returns true if the value matches any of the selected/specified formats/patterns. | |
Checks whether a field contains a valid US State. | |
Checks whether a field contains a valid time value. Returns true if the value matches any of the selected/specified formats/patterns. |
Rule Name | A default rule name (Assert or Assert_n, where “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | This is shown as empty as this rule can be used to evaluate multiple fields. |
Rule Type | The type of rule that is applied to the field. That is Assert (Test rule). Tip... A Red Cross in a rule icon (for example |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Script (Left Expression) | The left expression is always a script and can be one or more valid expressions supported by the ExecuteExpression rule. Click Build to open the Expression Builder dialog to assist you in building the SQL script. For example, to add multiple regular expressions use the RLIKE operator: CASE WHEN field RLIKE 'pattern1' THEN true WHEN field RLIKE 'pattern2' THEN true WHEN field RLIKE 'pattern3' THEN true ELSE false END |
Operator | The operator that compares the left side (actual value) with the right side (expected value). Select from one of the following comparison operators: • Equal - Checks if the left side value is equal to the right side value. • Greater - Checks if the left side value is greater than the right side value. • Greater or Equal - Checks if the left side value is greater than or equal to the right side value. • Lesser - Checks if the left side value is lesser than the right side value. • Lesser or Equal - Checks if the left side value is lesser than or equal to the right side value. • Not Equal - Checks if the left side value is not equal to the right side value. • StartsWith - Checks if the left side value starts with the right side value. • EndsWith - Checks whether the left side value ends with the right side value. • Matches - Checks whether the left side value matches the right side value (uses regex match). |
Expression Type (Right Expression) | This drop-down displays options for the type of right hand side expression. The available choices depend on the selection made in the Operator field. The following options are available: • Expression - Compare the left expression against a right expression. The right expression can be specified in the Script box. This option is available for the Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal, StartsWith, and EndsWith operators. • Constant - Compare the left expression against a list of constants. You can type one or more constant values to compare and then press Enter or click Add to apply it. This option is available for the Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal, StartsWith, and EndsWith operators. • Field - Compare the left expression against a specific field. You can specify only one field using the Compare Field drop-down. This option is available for the Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal, StartsWith, and EndsWith operators. • Lookup - Compare the left side value with a looked up value. You can specify the Lookup value using the Lookup widget (see LookupValue rule). This option is available for the Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal, StartsWith, and EndsWith operators. • NotBlank - Check if the left side value is not blank. This option is available only when the Operator is specified as Equal. • NotNull - Check if the left expression value is not null. This option is available only when the Operator is specified as Equal. |
Script (Right Expression) | The right expression in some cases is a script and can be any valid expression supported by the ExecuteExpression rule. Click Build to open the Expression Builder dialog which helps you to construct SQL scripts. This dialog provides a list of available fields, operators, and functions you can use to build expressions. See Using the Expression Builder. |
Regular Expression (Right Expression) | The right expression when the Operator (that compares the left side of the expression with the right side) is specified as Matches. In this case the Expression Type is set to Regular Expression and cannot be changed. You can specify a regular expression (Regex) pattern here. See Java Regular Expressions. |
Compare Field (Right Expression) | Used to specify a field for comparison, when the Expression Type is Field. This drop-down is displayed only when the Expression Type is Field. |
Constant box (Right Expression) | Used to specify a list of constants for comparison, when the Expression Type is Constant. This text box is displayed only when the Expression Type is Constant. You can type one or more constant values to compare and then press Enter or click Add to apply it. |
Lookup widget (Right Expression) | Used to specify the Lookup value for comparison, when the Expression Type is Lookup. The Lookup Widget is displayed only when the Expression Type is Lookup. To learn how to use the lookup widget, see LookupValue rule. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Rule Name | A default rule name (<FieldName>_CompareToConstant) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field. That is CompareToConstant (Test rule). Tip... A Red Cross in a rule icon (for example |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Operator | Select an operator that determines how the field value is compared against the constant (e.g., Equal, Greater, Not Equal). Select from one of the following comparison operators: • Equal • Greater • Greater or Equal • Lesser • Lesser or Equal • Not Equal |
Constant | Type the constant value that is used as the reference in the comparison against the field value, and then press Enter or click Add to apply it. For equal or not equal you can specify one or more constants. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Rule Name | A default rule name (<FieldName>_CompareToField) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field. That is CompareToField (Test rule). |
Compare Field | The second field whose value is used as the reference for comparison against the primary field (the field to which the rule applies). |
Use Fuzzy Match | Enable to configure Fuzzy Matching which returns true if two strings match based on the specified Fuzzy Match Algorithm and match score filter. Note: This option supports String data types. |
Operator | Select from one of the following comparison operators: • Equal • Greater • Greater or Equal • Lesser • Lesser or Equal • Not Equal |
Fuzzy Match Algorithm | • CONTAINS: Checks the small string in the long string. • DAMERAU–LEVENSHTEIN: This distance between two strings is the minimum number of operations (insertions, deletions or substitutions of a single character, or transposition of two adjacent characters) required to change one string into another. For more information, see Damerau–Levenshtein distance. • EXACT MATCH: Checks two strings for exact match. • JARO: A measure of characters in common, being no more than half the length of the longer string in distance with consideration for transpositions. • JARO-WINKLER: This algorithm extends Jaro algorithm and uses a prefix scale “p” which gives more favorable ratings to strings that match from the beginning for a set prefix length “l”. The prefix length is set to 4 characters (the maximum possible prefix length). For more information, see Jaro–Winkler distance. • LEVENSHTEIN: This distance between two strings is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one string into other. For more information, see Levenshtein distance. • POSITIONAL QGRAM: Positional q-grams is an extension to q-grams with an additional property called maxDistance used in calculating the string similarity. For example, ‘peter’ contains the positional bigrams (‘pe’,0), (‘et’,1), (‘te’,2) and (‘er’,3). If maxDistance is set to 1, then (‘et’,1) will only be matched to bi-grams in the other string with positions 0 to 2 • QGRAM: q-grams (also called as n-grams) is the process of breaking up a string into multiple strings each with length n (q). For example, apply q-grams on abcde with q = 3 would yield abc, bcd, cde. For example, the word ‘peter’ with q=2 would yield following bigrams - ‘pe’, ‘et’, ‘te’ and ‘er’. A q-gram similarity measure between two strings is calculated by counting the number of q-grams in common. • SHORTHAND: Determine if the shorter string is a “shorthand” of the longer. That is, if it can be produced only by deletions. If the first character of a word in the longer string is deleted, the entire word is considered deleted. For e.g. IBM is a shorthand of International Business Machines. |
Fuzzy Score Filter | Add a decimal values between 0.01 to 1. Default value is 0.7. Comparison score is between 0 to 1. Records with comparison score less than this value are not considered a match for probable duplicates and are discarded. The higher the value you select, more strict is the matching. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Rule Name | A default rule name (<FieldName>_IsNotBlank) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field. That is IsNotBlank (Test rule). |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Rule Name | A default rule name (<FieldName>_IsNotDuplicate) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field. That is IsNotDuplicate (Test rule). |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Rule Name | A default rule name (<FieldName>_IsNotNull) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field. That is IsNotNull (Test rule). |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Rule Name | A default rule name (<FieldName>_InRange) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Count (Integer). |
Rule Type | The type of rule that is applied to the field. That is InRange (Test rule). Tip... A Red Cross in a rule icon (for example |
Lower Bound | The minimum value of the range. Select Inclusive to include the lower bound number in the range. |
Upper Bound | The maximum value of the range. Select Inclusive to include the upper bound number in the range. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Rule Name | A default rule name (<FieldName>_MatchesRegex, <FieldName>_MatchesRegex_n, where “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field. That is MatchesRegex (Test rule). |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Regular Expression | Specify one or more regular expressions by typing or selecting them from the drop-down list and then clicking Add. Click You can also: • Copy/paste a regular expression or pattern detected by data discovery. See Field Data Discovery (also see steps, below). • Paste a macro by right-clicking in the text box and selecting the macro from the Paste Macro dialog. See Pasting Macro in Map, Profile or Process. • Create a macro by right-clicking in the text box. See Creating a Macro by Selecting a Value. To copy/paste values detected by data discovery: 1. In the Field/Rule pane, select a field to open the Field Data Discovery pane. 2. In the Field Data Discovery pane: • Click • Select the String Patterns tab. • Right-click on the cell that contains the desired regular expression or pattern. • Do one of the following: – Select Copy Regex Pattern to copy the regular expression. – Select Copy Input Value to copy the input value. 3. In the Field/Rule pane, select the rule to open the Rule Definition pane. 4. Paste into the Regular Expression box and click Add. 5. To add another regular expression or pattern provided by data discovery, select the field again. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Rule Name | A default rule name (<FieldName>ValidateCurrency) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Currency (String). |
Rule Type | The type of rule that is applied to the field. That is ValidateCurrency (Test rule). |
Use Custom Regex | When enabled, hides the Currency Identifiers parameter and displays the Regex Patterns option—allowing you to either select Currency Identifiers or specify the custom Regex Patterns, but not both. |
Currency Identifiers | Allows to specify one or more Currency Identifiers that your source data uses. Values matching any selected identifier will be converted. This option is hidden when Use Custom Regex is selected. A default value may be displayed. This default value is coming from the Data Profile preference settings and can be customized (see Setting Data Profile Preferences). If you do not wish to use the default, type a Currency Identifier or select one from the available list, and then press Enter or click Add to apply it. Multiple identifiers can be added. To remove an added identifier click the Note: A currency identifier is a standardized code, symbol used to recognize, classify, and verify money, commonly known as ISO 4217 codes (e.g., USD, EUR, INR) for international exchange. |
Regex Patterns | If you do not want to use Currency Identifiers, select “Use Custom Regex”. This will show the Regex Patterns option, where you can enter your own pattern to match your requirement or pick one from the available list. Press Enter or click Add to apply it. Multiple patterns can be added. To remove an added pattern click the Select the pattern to apply: • USD dollar sign | $1,234.56, $100, $0.99 • EUR euro sign | €1.234,56, €100, €0,99 • GBP pound sign | £1,234.56, £100, £0.99 • JPY/CNY yen/yuan sign | ¥1,234, ¥ 100 • INR rupee sign | 1,23,456.78, ₹100 • ISO code prefix | USD 1,234.56, EUR 100 • Any major currency symbol prefix | $100, €50, £75 • Flexible with negatives | -$100, $1,234.56, ($50.00) Custom Regular Expression Examples ^\$?\d{1,3}(,\d{3})*(\.\d{2})?$ Accepts a numeric monetary value with an optional leading dollar sign ($), an integer part that may include comma thousands separators, and an optional fractional part consisting of exactly two decimal places, using a period as the decimal separator. ^(USD|EUR|GBP)\s?\d+(\.\d{2})?$ Accepts a currency code (USD, EUR, or GBP), followed by an optional single space, then a numeric amount consisting of one or more digits, with an optional fractional part of exactly two decimal places (preceded by a decimal point). Tip... You can copy/paste Regex Patterns from the Field Data Discovery String Patterns tab. |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Rule Name | A default rule name (<FieldName>ValidateDateFormat) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Date (String). |
Rule Type | The type of rule that is applied to the field. That is ValidateDateFormat (Test rule). |
Use Custom Regex | When enabled, hides the Date Format Patterns parameter and displays the Regex Patterns option—allowing you to either select Date Format Patterns or specify the custom Regex Patterns, but not both. |
Date Format Patterns | A default value may be displayed. This default value is coming from the Data Profile preference settings and can be customized (see Setting Data Profile Preferences). If you do not wish to use the default, type a Date Format Pattern or select one from the available list, and then press Enter or click Add to apply it. Multiple formats can be added. To remove an added format click the Select a format to apply: • MM/dd/yyyy | strict (e.g., 01/15/2025) • M/d/yyyy | lenient (e.g., 1/5/2025 or 01/15/2025) • dd/MM/yyyy | strict (e.g., 15/01/2025) • d/M/yyyy | lenient (e.g., 5/1/2025 or 15/01/2025) • yyyy-MM-dd | ISO strict (e.g., 2025-01-15) • dd-MMM-yyyy | strict day (e.g., 15-Jan-2025) • d-MMM-yyyy | lenient day (e.g., 5-Jan-2025 or 15-Jan-2025) • MMM dd, yyyy | strict day (e.g., Jan 15, 2025) • MMM d, yyyy | lenient day (e.g., Jan 5, 2025 or Jan 15, 2025) • MMMM dd, yyyy | strict day (e.g., January 15, 2025) • MMMM d, yyyy | lenient day (e.g., January 5, 2025) • dd.MM.yyyy | strict (e.g., 15.01.2025) • yyyy/MM/dd | strict (e.g., 2025/01/15) • dd-MM-yyyy | strict (e.g., 15-01-2025) • d-M-yyyy | lenient (e.g., 5-1-2025 or 15-01-2025) Note: y=year, M=month, d=day |
Regex Patterns | If you do not want to use Date Format Patterns, select “Use Custom Regex”. This will show the Regex Patterns option, where you can enter your own pattern to match your requirement or pick one from the available list. Press Enter or click Add to apply a custom pattern. Multiple patterns can be added. To remove an added pattern click the Select the regex pattern to apply: • MM/dd/yyyy | strict • yyyy-MM-dd | ISO strict • dd/MM/yyyy | strict • Any date-like (flexible) Note: y=year, M=month, d=day Custom Regular Expression Example ^(0[1-9]|1[0-2])[\/\-](0[1-9]|[12][0-9]|3[01])[\/\-](19|20)\d{2}$ Accepts dates in MM/dd/yyyy or MM-dd-yyyy format, ensuring valid numeric ranges for months (01–12), days (01–31), and years in the 1900–2099 range, but does not verify calendar correctness (for example, February 30 or April 31 may still pass). Tip... You can copy/paste Regex Patterns from the Field Data Discovery String Patterns tab. |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Rule Name | A default rule name (<FieldName>ValidateEmail) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Email (String). |
Rule Type | The type of rule that is applied to the field. That is ValidateEmail (Test rule). |
Email Patterns | Default is RFC 5322 standard (e.g., user@example.com) If you do not wish to use the default, type an email regex pattern or select one from the available list, and then press Enter or click Add to apply it. Multiple patterns can be added. To remove an added pattern click the Select the format to apply: • RFC 5322 standard (e.g., user@example.com) • Simple email (basic '@' check) • Corporate only (.com, .org, .net) • No plus addressing (no + in username) Custom Regex Expression Examples "^[A-Za-z0-9.]+@MyCompany\\.com$" Only accepts valid emails for the “MyCompany” domain. "^.+@.+\\..+$" Accepts valid emails for all domains. Tip... You can copy/paste Regex Patterns from the Field Data Discovery String Patterns tab. |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Rule Name | A default rule name (<FieldName>ValidatePhoneNumber) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Phone (String). |
Rule Type | The type of rule that is applied to the field. That is ValidatePhoneNumber (Test rule). |
Use Custom Regex | When enabled, hides the Phone Format Patterns parameter and displays the Regex Patterns option—allowing you to either select Phone Format Patterns or specify the custom Regex Patterns, but not both. |
Phone Format Patterns | Default is US:FIXED_LINE_OR_MOBILE If you do not wish to use the default, type a Phone Format Pattern or select one from the available list, and then press Enter or click Add to apply it. Multiple formats can be added. To remove an added format click the Note: Phone number format pattern allows ‘N, ‘+’, ‘(‘, ‘)’, ‘-’, ‘ ’ |
Regex Patterns | If you do not want to use Phone Format Patterns, select “Use Custom Regex”. This will show the Regex Patterns option, where you can enter your own pattern to match your requirement or pick one from the available list. Press Enter or click Add to apply it. Multiple patterns can be added. To remove an added pattern click the Note: Phone number format pattern allows ‘N, ‘+’, ‘(‘, ‘)’, ‘-’, ‘ ’ Select the pattern to apply: • US/Canada 10-digit | (555) 123-4567, 555-123-4567, 5551234567 • E.164 International | +14155552671, +442071234567 • US with optional country code | 1-555-123-4567, (555) 123-4567 • UK national and international | +44 20 7123 4567, 020 7123 4567 • Generic international (7-15 digits with separators) • Digits only 10-digit | 5551234567 • European format | +49 30 1234567, +33 1 23 45 67 89 Custom Regex Expression Examples ^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$ Validates US/Canada phone numbers with optional parenthesis and separators. ^\+[1-9]\d{1,14}$ Validates International Format (E.164) which starts with ‘+’ followed by up to 15 digits. Tip... You can copy/paste Regex Patterns from the Field Data Discovery String Patterns tab. |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Rule Name | A default rule name (<FieldName>ValidatePOBox) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, POB (String). |
Rule Type | The type of rule that is applied to the field. That is ValidatePOBox (Test rule). |
PO Box Patterns | Default is - English (PO Box, P.O. Box, POB, Post Box) If you do not wish to use the default, type a PO Box Pattern or select one from the available list, and then press Enter or click Add to apply it. Multiple patterns can be added. To remove an added pattern click the Select the pattern to apply: • English (PO Box, P.O. Box, POB, Post Box) • German (Postfach, PF) • Spanish (Apartado, Apdo) • French (BP) • French-Swiss (Case Postale, CP) • Norwegian/Danish (Postboks) • Portuguese (Caixa Postal) • Italian (Casella Postale) |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Rule Name | A default rule name (<FieldName>ValidatePostalCode) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Code (String). |
Rule Type | The type of rule that is applied to the field. That is ValidatePostalCode (Test rule). |
Use Custom Regex | When enabled, hides the Postal Format Patterns parameter and displays the Regex Patterns option—allowing you to either select Postal Format Patterns or specify the custom Regex Patterns, but not both. |
Postal Format Patterns | Default is - NNNNN If you do not wish to use the default, type a Postal Format Pattern or select one from the available list, and then press Enter or click Add to apply it. Multiple formats can be added. To remove an added format click the |
Regex Pattern | If you do not want to use Postal Format Patterns, select “Use Custom Regex”. This will show the Regex Patterns option, where you can enter your own pattern to match your requirement or pick one from the available list. Press Enter or click Add to apply it. Multiple expressions can be added. To remove an added expression click the Select the pattern to apply: • US ZIP 5-digit | 12345, 90210 • US ZIP+4 | 12345-6789 • US ZIP or ZIP+4 | 12345, 12345-6789 • Canada | K1A 0B1, M5V 2T6 • UK | SW1A 1AA, EC1A 1BB, W1A 0AX • Europe 4-digit | 1234, 8001 (NL, CH, AT, DK, etc.) • Europe 5-digit | 12345, 75001 (DE, FR, IT, ES, etc.) • Brazil CEP | 01001-000, 12345-678 • Japan | 123-4567 • Generic alphanumeric (3-10 chars) Custom Regex Expression Examples ^\d{5}(-\d{4})? Validates US ZIP codes, allowing either the standard 5-digit ZIP or the ZIP+4 format with an optional hyphen and four additional digits. Accepts: 12345, 12345-6789 Rejects: 1234, 123456, 12345 6789, ABCDE ^(GIR\s?0AA|(?:[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}))$ Validates UK postcodes, including the special case GIR 0AA, and allows standard postcode formats with an optional space between the outward and inward components. Accepts: SW1A 1AA, SW1A1AA, EC1A 1BB, GIR0AA Rejects: Invalid letter/digit combinations, lowercase values, extra character Tip... You can copy/paste Regex Patterns from the Field Data Discovery String Patterns tab. |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Rule Name | A default rule name (<FieldName>ValidateUSStates) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, State (String). |
Rule Type | The type of rule that is applied to the field. That is ValidateUSStates (Test rule). |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |
Rule Name | A default rule name (<FieldName>ValidateTimeFormat) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Time (String). |
Rule Type | The type of rule that is applied to the field. That is ValidateTimeFormat (Test rule). |
Use Custom Regex | When enabled, hides the Time Format Patterns parameter and displays the Regex Patterns option—allowing you to either select Time Format Patterns or specify the custom Regex Patterns, but not both. |
Time Format Patterns | A default value may be displayed. This default value is coming from the Data Profile preference settings and can be customized (see Setting Data Profile Preferences). If you do not wish to use the default, type a Time Format Pattern or select one from the available list and then press Enter or click Add to apply it. Multiple formats can be added. To remove an added format click the Select a format to apply. • HH:mm:ss | 24-hour strict (e.g., 14:30:45) - Validates time in 24-hour format with mandatory two-digit hours, minutes, and seconds. • H:mm:ss | 24-hour lenient (e.g., 9:30:45 or 14:30:45) - Validates time in 24-hour format allowing single- or double-digit hours, with minutes and seconds. • hh:mm:ss a | 12-hour strict (e.g., 02:30:45 PM) - Validates time in 12-hour format with mandatory two-digit hours, minutes, seconds, and AM/PM suffix. • h:mm:ss a | 12-hour lenient (e.g., 2:30:45 PM or 02:30:45 PM) - Validates time in 12-hour format allowing single- or double-digit hours, with minutes, seconds, and AM/PM suffix. • HH:mm | 24-hour short strict (e.g., 14:30) - Validates time in 24-hour format with mandatory two-digit hours and minutes (no seconds). • H:mm | 24-hour short lenient (e.g., 9:30 or 14:30) - Validates time in 24-hour format allowing single- or double-digit hours, with minutes only. • hh:mm a | 12-hour short strict (e.g., 02:30 PM) - Validates time in 12-hour format with mandatory two-digit hours and minutes, plus AM/PM. • h:mm a | 12-hour short lenient (e.g., 2:30 PM or 02:30 PM) - Validates time in 12-hour format allowing single- or double-digit hours, with minutes and AM/PM. • HH:mm:ss.SSS | with millis strict (e.g., 14:30:45.123) - Validates time in 24-hour format with mandatory two-digit hours, minutes, seconds, and milliseconds. • hh:mm:ss.SSS a | 12-hour millis strict (e.g., 02:30:45.123 PM) - Validates time in 12-hour format with mandatory two-digit hours, minutes, seconds, milliseconds, and AM/PM. • h:mm:ss.SSS a | 12-hour millis lenient (e.g., 9:30:45.123 AM) - Validates time in 12-hour format allowing single- or double-digit hours, with minutes, seconds, milliseconds, and AM/PM. • HH.mm.ss | dot-separated strict (e.g., 14.30.45) - Validates time in 24-hour format with mandatory two-digit hours, minutes, and seconds separated by dots. Note: H = hour (0-23), h = hour (1-12) m = minute, s = second, S = millisecond, a = AM/PM |
Regex Patterns | If you do not want to use Time Format Patterns, select “Use Custom Regex”. This will show the Regex Patterns option, where you can enter your own pattern to match your requirement or pick one from the available list. Press Enter or click Add to apply it. Multiple expressions can be added. To remove an added expression click the Select a pattern to apply: • HH:mm:ss | 24-hr strict - Validates strict 24-hour time with mandatory hours, minutes, and seconds (00–23:00–59:00–59). • HH:mm | 24hr - Validates 24-hour time format with hours and minutes only (00–23:00–59). • 12-hr with AM/PM - Validates 12-hour time with minutes, optional seconds, and a case-insensitive AM/PM indicator. • Any time-like (flexible) - Matches loosely formatted time values with : or . separators, optional seconds, and optional AM/PM. Tip... You can copy/paste Regex Patterns from the Field Data Discovery String Patterns tab. |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Add New Rule. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Add New Rule. |