Rule Name | Description |
|---|---|
Assert | Evaluates multiple fields and conditions within a single rule to ensure that a specific relationship or calculation between fields is met. |
CompareToConstant | Returns true or false after comparing a field to a constant value. |
CompareToField | Returns true or false after comparing a field value to another field value. |
DistinctValues | Writes all the distinct (unique) values found in a source dataset into a file. |
DuplicateValues | Writes all the duplicate values found in a source dataset into a file. |
EqualRangeBinning | Calculates equal-width ranges across the field values and counts the number of values in each range. |
FuzzyMatch | Returns true if two strings match based on the configured fuzzy matching rules and match score filter. |
IsNotBlank | Returns true if a field is not blank, returns false otherwise. |
IsNotDuplicate | Returns true if a field is not duplicated, returns false otherwise. |
IsNotNull | Returns true if a field is not null, returns false otherwise. |
InRange | Returns true if the field value is within the specified range, returns false otherwise. |
MatchesRegex | Returns true if a field matches a regular expression, returns false otherwise. |
MostFrequentValues | Calculates the most frequent values of a field. |
Statistics | Calculates the following statistics for a numeric field: Min, Max, Mean, Median, Mode, Standard Deviation, Sum, and Variance. |
Rule Name | A default rule name (Assert or Assert_n, where “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | This is shown as empty as this rule can be used to evaluate multiple fields. |
Rule Type | The type of rule that is applied to the field, that is, Assert (Test rule). |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Script (Left Expression) | The left expression is always a script and can be one or more valid expressions supported by the ExecuteExpression rule. To add multiple regular expressions, use the matchesPatterns string function: matchesPatterns('FieldName', "patternA", "patternB", "patternC"), where FieldName is the field name and patternA, patternB, and patternC are regular expressions. For example: matchesPatterns('Account Number', "01-.*", "02-.*", "03-.*"). Click Build to open the Expression Builder dialog, which helps you construct expressions. This dialog provides a list of available fields, operators, and functions you can use to build expressions. See Using the Expression Builder. |
Operator | The operator that compares the left side (actual value) with the right side (expected value). Select one of the following comparison operators: • Equal - Checks whether the left side value is equal to the right side value. • Greater - Checks whether the left side value is greater than the right side value. • Greater or Equal - Checks whether the left side value is greater than or equal to the right side value. • Lesser - Checks whether the left side value is less than the right side value. • Lesser or Equal - Checks whether the left side value is less than or equal to the right side value. • Not Equal - Checks whether the left side value is not equal to the right side value. • StartsWith - Checks whether the left side value starts with the right side value. • EndsWith - Checks whether the left side value ends with the right side value. • Matches - Checks whether the left side value matches the right side value (uses a regex match). |
Expression Type (Right Expression) | This drop-down displays options for the type of right hand side expression. The available choices depend on the selection made in the Operator field. The following options are available: • Expression - Compare the left expression against a right expression. The right expression can be specified in the Script box. This option is available for the Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal, StartsWith, and EndsWith operators. • Constant - Compare the left expression against a list of constants. You can type one or more constant values to compare and then press the Enter key or click Add. This option is available for the Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal, StartsWith, and EndsWith operators. • Field - Compare the left expression against a specific field. You can specify only one field using the Compare Field drop-down. This option is available for the Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal, StartsWith, and EndsWith operators. • Lookup - Compare the left side value with a looked up value. You can specify the Lookup value using the Lookup widget (see LookupValue rule). This option is available for the Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal, StartsWith, and EndsWith operators. • NotBlank - Check if the left side value is not blank. This option is available only when the Operator is specified as Equal. • NotNull - Check if the left expression value is not null. This option is available only when the Operator is specified as Equal. • Regex - Compare the left side value against the specified regular expression (Regex) pattern. This option is available only when the Operator is specified as Matches. |
Script (Right Expression) | In some cases, the right expression is a script and can be any valid expression supported by the ExecuteExpression rule. Click Build to open the Expression Builder dialog, which helps you construct expressions. This dialog provides a list of available fields, operators, and functions you can use to build expressions. See Using the Expression Builder. |
Regular Expression (Right Expression) | The right expression when the Operator is specified as Matches. In this case the Expression Type is set to Regular Expression and cannot be changed. You can specify a regular expression (Regex) pattern here. See Java Regular Expressions. |
Compare Field (Right Expression) | Used to specify a field for comparison, when the Expression Type is Field. This drop-down is displayed only when the Expression Type is Field. |
Constant box (Right Expression) | Used to specify a list of constants for comparison, when the Expression Type is Constant. This text box is displayed only when the Expression Type is Constant. You can type one or more constant values to compare and then press the Enter key or click Add. |
Lookup widget (Right Expression) | Used to specify the Lookup value for comparison, when the Expression Type is Lookup. The Lookup Widget is displayed only when the Expression Type is Lookup. To learn how to use the lookup widget, see LookupValue rule. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Rules Tab. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Rules Tab. |
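
The operator semantics described for the Assert rule can be sketched in a few lines of Python. This is only an illustration of the comparisons listed in the Operator row, not the product's implementation: the names `OPERATORS` and `assert_rule` are hypothetical, the `invert` flag is a simplified reading of Invert Test, and the assumption that Matches requires the whole value to match the pattern is not confirmed by the text above.

```python
import operator
import re

# Hypothetical mapping from the operator labels to Python comparisons.
OPERATORS = {
    "Equal": operator.eq,
    "Greater": operator.gt,
    "Greater or Equal": operator.ge,
    "Lesser": operator.lt,
    "Lesser or Equal": operator.le,
    "Not Equal": operator.ne,
    "StartsWith": lambda left, right: str(left).startswith(str(right)),
    "EndsWith": lambda left, right: str(left).endswith(str(right)),
    # Assumption: the entire left value must match the regular expression.
    "Matches": lambda left, right: re.fullmatch(right, str(left)) is not None,
}


def assert_rule(left, op, right, invert=False):
    """Compare the left (actual) value with the right (expected) value.

    When invert is True, the result is flipped, which is a simplified
    reading of the Invert Test option.
    """
    result = OPERATORS[op](left, right)
    return (not result) if invert else result
```

For example, `assert_rule("01-123", "Matches", r"01-.*")` passes, while the same call with `invert=True` fails.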
Rule Name | A default rule name (<FieldName>_CompareToConstant) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field, that is, CompareToConstant (Test rule). |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Operator | Select from one of the following comparison operators: • Equal • Greater • Greater or Equal • Lesser • Lesser or Equal • Not Equal |
Constant | In this text box, type the constant value to compare, and then press the Enter key or click Add. For the Equal and Not Equal operators, you can specify one or more constants. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Rules Tab. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Rules Tab. |
Rule Name | A default rule name (<FieldName>_CompareToField) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field. That is CompareToField (Test rule). |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Operator | Select from one of the following comparison operators: • Equal • Greater • Greater or Equal • Lesser • Lesser or Equal • Not Equal |
Compare Field | Select the field to compare against. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Rules Tab. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Rules Tab. |
Rule Name | A default rule name (<FieldName>_DistinctValues) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field. That is DistinctValues (Summary rule). |
Results File | Displays the distinct values output file. By default, the file is located within your project, but you can change it. |
Sort by | Select the required sort order from the following: • Field Data (Faster) • Frequency Count (Slower) Note: Frequency Count (Slower) sorts the results in descending order, so it is slower than the Field Data sort option. |
Rule Name | A default rule name (<FieldName>_DuplicateValues) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field, that is, DuplicateValues (Summary rule). |
Results File | Displays the duplicate values output file. By default, the file is located within your project, but you can change it. |
Sort by | Select the required sort order from the following: • Field Data (Faster) • Frequency Count (Slower) Note: Frequency Count (Slower) sorts the results in descending order, so it is slower than the Field Data sort option. |
Min Count | Type the required value (default is 2). The value must not be lower than 2. |
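
As a rough illustration of how a duplicate-values summary with a minimum count might behave, the sketch below counts occurrences and keeps only values seen at least `min_count` times. The function name and the interpretation of Min Count as a minimum number of occurrences are assumptions; the table above only states the default (2) and the lower limit.

```python
from collections import Counter


def duplicate_values(values, min_count=2):
    """Return {value: count} for values that occur at least min_count times."""
    if min_count < 2:
        # Mirrors the documented constraint on Min Count.
        raise ValueError("min_count must not be lower than 2")
    counts = Counter(values)
    return {value: count for value, count in counts.items() if count >= min_count}
```

With the input `["a", "b", "a", "c", "b", "a"]`, the default setting reports `a` and `b`, and `min_count=3` reports only `a`.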

Rule Name | A default rule name (<FieldName>_EqualRangeBinning) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Count (Integer). |
Rule Type | The type of rule that is applied to the field. That is EqualRangeBinning (Summary rule). |
RangeCount | Specify the number of ranges to create. The default is 10. |
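
Equal-width binning in general can be sketched as follows. This is a generic version of the technique, not the product's algorithm: the function name is hypothetical, and placing the maximum value in the last range is an assumption about boundary handling.

```python
def equal_range_binning(values, range_count=10):
    """Split [min, max] into range_count equal-width ranges and count values.

    Returns a list of (lower_bound, upper_bound, count) tuples.
    """
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / range_count
    counts = [0] * range_count
    for value in values:
        if width == 0:
            index = 0  # all values are identical
        else:
            # Assumption: the maximum value falls into the last range.
            index = min(int((value - lowest) / width), range_count - 1)
        counts[index] += 1
    return [
        (lowest + i * width, lowest + (i + 1) * width, counts[i])
        for i in range(range_count)
    ]
```

For the integers 0 through 10 and `range_count=5`, each range is 2 wide and the last range also holds the maximum value.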
Rule Name | A default rule name (<FieldName>_FuzzyMatch) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field, that is, FuzzyMatch (Test rule). |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Fuzzy Match Algorithm | Select one of the following algorithms: • CONTAINS • DAMERAU_LEVENSHTEIN • EXACT_MATCH • JARO • JARO_WINKLER • LEVENSHTEIN • QGRAM • POSITIONAL_QGRAM • SHORTHAND |
Constant | In this text box, type the constant value with which to compare, and then press the Enter key or click Add. You can compare with multiple values but at least one constant should be specified. |
Fuzzy Score Filter | Enter a decimal value between 0.01 and 1. The default value is 0.7. The comparison score ranges from 0 to 1. Records with a comparison score lower than this value are not considered a match and are discarded. The higher the value, the stricter the matching. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Rules Tab. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Rules Tab. |
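
To show how a score filter interacts with a similarity score, here is a minimal sketch based on Levenshtein distance. The normalization used (1 minus distance divided by the longer string's length) is an assumption chosen for illustration; the source does not state how the product computes its comparison score, and all function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 means identical, 0.0 means no overlap."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def fuzzy_match(value, constants, score_filter=0.7):
    """True if the value scores at or above the filter against any constant."""
    return any(similarity(value, constant) >= score_filter for constant in constants)
```

With the default filter of 0.7, "Jon Smith" matches "John Smith" (one edit over ten characters), while "kitten" does not match "sitting" unless the filter is lowered.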
Rule Name | A default rule name (<FieldName>_IsNotBlank) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field. That is IsNotBlank (Test rule). |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Rules Tab. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Rules Tab. |
Rule Name | A default rule name (<FieldName>_IsNotDuplicate) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field. That is IsNotDuplicate (Test rule). |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Rules Tab. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Rules Tab. |
Rule Name | A default rule name (<FieldName>_IsNotNull) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field. That is IsNotNull (Test rule). |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Rules Tab. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Rules Tab. |
Rule Name | A default rule name (<FieldName>_InRange) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Count (Integer). |
Rule Type | The type of rule that is applied to the field, that is, InRange (Test rule). |
Lower Bound | The lower range value. Select Inclusive to include the lower bound number in the range. |
Upper Bound | The upper range value. Select Inclusive to include the upper bound number in the range. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Rules Tab. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Rules Tab. |
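
The effect of the Inclusive options on the two bounds can be illustrated with a small sketch. The function and parameter names are hypothetical, and both flags are required arguments here because the source does not state the default setting.

```python
def in_range(value, lower, upper, *, lower_inclusive, upper_inclusive):
    """Check whether value lies between lower and upper.

    Each bound counts as part of the range only when its inclusive flag is set.
    """
    above_lower = value >= lower if lower_inclusive else value > lower
    below_upper = value <= upper if upper_inclusive else value < upper
    return above_lower and below_upper
```

For a range of 10 to 20, the value 10 passes only when the lower bound is inclusive, and the value 20 passes only when the upper bound is inclusive.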
Rule Name | A default rule name (<FieldName>_MatchesRegex, <FieldName>_MatchesRegex_n, where “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field. That is MatchesRegex (Test rule). |
Invert Test | Select this to report the opposite results in your pass or fail file. For more information, see Invert Rule Test. |
Regular Expression | Specify one or more regular expressions by typing or selecting them from the drop-down list and then clicking Add. You can also: • Copy and paste a regular expression or pattern detected by data discovery. See Field Data Discovery (also see the steps below). • Paste a macro by right-clicking in the text box and selecting the macro from the Paste Macro dialog. See Pasting Macro in Map, Profile or Process. • Create a macro by right-clicking in the text box. See Creating a Macro by Selecting a Value. To copy and paste values detected by data discovery: 1. In the Field/Rule pane, select a field to open the Field Data Discovery pane. 2. In the Field Data Discovery pane: • Select the String Patterns tab. • Right-click the cell that contains the desired regular expression or pattern. • Do one of the following: – Select Copy Regex Pattern to copy the regular expression. – Select Copy Input Value to copy the input value. 3. In the Field/Rule pane, select the rule to open the Rule Definition pane. 4. Paste into the Regular Expression box and click Add. 5. To add another regular expression or pattern provided by data discovery, select the field again. |
Dimension | (Optional) Select a dimension to associate the rule with. There is no limit to the number of rules a dimension can be associated with. A rule can be associated with a single dimension. A dimension represents a characteristic of data quality: • Accuracy - The data is correct. • Completeness - The data is present. • Consistency - The data uses the same format or pattern across different sources. • Timeliness - The data is recent and available. • Uniqueness - The data is not duplicated. • Validity - The data conforms to business rules and is within an acceptable range. Each dimension generates a Dimension Score for its associated rules. The score indicates the degree to which the data meets the characteristic. Scores can be viewed in the Statistics tab post profile execution. See Viewing Statistics. For more information about dimensions, see Rules Tab. For information about managing dimensions, see Managing Data Quality Dimensions. |
Weight | (Optional) Select the importance level of the rule. Values are 1-5, where 5 is the most important. The default value is 1. This value is reflected in the Data Quality Index (DQI) score and the Dimension Score (if the rule is associated with a dimension). For more information, see Rules Tab. |
Rule Name | A default rule name (<FieldName>_MostFrequentValues) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, myField (String). |
Rule Type | The type of rule that is applied to the field. That is MostFrequentValues (Summary rule). |
Top How Many | Specify the number of desired frequent values. The default is 25, which returns the top 25 most frequent values. |
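
A most-frequent-values summary can be approximated with Python's standard `collections.Counter`. The function name and the tie-breaking behavior (first-seen order, as `Counter.most_common` does) are assumptions for illustration only.

```python
from collections import Counter


def most_frequent_values(values, top_how_many=25):
    """Return up to top_how_many (value, count) pairs, most frequent first."""
    return Counter(values).most_common(top_how_many)
```

Calling `most_frequent_values(["x", "y", "x", "z", "x", "y"], 2)` returns the two most common values with their counts.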
Rule Name | A default rule name (<FieldName>_Statistics) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field. |
Field Name | The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Count (Integer). |
Rule Type | The type of rule that is applied to the field. That is Statistics (Summary rule). |
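
The statistics listed for this rule can be reproduced with Python's standard `statistics` module, as in the sketch below. The function name is hypothetical, and the choice of population (rather than sample) standard deviation and variance is an assumption, because the source does not say which form the rule reports.

```python
import statistics


def field_statistics(values):
    """Compute the summary statistics named for the Statistics rule."""
    return {
        "Min": min(values),
        "Max": max(values),
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        # Assumption: population forms; use stdev/variance for sample forms.
        "Standard Deviation": statistics.pstdev(values),
        "Sum": sum(values),
        "Variance": statistics.pvariance(values),
    }
```

For the values 2, 4, 4, 4, 5, 5, 7, 9, this gives a mean of 5, a median of 4.5, a mode of 4, a sum of 40, a population variance of 4, and a population standard deviation of 2.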