Field Data Discovery
Data Discovery provides insights into patterns, values, and formats within the columns of a source dataset. It serves as a powerful tool for users to identify potential issues and understand the nature of their data more deeply. By exploring the characteristics of each column, users can pinpoint anomalies, inconsistencies, or outliers that may require attention. This also helps to determine the rules and rule types that are needed for data profiling and remediation.
Data Discovery can be configured to run automatically or can be triggered manually (see Setting Data Profile Preferences). When configured to run automatically, Data Discovery runs when the user connects to the source file during profile creation or opens an existing profile.
On the Rules tab, in the Field/Rule pane, the rules are grouped under Field Names. Selecting a field name displays its Field Data Discovery results in the right pane. If no field is selected, no Field Data Discovery results are shown.
IMPORTANT! The content of the Field Data Discovery pane will not be displayed if you are not connected to the source.
Data Discovery options:
Note: If no rules are defined, the information and run icons will not be displayed on the Field Data Discovery pane. In such instances, users must add a rule to gain access to these icons.
The field data discovery information is based on the sampled source data and is displayed in the following tabs:
• Most Frequent Values: Shows the Value, Count (the number of times it occurs), and Frequency of occurrence for all the unique values (includes blank and empty values) in the selected field. The Total and Missing are also displayed at the top. The Most Frequent Values are discoverable for most data types. See MostFrequentValues.
• String Patterns: Shows the Count-%, Input, RegX Pattern, Display Pattern and Literal Pattern (which are described below). The total number of Unique Patterns and the total number of records sampled are also displayed at the top.
String patterns can be copy/pasted into rules (or any enabled text box) by right-clicking and selecting Copy Regex Pattern (to copy the regular expression) or Copy Input Value (to copy the input value). For example, you can paste a copied regex pattern into the MatchesRegex rule.
The following describes the discovered string patterns:
– Count-%: The total number of rows and the percentage of rows in the dataset that have the current pattern.
– Input: The unique field value which the other String Patterns columns describe.
– RegX Pattern: The regular expression (regex) pattern that matches the Input.
– Display Pattern: A user-friendly pattern (built from character classes such as digits, letters, special characters, and spaces) which describes the Input.
– Literal Pattern: This pattern is the same as the Input value, but with the regex metacharacters escaped. For example, a literal period (.) is displayed as (\.). Escaping is required if you want to use the value as a regular expression.
The String Patterns are discoverable for the String data type only. See MatchesRegex.
• Statistics: Shows the following information for numeric data types:
– Mode - Most frequently occurring value in the selected field.
– Min - The lowest value in the selected field.
– Max - The highest value in the selected field.
– Mean - The average of the given numbers in the selected field.
– Median - The median (middle value) of the numbers in the selected field.
– Standard Deviation - The Standard Deviation value for the selected field.
– Variance - The Variance value for the selected field.
– Sum - The sum of all values in the selected field.
– Quantile (25.0)
– Quantile (50.0)
– Quantile (75.0)
– Quantile Outlier Lower Bound
– Quantile Outlier Upper Bound
This information is discoverable for numeric data types only. String length is used for string fields. See Statistics.
• Equal Range Binning: Equal-width binning involves dividing the range of source field values into a specified number of equally spaced intervals (default is 10) between the minimum and maximum values. This information is discoverable for numeric data types only. See EqualRangeBinning.
• Possible Data Type: A sample of the source field is scanned to identify the field's potential data type. This discovery result is shown for strings that could be converted to other data types, for example, string to date, time, or boolean.
Discovered data types and pictures can be copy/pasted into rules (or any enabled text box) by right-clicking and selecting Copy Type (to copy the data type) or Copy Picture (to copy the data picture). For example, you can paste a copied date format picture into the ChangeFormat rule.
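To cross-check discovery results outside the product, the figures shown in the Most Frequent Values, String Patterns (Literal Pattern), Statistics, and Equal Range Binning tabs can be approximated with a short script. The following Python sketch is illustrative only and is not the product's implementation: all function names are hypothetical, the quantile interpolation method is an assumption, and the outlier bounds assume the common 1.5 × IQR convention, which this page does not confirm.

```python
import re
import statistics
from collections import Counter


def most_frequent_values(values):
    """Return (value, count, frequency %) rows, most common first.

    Blank and empty values are counted like any other value.
    """
    total = len(values)
    counts = Counter(values)
    return [(v, c, 100.0 * c / total) for v, c in counts.most_common()]


def literal_pattern(value):
    """Escape regex metacharacters so the value matches itself literally."""
    return re.escape(value)


def quartile_summary(numbers):
    """Return the 25/50/75 quantiles and outlier bounds.

    Assumptions: 'inclusive' interpolation and 1.5 * IQR outlier bounds.
    """
    q25, q50, q75 = statistics.quantiles(numbers, n=4, method="inclusive")
    iqr = q75 - q25
    return {
        "q25": q25,
        "q50": q50,
        "q75": q75,
        "lower_bound": q25 - 1.5 * iqr,
        "upper_bound": q75 + 1.5 * iqr,
    }


def equal_width_bins(numbers, bin_count=10):
    """Count values in bin_count equally wide intervals between min and max."""
    lo, hi = min(numbers), max(numbers)
    width = (hi - lo) / bin_count
    counts = [0] * bin_count
    for x in numbers:
        if width == 0 or x == hi:
            # The maximum value (or a constant field) falls in the last bin.
            index = bin_count - 1
        else:
            # Clamp to guard against floating-point rounding at the top edge.
            index = min(int((x - lo) / width), bin_count - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bin_count + 1)]
    return edges, counts
```

For example, equal_width_bins(list(range(11))) splits the range 0 to 10 into ten intervals of width 1, with the maximum value 10 counted in the last interval. Results may differ slightly from the product's output if it uses a different quantile method or outlier rule.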
Last modified date: 09/22/2025