Using Data Profiler
Data Profiler allows you to generate data quality rules that are specific to an individual use case or project. You can configure Profiler rules to check for specific data patterns, formats, and values, and to flag null, blank, or duplicate fields. The Statistics tab in the Data Profiler displays both overall and rule-specific pass and fail statistics.
The following rule types are available in the Data Profiler:
• Summary Rule - Generates aggregate statistics.
• Test Rule - Generates pass and fail statistics based on defined data quality rules.
• Conversion Rule - Converts one data type into a different output data type. Generates pass and fail statistics based on whether the conversion succeeds or fails, and generates derived fields of the converted type so that additional rules can be built on the derived (converted) field.
• Function Rule - Generates derived fields to which other rules can be applied. Use it to create new fields to profile.
The following outputs are generated by the Data Profiler:
• PASS_TARGET - This is the generated clean file. It has the same format as the input file and contains the rows and fields from the source dataset that pass all the Data Profile rules. The output can be written to a file or a JDBC table, and is available at the specified location.
• FAIL_TARGET - This is the generated dirty file. It contains the records from the source dataset that do not pass one or more Data Profile rules. The output can be written to a file or a JDBC table.
• DRILLDOWN_TARGET - Data Profiler uses this file to create the statistics and charts on the Statistics tab. It is also used to browse all the records from the FAIL_TARGET file, along with the rule associated with each failed record.
• STATS_TARGET - The generated STATS_TARGET file is used for visualization of the PASS_TARGET and FAIL_TARGET data in the Statistics tab. These charts display the number and percentage of records that passed or failed each rule.
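The relationship between the four outputs can be sketched as follows. This is a conceptual model under stated assumptions, not how Data Profiler is implemented: the `profile` function, the rule names, and the in-memory lists standing in for files or JDBC tables are all hypothetical.

```python
# Conceptual illustration only; names and structures are assumptions.

def profile(rows, rules):
    """Split rows into pass/fail sets and collect drill-down and per-rule stats.

    rules maps a rule name to a predicate that returns True when a row passes.
    """
    pass_target, fail_target, drilldown_target = [], [], []
    counts = {name: {"pass": 0, "fail": 0} for name in rules}

    for row in rows:
        failed_rules = []
        for name, check in rules.items():
            if check(row):
                counts[name]["pass"] += 1
            else:
                counts[name]["fail"] += 1
                failed_rules.append(name)

        if failed_rules:
            # A record that fails one or more rules goes to the dirty output,
            # and each (record, failed rule) pair is kept for drill-down.
            fail_target.append(row)
            for name in failed_rules:
                drilldown_target.append({"record": row, "rule": name})
        else:
            # Only records that pass every rule reach the clean output.
            pass_target.append(row)

    # Per-rule counts and percentages, as a stand-in for STATS_TARGET.
    total = len(rows)
    stats_target = {
        name: {
            "pass": c["pass"],
            "fail": c["fail"],
            "pass_pct": 100.0 * c["pass"] / total if total else 0.0,
        }
        for name, c in counts.items()
    }
    return pass_target, fail_target, drilldown_target, stats_target


# Hypothetical rows and rules.
rows = [
    {"id": "1", "age": "34"},
    {"id": "2", "age": ""},
    {"id": "", "age": "x"},
]
rules = {
    "id_not_blank": lambda r: r["id"] != "",
    "age_is_digits": lambda r: r["age"].isdigit(),
}

passed, failed, drilldown, stats = profile(rows, rules)
```

Note that a record failing two rules appears once in the fail output but twice in the drill-down output, once per failed rule, which is what makes per-rule browsing of failed records possible.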
Last modified date: 02/01/2024