Creating a Profile Using Data Profile Wizard
The New Data Profile Wizard allows you to create a new data profile by defining a source, specifying testing rules, selecting the fields to assess, and, optionally, defining the targets where pass and fail information is stored (needed only if you want to change the pre-configured target locations). The wizard pages include:
1. New Data Profile page – Select a project and enter a name for the new data profile file.
2. Define Source page – Provide the source connection details.
3. Create Data Quality Rules page – Specify testing rules and select a field to test.
4. Configure Targets page – All targets are pre-configured, but you can define the pass and fail target connection details if you want to change the pre-configured target locations.
5. Data Profile Summary page – Provides a summary of the new data profile that you are about to create.
Use the Back and Next buttons at the bottom of the page to move between the pages of the wizard. Moving between pages does not clear data that you have entered. When navigation to another page is not possible, the corresponding button is dimmed and inactive. You can exit at any time by clicking Finish; the configuration you have entered so far is saved, and you can edit the data profile file later in the Data Profile Editor. To exit without saving any information, click Cancel.
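The navigation behavior described above can be summarized as a small state model. The following is a minimal Python sketch for illustration only; it is not DataConnect code. `WizardModel`, `go_back`, `go_next`, `enter`, `finish`, and `cancel` are hypothetical names, and the model only disables the buttons at the first and last page, whereas the real wizard may also disable forward navigation until required fields are filled in.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Wizard pages in order.
PAGES = [
    "New Data Profile",
    "Define Source",
    "Create Data Quality Rules",
    "Configure Targets",
    "Data Profile Summary",
]


@dataclass
class WizardModel:
    """Conceptual model of the wizard's navigation rules (hypothetical)."""

    index: int = 0
    # Values entered on each page, keyed by page name.
    entries: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # Snapshot written by finish(); None when nothing has been saved.
    saved: Optional[Dict[str, Dict[str, Any]]] = None

    def can_go_back(self) -> bool:
        # Back is inactive (dimmed) on the first page.
        return self.index > 0

    def can_continue(self) -> bool:
        # The forward button is inactive on the last page.
        return self.index < len(PAGES) - 1

    def go_back(self) -> None:
        if self.can_go_back():
            self.index -= 1  # moving between pages keeps entries intact

    def go_next(self) -> None:
        if self.can_continue():
            self.index += 1

    def enter(self, key: str, value: Any) -> None:
        # Record a value on the current page.
        self.entries.setdefault(PAGES[self.index], {})[key] = value

    def finish(self) -> Dict[str, Dict[str, Any]]:
        # Finish works from any page and saves whatever has been configured so far.
        self.saved = {page: dict(values) for page, values in self.entries.items()}
        return self.saved

    def cancel(self) -> None:
        # Cancel exits without saving anything.
        self.entries.clear()
        self.saved = None
```

Keeping entries in a per-page dictionary, rather than resetting them on each page change, is what lets Back and Next move freely without losing input.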
To create a data profile file using the New Data Profile Wizard:
1. Select a DataConnect project and do one of the following:
• Go to File > New > Data Profile.
• Click the drop-down arrow on the toolbar icon and then click Data Profile.
• Right-click on the project and then click New > Data Profile.
The Data Profile Wizard is displayed.
2. Select the project in which you want to create the data profile file.
3. In the Profile File Name field, type a name for the new data profile, and click Next.
The Define Source page is displayed.
4. In the Source Connection section, do one of the following:
• From the Choose Connector dropdown list, select a source connector.
• In Or Connection, click Browse and select an existing source connection file.
The connector parts are displayed, and the selected connector’s properties appear on the right.
6. Select or clear the Retain Source Data Order in Targets checkbox.
This option retains the source data order in the four pre-configured targets, or output files (PASS_TARGET, FAIL_TARGET, DRILLDOWN_TARGET, and STATS_TARGET). If this option is cleared, data is written to the targets in no guaranteed order. This option is selected by default.
7. Click Next.
The Create Data Quality Rules page is displayed.
8. Do the following to define a rule:
a. Select the default blank rule, Rule_1, and then select a rule from the Commonly Used Rules list. You can select one of the following rules:
– Check for Duplicates: Allows you to check for duplicate values in one or more selected fields. You can select multiple fields for testing. Duplicate rows are written to FAIL_TARGET.
– Check for Missing values: Allows you to check for blank and null values in one or more selected fields. You can select multiple fields for testing. Rows with missing values are written to FAIL_TARGET.
– Check Compare to Constant(s): Allows you to compare a field value to the specified constant value. Select a comparison operator from the list. You can select only one field for testing. For the equal and not equal operators, you can list one or more constants separated by a “|” character. Rows that fail the test are written to FAIL_TARGET.
– Check value Matches Pattern: Allows you to find a pattern match in a field value. Select from a list of predefined regular expression patterns or specify your own regular expression. You can select only one field for testing. For a complete list of available patterns, see Matches Regex. Rows not matching the specified pattern are written to FAIL_TARGET.
Note: The wizard provides only a subset of the rules that can be used to assess the quality of your source data. The Data Profile Editor has many more rules that you can use. For a complete list of available rules, see Rule and Parameter Reference.
b. Select one or more fields where you want to test the rule.
After all required information for the new rule has been specified, the add rules icon changes its appearance. This also indicates that Rule_1 is ready to use.
c. To add another rule, click the add rules icon.
d. A new blank rule, Rule_2, is automatically created. Define it by repeating steps a and b.
Note: You can select any rule to view or edit it. You can also select a rule and click the delete icon to delete it. When you delete a rule, all subsequent rules are automatically renumbered; for example, deleting Rule_2 renames Rule_3 to Rule_2.
9. Click Next.
The Configure Targets page is displayed.
10. Select a target to view or configure its connection information.
There are four pre-configured targets, or output files (you can edit or change them if required):
• PASS_TARGET: The generated clean file. It has the same format as the input file and contains the records from the source dataset that pass all the criteria specified in the Data Profile rules. The output can be written to a file or a JDBC table.
• FAIL_TARGET: The generated dirty file. It contains the records from the source dataset that fail one or more Data Profile rules. The output can be written to a file or a JDBC table.
• DRILLDOWN_TARGET: Used to create the statistics and charts on the Statistics tab. You can browse this file in the editor to see the relationship between the FAIL_TARGET records and the specified rules.
• STATS_TARGET: Used to visualize the PASS_TARGET and FAIL_TARGET data on the Statistics tab. The charts display the number and percentage of records that passed or failed each specified rule.
11. In the Target Connection section, do one of the following (only if you want to change the pre-configured target locations):
• From the Choose Connector dropdown list, select a target connector.
• In Or Connection, click Browse and select an existing target connection file.
The connector parts are displayed, and the selected connector’s properties appear on the right.
13. Click Connect and then click Next.
14. Review the Data Profile Summary page and then click Finish.
The Data Profile Wizard closes. The data profile file opens in the Data Profile Editor and displays the configured information.
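The four commonly used rules in step 8 decide which source rows go to PASS_TARGET and which go to FAIL_TARGET. The following Python sketch illustrates that logic on in-memory rows; it is not how DataConnect evaluates rules. The function names (`duplicate_check`, `missing_check`, `compare_check`, `pattern_check`, `partition`), the operator spellings, and the edge-case behavior noted in the comments are all assumptions.

```python
import operator
import re
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[str]]  # one source record; None stands for a null value


def duplicate_check(rows: Sequence[Row], fields: Sequence[str]) -> List[bool]:
    # Assumption: a row fails when its combined values in `fields` occur more than
    # once, and every row in a duplicate group fails (not just the repeats).
    keys = [tuple(row.get(f) for f in fields) for row in rows]
    counts = Counter(keys)
    return [counts[key] == 1 for key in keys]


def _is_missing(value: Optional[str]) -> bool:
    # Null, empty, or whitespace-only counts as missing (assumption for "blank").
    return value is None or value.strip() == ""


def missing_check(rows: Sequence[Row], fields: Sequence[str]) -> List[bool]:
    # A row fails if any selected field is missing.
    return [not any(_is_missing(row.get(f)) for f in fields) for row in rows]


_ORDERING = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def compare_check(rows: Sequence[Row], field: str, op: str, constants: str) -> List[bool]:
    if op in ("==", "!="):
        # Equal / not equal accept several constants separated by "|".
        allowed = set(constants.split("|"))
        hits = [row.get(field) in allowed for row in rows]
        return hits if op == "==" else [not hit for hit in hits]
    # Assumption: ordering operators compare numerically against one constant,
    # and null or non-numeric values fail.
    compare = _ORDERING[op]
    limit = float(constants)
    results = []
    for row in rows:
        try:
            results.append(compare(float(row.get(field)), limit))
        except (TypeError, ValueError):
            results.append(False)
    return results


def pattern_check(rows: Sequence[Row], field: str, pattern: str) -> List[bool]:
    # Assumption: a row passes if the regex matches anywhere in the value (re.search);
    # anchor the pattern with ^...$ to require a whole-value match.
    regex = re.compile(pattern)
    return [row.get(field) is not None and regex.search(row.get(field)) is not None
            for row in rows]


def partition(rows: Sequence[Row],
              rule_results: Sequence[List[bool]]) -> Tuple[List[Row], List[Row]]:
    # A row goes to PASS only if it passes every rule; iterating in source order
    # mirrors the Retain Source Data Order option.
    passed: List[Row] = []
    failed: List[Row] = []
    for i, row in enumerate(rows):
        (passed if all(results[i] for results in rule_results) else failed).append(row)
    return passed, failed
```

In this sketch a row lands in the FAIL list as soon as any one rule fails, which matches the FAIL_TARGET description in step 10.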
After the profile is created, it is saved within the specified project and can be opened and edited in the Data Profile Editor. Data profile artifacts have a .dp file extension. For information about validating and running a data profile, see Validating and Running Profile.
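When a profile runs, STATS_TARGET feeds the Statistics tab charts that show the number and percentage of records that passed or failed each rule. The arithmetic behind such a chart is simple. This Python sketch computes it from per-rule pass/fail flags. `rule_stats` and its output tuple are hypothetical and do not describe the actual STATS_TARGET layout.

```python
from typing import Dict, List, Tuple


def _percent(part: int, total: int) -> float:
    # Percentage rounded to two decimals; an empty run reports 0.0.
    return round(100.0 * part / total, 2) if total else 0.0


def rule_stats(rule_results: Dict[str, List[bool]]) -> Dict[str, Tuple[int, int, float, float]]:
    """Per rule: (passed, failed, percent passed, percent failed)."""
    stats = {}
    for rule_name, results in rule_results.items():
        total = len(results)
        passed = sum(results)  # True counts as 1
        failed = total - passed
        stats[rule_name] = (passed, failed, _percent(passed, total), _percent(failed, total))
    return stats
```

For example, a rule that three of four records pass reports 75.0 percent passed and 25.0 percent failed.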