Creating a Profile Using Data Profile Wizard
The New Data Profile Wizard allows you to create a new data profile by defining a source, specifying testing rules, selecting a field to assess, and, only if you want to change the pre-configured target locations, defining the targets where the pass and fail information is stored. The wizard pages include:
1. New Data Profile page – Select a project and enter a name for the new data profile file.
2. Define Source page – Provide the source connection details.
3. Create Data Quality Rules page – Specify testing rules and select a field to test.
4. Configure Targets page – All targets are pre-configured, but you can define the pass and fail target connection details if you want to change the pre-configured target locations.
5. Data Profile Summary page – Provides a summary of the new data profile that you are about to create.
Use the Back and Next buttons at the bottom of the page to navigate between the wizard pages. Navigating between pages does not clear the data you have entered. When navigation in a given direction is not possible, the corresponding button is dimmed and inactive. You can exit at any time by clicking Finish, which saves what you have configured; you can later edit the data profile file in the Data Profile Editor. To exit at any time without saving any information, click Cancel.
To create a data profile using the New Data Profile Wizard:
1. Select a DataConnect project and do any of the following:
• Go to File > New > Data Profile.
• Click the arrow in [icon] and then click Data Profile.
• Right-click on the project and then click New > Data Profile.
The New Data Profile Wizard is displayed.
2. Select the project in which you want to create the data profile file.
3. In the Profile File Name field, type a name for the new data profile, and click Next.
The Define Source page is displayed.
4. In the Source Connection section, do one of the following:
• From the Choose Connector dropdown list, select a source connector.
The connector parts are displayed, and the selected connector’s properties are displayed on the right.
6. Select or clear the Retain Source Data Order in Targets checkbox.
This option retains the source data order in the four pre-configured targets or output files (PASS_TARGET, FAIL_TARGET, DRILLDOWN_TARGET, STATS_TARGET). If the option is not selected, data is written to the targets in random order. The checkbox is selected by default.
7. Click Next.
The Create Data Quality Rules page is displayed.
8. Define a rule. You have the following options:
– Click [icon] (Inspect Data and Auto Add Rules) to use internal algorithms which inspect the source data and recommend rules based on knowledge of the source schema and various data pattern matching tests. See Inspecting Data and Auto Adding Rules using Wizard.
9. Click Next.
The Configure Targets page is displayed.
10. Select a target to view or configure the target connection information.
There are four pre-configured targets or output files (however, you can edit and change the files if required):
• PASS_TARGET: This is the generated clean file. It has the same format as the input file and contains the records from the source dataset that pass the criteria specified in the Data Profile rule. The output can be written to a file or a JDBC table.
• FAIL_TARGET: This is the generated dirty file. It contains the records from the source dataset that do not pass one or more Data Profile rules. The output can be written to a file or a JDBC table.
• DRILLDOWN_TARGET: This file is used to create the statistics and charts on the Statistics tab. You can browse this file in the editor to see the relationship between the FAIL_TARGET records and the specified rule.
• STATS_TARGET: This file is used to visualize the PASS_TARGET and FAIL_TARGET data on the Statistics tab. These charts display the number and percentage of records that passed or failed the specified rule.
11. In the Target Connection section, do one of the following (only if you want to change the pre-configured target locations):
• From the Choose Connector dropdown list, select a target connector.
• In Or Connection, click Browse and select an existing target connection file.
The connector parts are displayed, and the selected connector’s properties are displayed on the right.
13. Click Connect and then click Next.
14. Review the Data Profile Summary page and then click Finish.
The New Data Profile Wizard closes. The data profile file opens in the Data Profile Editor and displays the configured information.
After the profile is created, it is saved in the specified project and can be opened and edited in the Data Profile Editor. Data profile artifacts have a .dp file extension. For information about validating and running a data profile, see Validating and Running Profile.