Creating a Profile Using Data Profile Wizard
The New Data Profile Wizard lets you create a data profile by defining a source, specifying testing rules, selecting a field to assess, and, only if you want to change the pre-configured target locations, defining a target where the pass and fail information is stored. The wizard pages include:
1. New Data Profile page – Select a project and enter a name for the new data profile file.
2. Define Source page – Provide the source connection details.
3. Create Data Quality Rules page – Specify testing rules and select a field to test.
4. Configure Targets page – All targets are pre-configured, but you can define the pass and fail target connection details if you want to change the pre-configured target locations.
5. Data Profiler Summary page – Provides a summary of information about the new data profile that you are about to create.
Use the Back and Next buttons at the bottom of each page to move between wizard pages. Moving between pages does not clear the data you have entered. When a button is dimmed and inactive, navigation in that direction is not available. You can click Finish at any time to exit and save what you have configured; you can later edit the data profile file in the Data Profile Editor. To exit without saving any information, click Cancel at any time.
To create a data profile using the New Data Profile Wizard:
1. In the Project Explorer, select the project folder where you want to create the new data profile.
2. Open the New Data Profile Wizard using one of the following methods:
• Go to File > New > Data Profile.
• Click the arrow in [icon] and then click Data Profile.
• Right-click on the project and then click New > Data Profile.
3. In the wizard, verify that the project folder where you want to create the new data profile is selected.
4. In the Profile File Name field, type a name for the new data profile, and click Next.
The Define Source page is displayed.
5. In the Source Connection section, do one of the following:
• From the Choose Connector dropdown list, select a source connector.
The connector parts are displayed. Also, the selected connector’s properties are displayed on the right.
7. Select or clear the Retain Source Data Order in Targets checkbox.
This option retains the source data order in the four pre-configured targets or output files (PASS_TARGET, FAIL_TARGET, DRILLDOWN_TARGET, STATS_TARGET). If this option is not selected, data is written to the targets in random order. This option is selected by default.
8. Click Next.
The Create Data Quality Rules page is displayed.
9. Define a rule. You have the following options:
• Click [icon] (Inspect Data and Auto Add Rules) to use internal algorithms that inspect the source data and recommend rules based on knowledge of the source schema and various data pattern matching tests. See Inspecting Data and Auto Adding Rules using Wizard.
10. Click Next.
The Configure Targets page is displayed.
11. Select a target to view or configure the target connection information.
There are four pre-configured targets or output files (however, you can edit and change the files if required):
• PASS_TARGET: This is the generated clean file. It has the same format as the input file and contains the records from the source dataset that pass the criteria specified in the Data Profile rules. The output can be written to a file or a JDBC table.
• FAIL_TARGET: This is the generated dirty file. It contains the records from the source dataset that do not pass one or more Data Profile rules. The output can be written to a file or a JDBC table.
• DRILLDOWN_TARGET: This file is used to create the statistics and charts on the Statistics tab. You can browse this file in the editor to see the relationship between the FAIL_TARGET records and the specified rule.
• STATS_TARGET: This file is used to visualize the PASS_TARGET and FAIL_TARGET data on the Statistics tab. The charts display the number and percentage of records that passed or failed the specified rule.
12. In the Target Connection section, do one of the following (only if you want to change the pre-configured target locations):
• From the Choose Connector dropdown list, select a target connector.
• In Or Connection, click Browse and select an existing target connection file.
The connector parts are displayed. Also, the selected connector’s properties are displayed on the right.
14. Click Connect and then click Next.
15. Review the Data Profile Summary page and then click Finish.
The Data Profile Wizard closes. The data profile file opens in the Data Profile Editor and displays the configured information.
After the profile is created, it is saved in the specified project and can be opened and edited in the Data Profile Editor. Data Profile artifacts have a .dp file extension. For information about validating and running a data profile, see Validating and Running Profile.
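To make the roles of the four pre-configured targets more concrete, the following Python sketch splits a small set of records into pass, fail, drilldown, and statistics outputs. It is an illustration only and not how the product is implemented: the Rule class, the profile function, the rule names, and the sample records are all hypothetical, and the actual file layouts of DRILLDOWN_TARGET and STATS_TARGET may differ.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, str]


@dataclass
class Rule:
    """Hypothetical stand-in for one data quality rule."""
    name: str                    # label used in the drilldown and stats output
    field: str                   # field the rule tests
    test: Callable[[str], bool]  # returns True when the value passes


def profile(records: list[Record], rules: list[Rule]):
    """Split records into pass/fail lists and collect per-rule details."""
    passed, failed, drilldown = [], [], []
    fail_counts = {rule.name: 0 for rule in rules}

    # Iterating in source order keeps every output in source order,
    # loosely mirroring the Retain Source Data Order in Targets option.
    for row_number, record in enumerate(records, start=1):
        broken = [r.name for r in rules if not r.test(record.get(r.field, ""))]
        if broken:
            failed.append(record)  # fails one or more rules
            for name in broken:
                fail_counts[name] += 1
                # Links a failed row to the rule it violated.
                drilldown.append({"row": row_number, "rule": name})
        else:
            passed.append(record)  # passes every rule

    total = len(records)
    stats = {
        name: {
            "failed": count,
            "passed": total - count,
            "pct_passed": round(100 * (total - count) / total, 1) if total else 0.0,
        }
        for name, count in fail_counts.items()
    }
    return passed, failed, drilldown, stats


# Assumed sample rules and data, for illustration only.
rules = [
    Rule("zip_is_5_digits", "zip", lambda v: len(v) == 5 and v.isdigit()),
    Rule("name_not_empty", "name", lambda v: v.strip() != ""),
]
records = [
    {"name": "Ada", "zip": "78701"},
    {"name": "", "zip": "7870"},
    {"name": "Grace", "zip": "ABCDE"},
    {"name": "Linus", "zip": "94105"},
]

passed, failed, drilldown, stats = profile(records, rules)
print(len(passed), len(failed))
print(stats)
```

In this sketch, a record goes to the fail list as soon as it breaks any rule, which matches the FAIL_TARGET description above, while the drilldown list holds one entry per failed record and rule pair so that each failure can be traced to its cause.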