Creating a Profile Using Data Profile Wizard
The New Data Profile Wizard allows you to create a new data profile by defining a source, specifying testing rules, selecting a field to assess, and, only if you want to change the pre-configured target locations, defining the targets where the pass and fail information is stored. The wizard pages include:
1. New Data Profile page – Select a project and enter a name for the new data profile file.
2. Define Source page – Provide the source connection details.
3. Create Data Quality Rules page – Specify testing rules and select a field to test.
4. Configure Targets page – All targets are pre-configured, but you can define the pass and fail target connection details if you want to change the pre-configured target locations.
5. Data Profile Summary page – Provides a summary of information about the new data profile that you are about to create.
Use the Back and Continue buttons at the bottom of each page to move between the wizard pages. Moving between pages does not clear the data you have entered. When navigation to another page is not possible, the corresponding button is dimmed and inactive. You can exit at any time by clicking Finish, which saves what you have configured. You can later edit the data profile file in the Data Profile Editor. To exit at any time without saving any information, click Cancel.
Note:  You can also create a profile without using the New Data Profile wizard. For more information, see Creating a Profile Without Using Wizard.
To create a data profile file using the New Data Profile Wizard:
1. Select a DataConnect project and do any of the following:
Go to File > New > Data Profile.
Click the arrow next to the New icon in the Project Explorer and then click Data Profile.
Right-click on the project and then click New > Data Profile.
The Data Profile Wizard is displayed.
2. Select the project in which you want to create the data profile file.
3. In the Profile File Name field, type a name for the new data profile, and click Next.
The Define Source page is displayed.
4. In the Source Connection section, do one of the following:
From the Choose Connector dropdown list, select a source connector.
In Or Connection, click Browse and select a saved (or existing) User Defined Connection (see Saving and Reusing a Connection).
The connector parts are displayed. Also, the selected connector’s properties are displayed on the right.
5. Specify the Source Connection information. For information about the selected connector and its properties, see Map Connectors. For information about source connection, see Specifying Source Connection for Data Profile.
6. Select or clear the Retain Source Data Order in Targets checkbox.
This option retains the source data order in the four pre-configured targets or output files (PASS_TARGET, FAIL_TARGET, DRILLDOWN_TARGET, STATS_TARGET). Unless this option is selected, data is written to the targets in random order. By default, this option is selected.
7. Click Next.
The Create Data Quality Rules page is displayed.
8. Do the following to define a rule:
a. Select the default blank rule, Rule_1, and then select a rule from the displayed Commonly Used Rules list. You can select one of the following rules:
Check for Duplicates: Allows you to check for duplicate values in one or more selected fields. You can select multiple fields for testing. Duplicate rows are written to FAIL_TARGET.
Check for Missing values: Allows you to check for Blank and Null values in one or more selected fields. You can select multiple fields for testing. Rows with missing values are written to FAIL_TARGET.
Check Compare to Constant(s): Allows you to compare a field value to the specified constant value. Select from a list of comparison Operators. You can select only one field for testing. For equal or not equal, you can list one or more constants separated by a “|” character. Rows that fail the test are written to FAIL_TARGET.
Check value Matches Pattern: Allows you to find a pattern match in a field value. Select from a list of regular Patterns or specify your own regular expression. You can select only one field for testing. For a complete list of available Patterns, see Matches Regex. Rows not matching the specified pattern are written to FAIL_TARGET.
Note:  The wizard provides only a subset of the rules that can be used to discover the quality of your source data. The Data Profile Editor has many more rules that you can use. For a complete list of available rules, see Rule and Parameter Reference.
b. Select one or more fields where you want to test the rule.
The add rule icon changes appearance after all required information for the new rule has been specified. This indicates that Rule_1 is ready to use.
c. If you want to add another rule, click the add rule icon.
d. A new blank rule, Rule_2, is automatically created. Define this rule by repeating steps (a) and (b).
Note:  You can select any rule to view or edit it. You can also select a rule and click the delete icon to delete it. When you delete a rule, the numeric part of all subsequent rule names is automatically adjusted to the correct number.
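As a rough illustration of what the four wizard rules test, the following Python sketch applies comparable checks to a few in-memory rows. It is not DataConnect code: the sample rows, field names, and function names are hypothetical, and details such as how duplicates across multiple fields are keyed, what counts as Blank, and whether a pattern must match the whole value are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample rows; the field names are illustrative only.
rows = [
    {"id": "1", "email": "a@example.com", "state": "TX"},
    {"id": "2", "email": "", "state": "CA"},
    {"id": "3", "email": "a@example.com", "state": "NY"},
    {"id": "4", "email": None, "state": "tx"},
]

def check_duplicates(rows, fields):
    # Assumption: a row fails when its combined values in `fields` occur more than once.
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return [counts[key] == 1 for key in keys]

def check_missing(rows, fields):
    # Assumption: "Blank" is an empty string and "Null" is None.
    return [all(row[f] not in (None, "") for f in fields) for row in rows]

def check_equals_any(rows, field, constants):
    # For an "equal" comparison, constants are separated by "|".
    allowed = constants.split("|")
    return [row[field] in allowed for row in rows]

def check_matches_pattern(rows, field, pattern):
    # Assumption: the whole field value must match the regular expression.
    regex = re.compile(pattern)
    return [row[field] is not None and regex.fullmatch(row[field]) is not None
            for row in rows]

def split_rows(rows, passed):
    # Rows that pass go to one list, rows that fail go to the other.
    pass_rows = [r for r, ok in zip(rows, passed) if ok]
    fail_rows = [r for r, ok in zip(rows, passed) if not ok]
    return pass_rows, fail_rows

dup = check_duplicates(rows, ["email"])
missing = check_missing(rows, ["email"])
const = check_equals_any(rows, "state", "TX|CA")
pattern = check_matches_pattern(rows, "state", r"[A-Z]{2}")

pass_rows, fail_rows = split_rows(rows, missing)
print([r["id"] for r in fail_rows])  # rows with a missing email
```

Each check returns one boolean per row (True means the row passed), so the same `split_rows` helper can separate passing and failing rows for any of the rules.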
9. Click Next.
The Configure Targets page is displayed.
10. Select a target to view or configure the target connection information.
There are four pre-configured targets or output files (however, you can edit and change the files if required):
PASS_TARGET: This is the generated clean file. It is of the same format as the input file and contains the records from the source dataset that pass the criteria specified in the Data Profile rule. The output can be written to a file or a JDBC table.
FAIL_TARGET: This is the generated dirty file. It contains the records from the source dataset that do not pass one or more Data Profile rules. The output can be written to a file or a JDBC table.
DRILLDOWN_TARGET: This file is used to create the stats and charts on the Statistics tab. You can browse this file in the editor to see the relationship between the FAIL_TARGET records and the specified rule.
STATS_TARGET: This file is used to visualize the PASS_TARGET and FAIL_TARGET data in the Statistics tab. These charts display the number and percentage of records that passed or failed the specified rule.
Note:  For more information about output files, see Viewing Pass, Fail, and Drill Down Output.
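The relationship between the four outputs can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the function name `build_targets`, the record layout, and the dictionary keys are hypothetical and do not reflect the actual file formats that DataConnect writes. It assumes that a record goes to the fail output if it fails one or more rules, that the drill-down output links each failed record to the rule it failed, and that the stats output holds per-rule counts and percentages.

```python
def build_targets(rows, rule_results):
    """Split rows into pass/fail lists and derive drill-down and stats data.

    rule_results maps a rule name to a list of booleans, one per row
    (True means the row passed that rule).
    """
    pass_target, fail_target, drilldown = [], [], []
    for index, row in enumerate(rows):
        failed_rules = [name for name, results in rule_results.items()
                        if not results[index]]
        if failed_rules:
            fail_target.append(row)
            # One drill-down entry per (failed record, rule) pair.
            drilldown.extend({"row": index, "rule": name} for name in failed_rules)
        else:
            pass_target.append(row)

    total = len(rows)
    stats = {}
    for name, results in rule_results.items():
        passed = sum(results)
        stats[name] = {
            "passed": passed,
            "failed": total - passed,
            "pass_pct": round(100 * passed / total, 1) if total else 0.0,
        }
    return pass_target, fail_target, drilldown, stats

# Hypothetical records and rule outcomes.
rows = ["rec0", "rec1", "rec2", "rec3"]
rule_results = {
    "Rule_1": [True, False, True, True],
    "Rule_2": [True, False, False, True],
}
pass_target, fail_target, drilldown, stats = build_targets(rows, rule_results)
print(pass_target, fail_target)
print(stats["Rule_1"])
```

Because the loop walks the rows in their original sequence, the pass and fail lists in this sketch keep the source order.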
11. In the Target Connection section, do one of the following (only if you want to change the pre-configured target locations):
From the Choose Connector dropdown list, select a target connector.
In Or Connection, click Browse and select an existing target connection file.
The connector parts are displayed. Also, the selected connector’s properties are displayed on the right.
12. Specify the Target Connection information. For information about the selected connector and its properties, see Map Connectors. For information about target connection, see Specifying Target Connection for Data Profile.
13. Click Connect and then click Next.
14. Review the Data Profile Summary page and then click Finish.
The Data Profile Wizard is closed. The data profile file opens in the Data Profile Editor and displays the configured information.
After the profile is created, it is saved in the specified project and can be opened and edited in the Data Profile Editor. Data profile artifacts have a .dp file extension. For information about validating and running a data profile, see Validating and Running Profile.
Last modified date: 07/26/2024