Generating Rules using AutoGen Wizard
AutoGen allows you to automatically generate rule for fields based on data type and or field name matching pattern. Actian DataConnect supplies one AutoGen ruleset (DefaultRules.rules). However, you can create your custom set of rules that are very specific for any dataset. There is no limit to the number of .rules files you can create.
The AutoGen .rules file contains a list of default rules that can be applied to any source dataset and executed. You can create a new .rules file or edit the existing .rules file. To use, create, edit, and view the AutoGen file, see the following topics:
To auto generate profile rules:
1. From the
Data Profile Editor, click
Profile tab, and then click
on the
Rules grid
. The AutoGen Wizard is displayed.
2. Select the fields on which you want to apply the rules:
a. From Available Fields box, click and select one field or [Ctrl]+Click to select multiple fields.
b. Click
to move the selected fields to the
Selected Fields box.
Note: - You can select fields from the
Selected Fields box and move them back to the
Available Fields box by clicking
.
- You can move all the fields to the
Selected Fields box by clicking
.
- You can move back all the fields to the
Available Fields box by clicking
.
3. Select one of the following options to indicate how you wish to apply the rules:
• Apply All rules - Applies all the rules on the selected fields. This is also the default selection.
• Apply Best rule -Applies the first rule where the value of the field name pattern matches the most field names in the source dataset (excluding wild card matches).
• Apply First rule - Applies the first rule that matches a value or field.
4. (Optional) Choose a file from where to apply the rules. For this, click Browse and select the DefaultRules.rules file or a custom .rules file from the file directory and then click OK.
5. (Optional) Select the Remove selected field rules check box.
Select this option to remove any rules that already exist on the Profile tab for the fields selected in this AutoGen run. If you do not select this option, then additional rules are added for the selected fields, including potential duplicates automatically.
6. Click Next.
The Manage AutoGen Rules window is displayed. You can do the following tasks from this window:
• View the rules that are being added to the AutoGen .rules file - Click the order number (first column) to see the rules set for that data type or field name pattern on the right.
Note: Rules are listed on the right based on the selected data type. The rules that appear selected are auto-generated for the specific data type or field name pattern when you click Finish.
• Edit a data type - Select the order number, click within the corresponding Type text box, click the dropdown arrow, and then select a new data type.
Note: A few rules have parameters that are mandatory to configure. For example, Date Patterns, Timestamp Patterns, Sort Orders, and so on.
• Edit a Field Name Pattern - Select the order number, click within the corresponding Field Name Pattern text box, and then type the string that you want to match against the field names. You can use the wild card character “*” before or after the string. For example, *Name, which could match on First Name or Last Name.
Note: Fields that match both data type (Type) and pattern (Field Name Pattern) will get the selected rules auto-generated.
• Add an AutoGen rule - Click Add a new Rule. A new entry is added to Order, Type, and Field Name Pattern with corresponding default rules, which you can edit. The default data type is String.
• Remove an AutoGen rule - Select the order number and then click Remove.
• Change the order - Select the order number and then click Up to move the selected rule up in the list, or Down to move the selected rule down in the list.
• Apply AutoGen rules - Under
Rules, select the rules that you want to apply to fields that match the corresponding data type (
Type) and pattern (
Field Name Pattern). For more information, see
Rule and Parameter Reference.
Note: You can select more than one rule by selecting the check box beside the required rule.
7. Click Finish.
A message box is displayed showing the location where the new or edited rules file has been saved.
8. Click OK.
The AutoGen Wizard is closed and the auto-generated rules are displayed in the Rules grid. The new .rules file is saved as the rule file preference for future AutoGen runs.
IMPORTANT! You can go to a previous step by clicking Back, and your configured information is not lost. You can also exit at any time by clicking Finish, and what you have configured is saved. You can later view and edit the rule information in the Profile tab of the Data Profile Editor.
Creating AutoGen rules file
The AutoGen .rules file contains a list of default rules that you can run automatically for your profiles to create rules. You can change the default list to include rules that you use frequently, and you can create different .rules files that are very specific to various datasets.
To create a new AutoGen rules file:
1. From the
Data Profile Editor, click
Profile tab, and then click
from the
Rules grid of your existing profile
. The AutoGen Wizard is displayed.
3. Click Next.
The Manage AutoGen Rules window is displayed.
4. Specify a new file name in the Rules File Name text box, and then click New Rule File.
5. On the AutoGen Rules pane, add or edit the required rule.
6. Click Finish.
A message box is displayed showing the location where the new or edited rules file has been saved.
7. Click OK.
The AutoGen Wizard is closed and the auto-generated rules are listed in the Rules grid. The new .rules file is saved as the rule file preference for future AutoGen runs.
Editing AutoGen rules file
To edit an AutoGen rules file:
1. From the
Data Profile Editor, click
Profile tab, and then click
from the
Rules grid of your existing profile
. The AutoGen Wizard is displayed.
2. At the bottom of the window, click Browse and select the required .rules file. The selected file with its path is displayed in the Apply rules from the following file text box.
3. Click Next.
The Manage AutoGen Rules window displays a list of rules associated with the selected .rules file
5. Click Finish.
The Rule Saved message is displayed.
6. Click OK.
The AutoGen Wizard is closed and you are navigated to the Profile tab where the auto-generated rules are listed in the Rules grid. The new .rules file is saved as the rule file preference for future AutoGen runs.
Viewing AutoGen Results
After an AutoGen profile run, you may notice that when you view the .dp.stats.json file, that the table of fields and rulerule includes additional fields that did not appear in your source file. These are derived fields that were created by AutoGen.
Derived Fields
Fields created by a rule or a script that are not included in the original data file or table. Function and conversion rule create output fields that are derived from an original field. For example, if your source data file has a string Birthdate field, then you can use the String to Date conversion rule to generate a date datatype field. After which you create additional rule on the derived field. Contrast with source fields.
You can use derived field, if:
• Your field is a string but you know it is a number field. If you want to test that the number is not greater than 2000, then first you apply a String To Numeric conversion rule on your String field. This will create a d_FieldName_StringToNumeric derived field. You can then add another rule selecting this derived field and select the Compare to Constant rule and fill in the operator (>) and constant (2000) parameters and click Finish.
• String Length is a function rule that produces a derived field. For example, if your database is expecting a string that is exactly 2 characters, then you can apply a String Length rule for your field. After which apply a Compare To Constant rule for the deriveruled field by setting your operator (Equal) and your constant (2). Any field where the string value is not exactly 2 characters will fail. Contrast with source fields.
Using Function rule in Combination with Other rule
Some rule are used with others to achieve certain data quality goals, such as data type conversion, ignoring white space, and discovering the length of a field. Following are a few examples to use a function rule with other rule in a profile specification.
String Trim Function and Other rule
• To convert a string field to another data type and to ignore extra whitespace:
Apply the String Trim rule to a string field, and then apply a conversion rule to the output. Without String Trim, extra white space in the field value causes the conversion rule to fail.
• To determine distinct values in a string field and ignore whitespace:
Apply the String Trim rule to a string field, then apply the Distinct Values rule to the output. Without String Trim, varying amounts of extra whitespace are considered distinct values.
String Length Function and Other rule
The following examples share the context of profiling data before setting up a migration with a database or CRM system target:
• To find the maximum length of a string field and to determine how large a VARCHAR to use when setting up a new DB table:
Apply the String Length rule to a string field and then apply the Max rule to the output.
• To find the average length of a string field and to determine how much disk space is required for a VARCHAR field in a new DB table:
Apply the String Length rule to a string field and then apply the Statistics rule to the output.
• To check the length of field values in a string field and to verify that they all fit in an existing CHAR[N] or VARCHAR[N] field in an existing DB table:
Apply the String Length rule to a string field and then apply Compare To Constant to the output.