Rule Name | Description |
---|---|
Changes the format of a string field from one format to another (applies to string fields which can be parsed as numbers/boolean/date types). | |
Changes the case of a string field to specified case. (for example upper, lower, title). | |
Writes unique records to pass target and duplicate records to fail target using exact match. | |
Writes unique records to pass target and duplicate records to fail target based on fuzzy matching rules and match score filter. | |
Removes specified characters or words from a string field. | |
Replaces a field value with a new value. | |
Trims whitespace from a String field. | |
Applies the specified time zone to a timestamp field. |
Description | Changes the format of a string field from one format to another (applies to string fields which can be parsed as numbers/boolean/date types). |
Parameters | Note: A default Rule Name is provided, however, you can edit or overwrite it. Format Type - The format type of the source data: • Boolean • Date • Double • Float • Integer • Long • Numeric • Time • TimeStamp From Format - You can either select from a list of formats from the drop-down menu or type in the format, and then press the Enter key or click Add. You can add multiple formats if desired. You can also remove any format you have entered by clicking the icon that is displayed next to the format. To Format - You need to pick the format from the drop-down menu. Unlike the From Format, only one format is permitted in the To Format parameter. • Rounding Mode - The rounding mode applies if input data is of the type Double, Float, and Numeric. Available options are: – CEILING - Rounding mode to round towards positive infinity. – DOWN - Rounding mode to round towards zero. – FLOOR - Rounding mode to round towards negative infinity. – HALF_DOWN - Rounding mode to round towards "nearest neighbor" unless both neighbors are equidistant, in which case round down. – HALF_EVEN - Rounding mode to round towards the "nearest neighbor" unless both neighbors are equidistant, in which case, round towards the even neighbor. – HALF_UP - Rounding mode to round towards "nearest neighbor" unless both neighbors are equidistant, in which case round up. – UNNECESSARY - Rounding mode to assert that the requested operation has an exact result, hence no rounding is necessary. – UP - Rounding mode to round away from zero. Locale - The language or country-based preferences for a user interface. Among other things, locales represent currency, date format, and numbers according to the protocols in the given region. You can choose from the available options. |
Supported Data Types | • String (input) • String (output) |
Remarks | • This rule replaces the format and not the values. • When replacing values in the target field, you may encounter a truncation error, often due to a mismatch in field sizes between the target field and the replacement value. To resolve this issue, adjust the field size of target field according to the replaced/overwritten value size. • See DateTime Format • See Class DecimalFormat |
Description | Changes the case of a string field to specified case. (for example upper, lower, title). |
Parameters | Note: A default Rule Name is provided, however, you can edit or overwrite it. Change To Case - The side from which to remove white spaces: • Camel Case - Converts string to camel case which is a programming naming convention. In this all words of a phrase are combined together to form one continuous word and each word except the first word is capitalized. • Kebab Case - Converts string to kebab case which is a way of writing phrases without spaces, where spaces are replaced with hyphens -, and the words are typically all lower case. This is a programming naming convention often used for variable names. This is a programming naming convention. • Lower Case - Converts all characters in the string to lowercase. • Pascal Case - Converts string to pascal case which is a programming naming convention. In this all words of a phrase are combined together to form one continuous word and each word is capitalized. • Snake Case - Converts string to snake case which is a way of writing phrases without spaces, where spaces are replaced with underscores _, and the words are typically all lower case. This is a programming naming convention. • Tile Case - Converts string to title case. The first letter of each word is capitalized except for certain small words like “a,” “the,” and “of.” • Upper Case - (default) Converts all characters in the string to uppercase. |
Supported Data Types | • String (input) • String (output) |
Remarks | None |
Description | Writes unique records to pass target and duplicate records to fail target, using exact match. This rule uses clusters. Let us now understand how the cluster technique works. The main idea in clustering is to group similar records together, into clusters, based on comparison using key fields. When the profile is executed, the values in the specified key field are compared to each other to identify duplicates. If a duplicate value is found, then a cluster is created. The cluster contains the duplicate records, based on the key field and is assigned a cluster ID After a cluster is created, you can configure a Cluster Matching Rule and a Record Matching Rule to specify criteria for finding unique records within a cluster. Note: The duplicate records are written to the fail target and unique records are written to the pass target. The cluster file that is used for configuring the Cluster Matching Rules and Record Matching Rules is stored as a separate csv file. In the Project Navigator view, expand the project folder and double-click the <ProfileName>_RemoveDuplicates.csv file to view the cluster file. See Also: Remove Duplicates Fuzzy Matching. |
Parameters | Grouping tab - Select a field from the list of Input Fields as a key field. The values in the key field will be evaluated to identify duplicates or exact matches. If a duplicate value is found, a cluster is created. The cluster will contain the duplicate records based on the key fields and will be assigned a Cluster ID (visible when browsing the cluster file). • Input Fields - List of available fields that can be used as key fields. • Key Fields - You can move fields from Input Fields to Key Fields to define a key field. You can specify a combination of Fields to define a key field. Note: Key Fields cannot be selected as Sorting Fields and vice versa. • Results File - Browse and select a csv file (only csv files are supported) or specify the path and name of a csv file that will be used as the cluster file. You can use macros for filenames. When the profile is executed, you can browse the cluster file in the Cluster Data tab by selecting the Remove Duplicates rule and clicking from the Rule Definition pane. |
Sorting tab - The Sorting tab allows you to arrange the records within each cluster in a specific order. You can sort on multiple fields. Click to add a row and to delete it. After adding a row, specify the following values: – Field - Click the Field box and select a sorting field from the dropdown. – Sort Order - Click the Sort Order box and select Ascending or Descending for each sorting field. Note: Use the up and down icons () to re-arrange the sorting fields. Unique Record Rules tab - The Unique Record Rules tab provides a way to specify one or more clusters via cluster matching rules, then identify one or more records as unique via the record matching rules. You can perform the following actions: • Click to open the Add Unique Record Rule dialog from where you can add a rule. • Select a rule and click to delete it. • Select a rule and click to edit it. • Select a disabled rule and click to enable it. • Select an enabled rule and click or disable it. • Click the up and down icons () to move the rules up or down. The Add Unique Record Rule dialog allows you to add a rules of the following type: • Cluster Matching Rules - You can define one or more Cluster Matching Rules to select one or more clusters. Multiple rules can be defined and combined using the AND/OR logical operators. Follow these steps: 1. Click to add a row (you can click to delete it). 2. Select from AND/OR. You cannot specify the logical operators for the first row. 3. Click the Field Name box and then click the down arrow that appears and select a field. 4. Click the Function box and then click the down arrow that appears and select a function from the list (CompareToConstant, CompareToField, IsBlank, IsNotBlank, IsNull, MatchesRegex). 5. Click the Operator box and then click the down arrow that appears and select an operator from the list (Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal). No operator is required for the IsBlank, IsNotBlank, IsNull, and MatchesRegex functions. 6. Click the Value box and specify a value to compare. No value is required for the IsBlank, IsNotBlank, IsNull, and MatchesRegex functions. Note: If you do not add a rule, all clusters are considered. | |
• Record Matching Rules - You can define a Record Matching criteria or rule to identify one or more records as unique. You can choose from one of the following Criteria options: – None - All records (from the cluster) will qualify. – Completeness - Picks the record (from the cluster) which has least number of null/blank values. – Accuracy - Picks the record (from the cluster) which has the highest number of accurate fields. Accuracy is based on the number of test rule failures for a record. The fewer the failures, the higher the accuracy. – Custom Conditions- Picks all qualifying records (from the cluster) based on custom rules that you add by clicking . To define a custom Record Matching rule to identify one or more records as unique, follow these steps: 1. Click to add a row (you can click to delete it). 2. Select from AND/OR. You cannot specify the logical operators for the first row. 3. Click the Field Name box and then click the down arrow that appears and select a field. 4. Click the Function box and then click the down arrow that appears and select a function from the list (CompareToConstant, CompareToField, IsBlank, IsNotBlank, IsNull, MatchesRegex). 5. Click the Operator box and then click the down arrow that appears and select an operator from the list (Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal). No operator is required for the IsBlank, IsNotBlank, IsNull, and MatchesRegex functions. 6. Click the Value box and specify a value to compare. No value is required for the IsBlank, IsNotBlank, IsNull, and MatchesRegex functions. • Record Selection - The record selection option provides a way to select records after applying the Unique Record Rules. These options include: – First - Picks the first from all qualifying records. – Last - Picks the last from all qualifying records. – Any - Picks any from all qualifying records. Usually the first record is picked. – All - Picks all of the qualifying records. – None - Picks none of the qualifying records. – Range - Picks from the specified range of records. You can specify comma separated list of ranges. For example “1, 2-4,” will pick 1st, 2nd through 4th record from the cluster. | |
Note: Completeness and Accuracy pick the best record, except when multiple records have the same score. In such cases, the Record Selection option (First, Last) will determine which one gets picked. If multiple records have the same score, the record selector will define which records are selected. Note that if the score is 0 (that is, all records have a score of 0), then none will be selected. An exceptional case is when you use Record Selection as a Range and provide an out-of-bounds value, which may result in no unique record being selected. When the profile is executed, the focus is shifted to the Statistics tab from where you can view the graphical representation of the pass/fail data. On the Results Summary pane, click on the required field name: • ***Counts - Total*** - Click this to view the total pass and fail data in the pie chart representation. You can click on parts of the graph to view the data in the Drill Down Data tab. • **Rule - BarCharts** - Click this to view all the pass and fail rule in the bar chart representation. You can click on parts of the graph to view the data in the Drill Down Data tab. • -MultipleFields- - Click this to view the following information about the Remove Duplicates rule: – Keys - The Key Fields. – Sort Keys - The keys which sort the records within each cluster. – Number of Unique Record Rules - The number of Unique Record Rules defined. – Cluster File - The path and name of the cluster file. – Number of clusters - The number of duplicate clusters (groups) that were found. – Number of duplicates found - Duplicate row count that was found in the source data. – Number of Unique records - The number of unique records that were found. – Browse Clusters File - Click this to open the clusters file in the Drill Down Data tab. See Also: Viewing Pass, Fail, and Drill Down Output. | |
Cluster Data tab - On the Rules tab, select the Remove Duplicates rule and click from the Rule Definition pane to generate clusters and browse the cluster file in the Cluster Data tab. The following additional fields are displayed: • Select - This checkbox allows you to select records manually. By selecting the desired records and clicking the Generate Rules button, you will create Unique Record Rules. This feature is particularly helpful if you have few duplicates in your source data or prefer to have Data Profiler generate the rules for you. • $$ClusterId - The unique ID that is assigned to each cluster. Using this, you can identify and understand how many clusters exist and how many records are in each cluster. The cluster IDs are not retained, so every time the profile is executed or the cluster file is generated, the order of the clusters and the associated IDs will change. • $$pass - $$Pass gives you information about pass records, if it is blank then the record passed the rule, if marked as “X” then the record failed the rule. • $$rule_id - This is an ID allocated to each Unique Record Rule. The rule_id is only displayed for the unique records identified from the Unique Record Rules. Note: The Generate Clusters icon () is available only for Duplicate Rules. | |
Supported Data Types | Any data type. |
Remarks | • Provides pass and fail statistics and generates a cluster file. Duplicate records are sent to the fail target and unique records are written to the pass target. • Only one Remove Duplicates rule is permitted within a profile. However, multiple fields can be designated as key fields. Once the Remove Duplicates rule has been added to a profile, it will be categorized under MultipleFields on the Rules tab, instead of by the name of the key field. |
Description | Writes unique records to pass target and duplicate records to fail target based on fuzzy matching rules and match score filter. Finding “duplicate” data in this case does not mean finding an exact match but means finding an approximately similar record. Fuzzy Matching (also called Approximate String Matching) is a technique that can be used to filter these similar looking fuzzy duplicates by finding an approximate match. There are many situations where fuzzy matching techniques can be used. For instance, using fuzzy matching, you can find multiple variations of a word, term, or phrase. For example, if your word is “DataConnect,” fuzzy matching returns DataConnect, Data Connect, D Connect, Data C, DattaConecct, and so on. This rule uses clusters. Let us now understand how the cluster technique works. The main idea in clustering is to group similar records together, into clusters, based on comparison using key fields. When the profile is executed, the values in the specified key field are compared to each other to identify duplicates. If a duplicate value is found, then a cluster is created. The cluster contains the duplicate records, based on the key field and is assigned a cluster ID. After a cluster is created, you can configure a Match Score Filter, a Cluster Matching Rule, and a Record Matching Rule to specify criteria for finding approximately similar records within a cluster. Note: The duplicate records are written to the fail target and unique records are written to the pass target. The cluster file is stored as a separate csv file. In the Project Navigator view, expand the project folder and double-click the <ProfileName>_RemoveDuplicates.csv file to view the cluster file. |
Parameters | Grouping tab - Select a field from the list of Input Fields as a key field. The values in the key field will be evaluated to identify duplicates or exact matches. If a duplicate value is found, a cluster is created. The cluster will contain the duplicate records based on the key fields and will be assigned a Cluster ID (visible when browsing the cluster file). • Input Fields - List of available fields that can be used as key fields. • Key Fields - You can move fields from Input Fields to Key Fields to define a key field. You can specify a combination of Fields to define a key field. Note: Key Fields cannot be selected as Sorting Fields and vice versa. • Results File - Browse and select a csv file (only csv files are supported) or specify the path and name of a csv file that will be used as the cluster file. You can use macros for filenames. When the profile is executed, you can browse the cluster file in the Cluster Data tab by selecting the Remove Duplicates rule and clicking from the Rule Definition pane. Matching tab - Allows you to configure string comparison rules and set the match score cutoff. Click to add a comparison rule and to delete it. Specify the following rule details: • Field Name - Select a field whose values are compared with each other using the algorithm specified in Comparison Type. You can specify multiple matching fields and each matching field can be configured to use any of the provided algorithms. This list includes only string fields and those which are not used as Key Fields on the Grouping tab. Note: Only fields within the cluster are compared with each other. • Comparison Type - A list of supported string matching algorithms/functions. For more information, see Remarks. • Weight - Weightage given to the rule which helps to calculate the final score. Default is 1 (same weightage). Allowed range is 1 to 100. If same value is assigned to all rules, they will have same weightage. For example, rule1 - 30, rule2 - 30, rule3 - 30 will have same weightage. • Match Score Filter - Select a decimal values between 0.01 to 1. Comparison score is between 0 to 1. Records with comparison score less than this value are not considered a match for probable duplicates and are discarded. The higher the value you select, more strict is the matching. |
Sorting tab - The Sorting tab allows you to arrange the records within each cluster in a specific order. You can sort on multiple fields. Click to add a row and to delete it. After adding a row, specify the following values: – Field - Click the Field box and select a sorting field from the dropdown. – Sort Order - Click the Sort Order box and select Ascending or Descending for each sorting field. Note: Use the up and down icons () to re-arrange the sorting fields. Unique Record Rules tab - The Unique Record Rules tab provides a way to specify one or more clusters via cluster matching rules, then identify one or more records as unique via the record matching rules. You can perform the following actions: • Click to open the Add Unique Record Rule dialog from where you can add a rule. • Select a rule and click to delete it. • Select a rule and click to edit it. • Select a disabled rule and click to enable it. • Select an enabled rule and click or disable it. • Click the up and down icons () to move the rules up or down. The Add Unique Record Rule dialog allows you to add a rules of the following type: • Cluster Matching Rules - You can define one or more Cluster Matching Rules to select one or more clusters. Multiple rules can be defined and combined using the AND/OR logical operators. Follow these steps: 1. Click to add a row (you can click to delete it). 2. Select from AND/OR. You cannot specify the logical operators for the first row. 3. Click the Field Name box and then click the down arrow that appears and select a field. 4. Click the Function box and then click the down arrow that appears and select a function from the list (CompareToConstant, CompareToField, IsBlank, IsNotBlank, IsNull, MatchesRegex). 5. Click the Operator box and then click the down arrow that appears and select an operator from the list (Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal). No operator is required for the IsBlank, IsNotBlank, IsNull, and MatchesRegex functions. 6. Click the Value box and specify a value to compare. No value is required for the IsBlank, IsNotBlank, IsNull, and MatchesRegex functions. | |
Note: If you do not add a rule, all clusters are considered. • Record Matching Rules - You can define a Record Matching criteria or rule to identify one or more records as unique. You can choose from one of the following Criteria options: – None - All records (from the cluster) will qualify. – Completeness - Picks the record (from the cluster) which has least number of null/blank values. – Accuracy - Picks the record (from the cluster) which has the highest number of accurate fields. Accuracy is based on the number of test rule failures for a record. The fewer the failures, the higher the accuracy. – Custom Conditions- Picks all qualifying records (from the cluster) based on custom rules that you add by clicking . To define a custom Record Matching rule to identify one or more records as unique, follow these steps: 1. Click to add a row (you can click to delete it). 2. Select from AND/OR. You cannot specify the logical operators for the first row. 3. Click the Field Name box and then click the down arrow that appears and select a field. 4. Click the Function box and then click the down arrow that appears and select a function from the list (CompareToConstant, CompareToField, IsBlank, IsNotBlank, IsNull, MatchesRegex). 5. Click the Operator box and then click the down arrow that appears and select an operator from the list (Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal). No operator is required for the IsBlank, IsNotBlank, IsNull, and MatchesRegex functions. 6. Click the Value box and specify a value to compare. No value is required for the IsBlank, IsNotBlank, IsNull, and MatchesRegex functions. • Record Selection - The record selection option provides a way to select records after applying the Unique Record Rules. These options include: – First - Picks the first from all qualifying records. – Last - Picks the last from all qualifying records. – Any - Picks any from all qualifying records. Usually the first record is picked. – All - Picks all of the qualifying records. – None - Picks none of the qualifying records. – Range - Picks from the specified range of records. You can specify comma separated list of ranges. For example “1, 2-4,” will pick 1st, 2nd through 4th record from the cluster. | |
Note: Completeness and Accuracy pick the best record, except when multiple records have the same score. In such cases, the Record Selection option (First, Last) will determine which one gets picked. If multiple records have the same score, the record selector will define which records are selected. Note that if the score is 0 (that is, all records have a score of 0), then none will be selected. An exceptional case is when you use Record Selection as a Range and provide an out-of-bounds value, which may result in no unique record being selected. When the profile is executed, the focus is shifted to the Statistics tab from where you can view the graphical representation of the pass/fail data. On the Results Summary pane, click on the required field name: • ***Counts - Total*** - Click this to view the total pass and fail data in the pie chart representation. You can click on parts of the graph to view the data in the Drill Down Data tab. • **Rule - BarCharts** - Click this to view all the pass and fail rule in the bar chart representation. You can click on parts of the graph to view the data in the Drill Down Data tab. • -MultipleFields- - Click this to view the following information about the Remove Duplicates rule: – Keys - The Key Fields. – Sort Keys - The keys which sort the records within each cluster. – Number of Unique Record Rules - The number of Unique Record Rules defined. – Cluster File - The path and name of the cluster file. – Number of clusters - The number of duplicate clusters (groups) that were found. – Number of duplicates found - Duplicate row count that was found in the source data. – Number of Unique records - The number of unique records that were found. – Browse Clusters File - Click this to open the clusters file in the Drill Down Data tab. See Also: Viewing Pass, Fail, and Drill Down Output. | |
Cluster Data tab - On the Rules tab, select the Remove Duplicates rule and click from the Rule Definition pane to generate clusters and browse the cluster file in the Cluster Data tab. The following additional fields are displayed: • Select - This checkbox allows you to select records manually. By selecting the desired records and clicking the Generate Rules button, you will create Unique Record Rules. This feature is particularly helpful if you have few duplicates in your source data or prefer to have Data Profiler generate the rules for you. • $$ClusterId - The unique ID that is assigned to each cluster. Using this, you can identify and understand how many clusters exist and how many records are in each cluster. The cluster IDs are not retained, so every time the profile is executed or the cluster file is generated, the order of the clusters and the associated IDs will change. • $$pass - $$Pass gives you information about pass records, if it is blank then the record passed the rule, if marked as “X” then the record failed the rule. • $$rule_id - This is an ID allocated to each Unique Record Rule. The rule_id is only displayed for the unique records identified from the Unique Record Rules. Note: The Generate Clusters icon () is available only for Duplicate Rules. | |
Supported Data Types | Any data type. |
Remarks | • Provides pass and fail statistics and generates a cluster file. Duplicate records are sent to the fail target and unique records are written to the pass target. • Only one Remove Duplicates Fuzzy Matching rule is permitted within a profile. However, multiple fields can be designated as key fields. Once the Remove Duplicates Fuzzy Matching rule has been added to a profile, it will be categorized under MultipleFields on the Rules tab, instead of by the name of the key field. • Supported Fuzzy Matching algorithms: – CONTAINS: Checks the small string in the long string. – DAMERAU_LEVENSHTEIN: This distance between two strings is the minimum number of operations (insertions, deletions or substitutions of a single character, or transposition of two adjacent characters) required to change one string into another. For more information, see Damerau–Levenshtein distance. – EXACT_MATCH: Checks two strings for exact match. – JARO: A measure of characters in common, being no more than half the length of the longer string in distance with consideration for transpositions. – JARO_WINKLER: This algorithm extends Jaro algorithm and uses a prefix scale “p” which gives more favorable ratings to strings that match from the beginning for a set prefix length “l”. The prefix length is set to 4 characters (the maximum possible prefix length). For more information, see Jaro–Winkler distance. – LEVENSHTEIN: This distance between two strings is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one string into other. For more information, see Levenshtein distance. – POSITIONAL_QGRAM: Positional q-grams is an extension to q-grams with an additional property called maxDistance used in calculating the string similarity. For example, ‘peter’ contains the positional bigrams (‘pe’,0), (‘et’,1), (‘te’,2) and (‘er’,3). If maxDistance is set to 1, then (‘et’,1) will only be matched to bi-grams in the other string with positions 0 to 2 – QGRAM: q-grams (also called as n-grams) is the process of breaking up a string into multiple strings each with length n (q). For example, apply q-grams on abcde with q = 3 would yield abc, bcd, cde. For example, the word ‘peter’ with q=2 would yield following bigrams - ‘pe’, ‘et’, ‘te’ and ‘er’. A q-gram similarity measure between two strings is calculated by counting the number of q-grams in common. – SHORTHAND: Determine if the shorter string is a “shorthand” of the longer. That is, if it can be produced only by deletions. If the first character of a word in the longer string is deleted, the entire word is considered deleted. For e.g. IBM is a shorthand of International Business Machines. |
Description | Removes specified characters or words from a string field. |
Parameters | Note: A default Rule Name is provided, however, you can edit or overwrite it. Select the character classes to remove - Tells what to remove from the source field value. You can choose one or multiple options. The options are: • Whitespace - Removes spaces. • Digits - Removes numeric characters. • Non Printable Characters - Removes non printable characters. Non printable characters are symbols that do not visibly appear when printed or displayed. Examples include newline (\n) for line breaks and tab (\t) for spacing. • Special Characters - Removes any special characters. Special characters are the punctuation characters on your keyboard, such as: ! @ # $ % ^ & * ( ) - _ = + \ | [ ] { } ; : / ? . > Remove Other - Choose from one of the following options: • <None> - Choose this when you want to skip specifying the Remove Other option. • Literal - Allows you to specify literal values to search for and remove. • Regular Expressions - Allows you to specify your own regular expression (regex pattern) to search for and remove. See Remarks. • Words - Allows you to specify a comma separated list of word numbers or a range of word numbers to remove. For example “1, 2-4,” will remove 1st, 2nd through 4th word. |
Supported Data Types | • String (input) • String (output) |
Remarks |
Description | Replaces a field value with a new value. |
Parameters | Note: A default Rule Name is provided, however, you can edit or overwrite it. Replace Function Choices - Choose from one of the following options: • Replace with Constant - Replace the field value with the specified Constant. – Constant - In this text box, type the constant value with which to replace. • Replace with Field - Replace the field value with the specified alternate source Field value. – Field - Select the field from which to replace. • Replace with Function Results (available for string fields) - Replace the field value with the selected function results. You can select one or both functions: – ChangeStringCase - Change the case of a string field to specified case (for example upper, lower, title) before writing to target field. See Change String Case. – StringTrim - Trims whitespace from a String field. See String Trim. • Replace with RegX (available for string fields) - Replace what matches the specified regular expression with the replacement value, before writing to target field: – Pattern - The regex pattern to search for. See Java Regular Expressions. – Replacement - The replacement value. – Occurrences - The field occurrences that you want to replace. For example “1, 3-4, 7”. The “3-4” (from 3rd through 4th) is a range. You can also specify “all”. If left blank, all instances are replaced. Replace with Condition - Allows you to replace source field values based on a condition (if the specified condition is true): • CompareToConstant - Replace if the field value matches the specified Operator and Constant: – Operator - The comparison operator (Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal) – Constant - In this text box, type the constant value to compare, and then press the Enter key or click Add. For equal or not equal you can specify one or more constants. The constant must be of the same data type as the profiled field. The equal operator is not recommended to use on double or float field comparisons. Do not use quotation marks with string data. • CompareToField - Replace if the field value matches the specified Operator and Compare Field: – Operator - The comparison operator (Equal, Greater, Greater or Equal, Lesser, Lesser or Equal, Not Equal) – Compare Field - Select the required field to compare from. |
• FuzzyMatch (available for string fields) - Replace if the field value fuzzy matches the specified Constant. Duplicate records are identified by using the using the specified Fuzzy Match Algorithm (fuzzy matching rule) and Fuzzy Score Filter (match score). – Fuzzy Match Algorithm - Supported Fuzzy Matching algorithms (CONTAINS, DAMERAU_LEVENSHTEIN, EXACT_MATCH, JARO, JARO_WINKLER, LEVENSHTEIN, QGRAM, POSITIONAL_QGRAM, SHORTHAND) – Constant - In this text box, type the constant value to compare, and then press the Enter key or click Add. You can specify one or more constants. – Fuzzy Score Filter - Select a decimal values between 0.01 to 1. Comparison score is between 0 to 1. Records with comparison score less than this value are not considered a match for probable duplicates and are discarded. The higher the value you select, more strict is the matching. Default is 0.7. • IsBlank (available for string fields) - Replace if the field value is blank. • InRange (available for numeric fields) - Replace if the value is within specified range. See In Range. • IsNull (available for numeric fields) - Replace if the field value is null. • MatchesRegex (available for string fields) - Replace if the field value matches the specified regular expression (regex pattern). See Java Regular Expressions. | |
Supported Data Types | Any data type |
Remarks | When replacing values in the target field, you may encounter a truncation error, often due to a mismatch in field sizes between the target field and the replacement value. To resolve this issue, adjust the field size of target field according to the replaced/overwritten value size. |
Description | Trims whitespace from a String field. |
Parameters | Note: A default Rule Name is provided, however, you can edit or overwrite it. Trim Side - The side from which to remove white spaces: • Both (default) • Left • Right |
Supported Data Types | • String (input) • String (output) |
Remarks | Examples: To convert a string field to another data type and to ignore extra whitespace: Apply the String Trim rule to a string field, and then apply String To Conversion rule to the output. Without String Trim, extra white space in the field value causes the conversion rule to fail. To determine distinct values in a string field and ignore whitespace: Apply the String Trim rule to a string field, then apply the Distinct Values rule to the output. Without String Trim, varying amounts of extra whitespace are considered distinct values. |
Description | Applies the specified time zone to a timestamp field. |
Parameters | Note: A default Rule Name is provided, however, you can edit or overwrite it. Time Zone - Select from a list of available time zones. |
Supported Data Types | Timestamp |
Remarks | Example: Consider a value like 2021-12-01 20:30:00 When parsed, this value will be assumed to be in the system’s default timezone. To specify a different time zone, you can use the Apply Time Zone remediation rule and specify a time zone parameter, which will be applied to the timestamp field. |