DC 12.4 | Data Prep Rules

User Guide > Designing and Executing Data Profile > Rule and Parameter Reference > Data Prep Rules

Was this helpful?

Data Prep Rules

Data preparation is the process of converting raw data into a clean, consistent, and structured format that is suitable for analysis. This process involves several key steps, including data cleaning to remove errors and inconsistencies, as well as data transformation. These rules create new in-memory fields (see Derived Fields), which can then be used for further processing or to build additional rules. It also involves data engineering tasks such as adding new columns to enrich the dataset, deleting unnecessary columns to improve efficiency, and renaming columns for improved clarity and consistency.

They generate pass and fail statistics based on conversion success or failure, and they create derived fields of the converted type.

See Add New Rule for information about adding rules.

The following is a list of data preparation rules:

Rule Name	Description
ExecuteExpression	Executes the specified expression on the field. This is useful for complex transformations.
LookupValue	Looks up a value from a dataset, based on the specified keys.
ParseExtract	Extracts one or more parts from fields such as names, addresses, and timestamps.
PhoneticStringEncode	Phonetically encodes the string field using the specified algorithm.
StringJoin	Joins multiple fields into a single string field which can be used to create a new column.
StringSplit	Separates a string field into substrings which can be used to create new columns.
StringToConversion	Converts a given string field into a specified data type. It creates a new in-memory field.

ExecuteExpression

This function rule executes the specified expression. This is useful for complex transformations.

Once the Execute Expression rule has been added to a profile, it will be categorized under -MultiFieldRules- on the Rules tab, instead of under the -OutputFields-.

Rule Properties

Rule Name	A default rule name (ExecuteExpression or ExecuteExpression_n, where “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name	-MultipleFields- is displayed here.
Rule Type	The type of rule that is applied to the field. That is ExecuteExpression (Function rule). Tip... A Red Cross in a rule icon (for example ) indicates that the rule has a parameter that has not been defined.

Rule Parameters

Script

The Script text box helps you construct expressions for Derive Fields.

To add multiple regular expressions use the matchesPattern string function:

matchesPatterns('FieldName', "patternA", "patternB", "patternC")

where FieldName is the field name and patternABC are regular expressions.

For example: matchesPatterns('Account Number', "01-.*", "02-.*", "03-.*")

Click Build to open the Expression Builder dialog to assist you in building the script. See Using the Expression Builder.

Derived Field Name

A default derived field name (d_ExecuteExpression, d_ExecuteExpression_n, where “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. See Derived Fields.

Supported Data Types

String, Date, Time, Timestamp, Boolean, Numeric, Double, Float, Long, Integer

Using the Expression Builder

The Expression Builder dialog helps you construct expressions for Derive Fields. This dialog provides a list of available fields, operators, and functions you can use to build expressions.

To open the Expression Builder dialog, click Build. In the displayed dialog, select the required field and double-click.

The following is an Expression Builder dialog:

The following elements are available in the dialog:

Screen Element	Description
(1) Script editor	The script editor displays the expression that is built. You can insert the elements from the element tree or type in the editor.
(2) Search	Enter a keyword in the Search box to find and display matching fields or functions in the Search Results pane. Click on the required function and the description for the function is displayed in the Details pane. To insert the selected function into the expression, double-click the function. Tip... When you are typing a search keyword, a appears at then end of the search box, which can be clicked to clear the box.
(3) Element tree	The left pane provides the Field references, Functions, and Conditional constructs that can be used to build an expression. Click on the required element and the description is displayed in the Details pane. To insert an element, double-click the element or select the element and click OK. The following elements are listed in the form of a tree structure: • Field references: Provides a list of input fields that can be referenced in the expression. • Functions: Provides a list of functions available to the expression language, and it is organized by category. To see the description of a function, select it. After a function is inserted into the expression, you must replace its argument names with appropriate values. See Conversion Functions, Date And Time Functions, Math Functions, String Functions. • Conditional constructs: Conditional constructs are programming structures that allow you to control the flow of a program by making decisions based on conditions. See Conditional Constructs.
(4) Details pane	Displays the details of the selected function.
(5) Operators	Click to view a list of operators supported by expression language. The operators are organized by category. See Operators.
(6) Validation button	Click to check the validity of the expression and report any existing errors.

Operators

There are several types of operators in that you can use in your expressions or scripts, including:

• Arithmetic: Used to perform basic mathematical operations

• Comparison or Relational: Used to compare two values or expressions. They return a boolean result (True or False) based on whether the comparison is true or false.

• Logical: Used to combine or modify boolean values (True or False), assignment, and more.

The following table lists the available operators:

Operator	Description
+	Arithmetic operator for addition. Adds two values together.
-	Arithmetic operator for subtraction. Subtracts one value from another.
/	Arithmetic operator for division. Divides one value by another.
*	Arithmetic operator for multiplication. Multiplies two values together.
=	Relational “equal to” operator. Checks if two values or expressions are equal.
!=	Relational “not equal to” operator. Checks if two values or expressions are not equal.
>	Relational “greater than” operator. Checks if the value on the left is greater than the value on the right.
>=	Relational “greater than or equal to” operator. Checks if the value on the left is greater than or equal to the value on the right.
<	Relational “less than” operator. Checks if the value on the left is less than the value on the right.
<=	Relational “less than or equal to” operator. Checks if the value on the left is less than or equal to the value on the right.
(	Open bracket. Open and closed brackets are used to group expressions. They are not specific operators but are frequently used in conjunction with conditional operators or mathematical expressions. Example: a + (b * c)
)	Close bracket. Open and closed brackets are used to group expressions. They are not specific operators but are frequently used in conjunction with conditional operators or mathematical expressions. Example: ((a > b) and (c < d))
like	Logical “like” operator. Compares two string expressions. If the expressions match, result is True. If there is no match, result is False.
and	Logical “and” operator. Returns True if both operands (conditions) are True. Otherwise, it returns False.
or	Logical “or” operator. Returns True if at least one of the operands (conditions) is True. Returns False if both operands are False.
is null	Test “is null”. That is checks if a field value or result of expression is null.
is not null	Test “is not null”. That is checks if a field value or result of expression is any value other than null.
is true	Test “is true”. That is checks if a field value or result of expression is true, meaning it's the value true.
is not true	Test “is not true”. That is checks if a field value or result of expression is not true, meaning it's any value other than true (including false, non-zero numbers, non-empty strings, etc.).
is false	Test “is false”. That is checks if a field value or result of expression is false, meaning it's the value false.
is not false	Test “is not false”. That is checks if a field value or result of expression is not false, meaning it's any value other than false (including true, non-zero numbers, non-empty strings, etc.).

Conversion Functions

The conversions category provides a set of functions for converting the types of input data values.

Function	Description	Parameters
asBoolean(field, truth, falsity)	This function converts an integer field into a boolean type based on specific integer values representing true and false. If the integer value of the field matches the truth value, it will return true. If it matches the falsity value, it will return false. Any other value (unmapped values) will result in null. This function can be useful when working with datasets or database records where the boolean values are stored as integers (e.g., 1 for true and 0 for false), and you want to map these values to actual boolean values in your application.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`). • truth: An integer value representing what should be interpreted as true. • falsity: An integer value representing what should be interpreted as false.
asDate(field)	This function converts a timestamp field into a date field by removing the time portion, leaving only the date. Essentially, it performs a downcast operation from a timestamp type to a date type. This function can be useful when you only need the date portion for comparison, grouping, or filtering, and you want to disregard the time of day.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`).
asDouble(field)	This function converts a numeric field (such as an integer, float, or decimal) into a double type. This conversion may result in a loss of precision, especially if the original number has more precision than a double can represent. This function can be useful in scenarios where you're processing numeric fields that could be in different formats (e.g., integers, floats) and need them to be consistently represented as double values for further computation or storage.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`).
asFloat(field)	This function converts a numeric field (such as an integer, decimal, or double) into a float type. This conversion can lead to a loss of precision if the number is too large or too small to be represented as a float. This function can be useful in scenarios where you're processing numeric fields that could be in different formats and need them to be consistently represented as float values for further computation or storage.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`).
asInt(field)	This function converts a numeric field (such as an float, decimal, or double) into an integer type. This conversion can lead to a loss of precision if the number is too large or too small to be represented as an integer. This function can be useful in scenarios where you're processing numeric fields that could be in different formats and need them to be consistently represented as integer values for further computation or storage.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`).
asLong(field)	This function converts a numeric field (such as an float, decimal, or double) into a long type. This conversion can lead to a loss of precision if the number is too large or too small to be represented as a long. This function can be useful in scenarios where you're processing numeric fields that could be in different formats and need them to be consistently represented as long values for further computation or storage.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`).
asNumeric(field)	This function converts a field with a numeric type into a numeric type, typically backed by a BigDecimal object. It is designed to handle fields that represent numeric values and standardize them into a type that can handle precise decimal values. The primary purpose of this method is to ensure that a field containing numeric data is transformed into a more precise numeric type, ensuring accuracy in calculations, especially when dealing with decimals. By using BigDecimal, the method helps avoid issues like rounding errors or precision loss that can occur with other numeric types like double or float.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`).
asTime(field)	This function converts a timestamp field into a time field by removing the date portion. It retains only the time part of the timestamp. The conversion also takes into account the timezone offset associated with the timestamp. Essentially, it performs a downcast operation from a timestamp type to a time type. This function is useful when you want to focus on the time of an event (without worrying about the specific date), such as when analyzing daily routines within a specific time window.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`).
toBoolean(field, truth, falsity)	This function converts a string field into a boolean value based on specified truth and falsity values. It checks the content of the string (field) and compares it with the given truth and falsity values. If the string matches the truth value, it returns true. If it matches the falsity value, it returns false. Returned value: A boolean value (true or false), depending on whether the field matches the truth or falsity value.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`). • truth: The string that represents a true value (e.g., "yes", "1", "true"). • falsity: The string that represents a false value (e.g., "no", "0", "false").
toDate(field, format)	This function converts a field of type string into a date type using a specified format (e.g., "YYYY-MM-DD", "MM/DD/YYYY", etc.) The string value must match the pattern specified in the format parameter for a successful conversion. This function can be useful when the data you are working with is in a string format, but you need to perform operations (such as comparisons or date arithmetic) that require the data to be treated as an actual Date.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`). • format: (optional) The format specification that defines how the string should be interpreted and converted to a date. (e.g., yyyy-MM-dd for "2025-02-10").
toDouble(field, format)	This function converts a field of type string into a double type, using the specified format. This function can be useful when working with data stored as numbers as strings in various formats that need to be processed as double for calculations, comparisons, or other operations.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`). • format: (optional) The format specification that defines how the string should be interpreted and converted to a double. (e.g., "#,###.##" for "2,102.52").
toFloat(field, format)	This function converts a field of type string into a float type, using a specified format. This function can be useful when working with data stored as numbers as strings in various formats that need to be processed as float for calculations, comparisons, or other operations.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`). • format: (optional) The format specification that defines how the string should be interpreted and converted to a float. For example, #,###.## means that the number is formatted with commas (e.g., "1,234.56"), which will be converted to a float 1234.56.
toInt(field, format)	This function converts a field of type string into an int type, using a specified format. This function can be useful when working with data stored as numbers as strings in various formats that need to be processed as int for calculations, comparisons, or other operations.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`). • format: (optional) The format specification that defines how the string should be interpreted and converted to an int. For example, #,### means that the number is formatted with commas (e.g., "1,234"), which will be converted to an integer 1234.
toLong(field, format)	This function converts a field of type string into a long type, using a specified format. This function can be useful when working with numerical data stored as strings that represent large numbers (larger than int values). By applying the format, the function correctly interprets the string and converts it into a long type, making it suitable for further numerical operations.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`). • format: (optional) The format specification that defines how the string should be interpreted and converted to a long. For example, #,###,### means that the number is formatted with commas (e.g., "1,000,000"), which will be converted to an integer 1000000.
toNumeric(field, format)	This function converts a field of type string into a numeric type (which could be either an integer, float, or any numeric type) using a specified format. This function provides a more general conversion mechanism, allowing a string to be converted to a numeric type without specifying whether it should be an integer, float, or long. It adapts to various numeric formats based on the provided format specification.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`). • format: (optional) The format specification that defines how the string should be interpreted and converted to a number.
toString(field)	This function converts a scalar type field (such as an integer, date, or any other scalar data type) into a string value. This method applies the default formatting based on the data type of the field, and it essentially helps in converting data types to string representations.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`).
toText(value, format)	This function converts the value of an expression into a text type, using a specified format. The use of format allows for more control over how the data is converted into text, as opposed to toString(), which applies default formatting. You can pass different formats for dates, numbers, etc. to customize the conversion.	• value: Expression or a string field wrapped in back quotes (e.g., `myField`). • format: (optional) The format specification that defines how the string should be interpreted and converted.
toTime(field, format)	This function converts a field of type string into a time type using a specified format. This allows you to parse a string that represents a time value and convert it into an actual time datatype using the given format.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`). • format: (optional) The format specification that defines how the string should be interpreted and converted to a time. For example, "HH:mm:ss" for hours, minutes, and seconds (24-hour format). "hh:mm a" for hours, minutes, and AM/PM (12-hour format). "mm:ss" for minutes and seconds.
toTimestamp(field, format)	This function converts a field of type string into a timestamp type using a specified format. This allows you to parse a string that represents a timestamp value and convert it into an actual timestamp datatype using the given format.	• field: Name of the field to be converted. The field name should be wrapped in back quotes (e.g., `myField`). • format: (optional) The format specification that defines how the string should be interpreted and converted to a timestamp. If none is provided, the default ISO 8601 format is used For example, the format could include the date and time with a pattern like "yyyy-MM-dd HH:mm:ss".

Date And Time Functions

The date and time category provides functions for dealing with date, time, and timestamp values. Unless specifically indicated otherwise, these functions accept values of any of the date and time types. If no time zone is supplied to functions taking an optional time zone argument, the default time zone as indicated by the JVM will be used.

Function	Description	Parameters
addTime(datetimeField, deltaField, granularity, timeZone)	This function adds a time period (like hours, days, or minutes) to a date or time value. It returns a new date or time value (same type as the initial value) with the added period. Returned value: Returns a date or time value (same type as the initial value). Restrictions: Computation on time stamp values is performed relative to the specified time zone. Date and time of day values are processed independent of time zone. You cannot add smaller time units (like hours or minutes) to a date value (e.g., "2025-02-07"), and you cannot add larger units (like days or months) to a time of day value (e.g., "14:30"). Example: Let’s say you want to add 3 hours to the date value in field, `myField`: addTime(`myField`, 3, "HOUR", "America/Los_Angeles")	• datetimeField: This is the date/time field wrapped in back quotes (e.g., `myField`). For example, it could be a date like "2025-02-07" or a time like "14:30". • deltaField: A constant or a field. This tells us how much time to add. For example, it could be "2" or "5". • granularity: This is the unit in which the deltaField value is measured. This is specified as an enum constant: CENTURY, DAY, HOUR, MILLISECOND, MINUTE, MONTH, SECOND, WEEK, YEAR. • timeZone: (optional) This is a java.util.TimeZone string (e.g., "America/Los_Angeles"). This is the time zone where the calculation should happen.If you don't provide a time zone, your computer's local time zone will be used.
currentDate(timeZone)	This function gets the current date based on the time zone you specify. If no time zone is provided, it uses your local time zone. Returned value: Returns a date value. Example: The following function will return the current date according to the New York time zone, not your local time: currentDate("America/New_York")	timeZone: (optional) This is a java.util.TimeZone string (e.g., "America/Los_Angeles"). This is the time zone you want to use to get the current date. If you don't provide a time zone, your computer's local time zone will be used.
currentTime(timeZone)	This function gets the current time of day based on the time zone you specify. If no time zone is provided, it uses your computer's local time zone. Returned value: Returns a time value. Example: The following function will return the current time according to the New York time zone, not your local time: currentTime("America/New_York")	timeZone: (optional) This is a java.util.TimeZone string (e.g., "America/Los_Angeles"). This is the time zone you want to use to get the current time. If you don't provide a time zone, your computer's local time zone will be used.
currentTimestamp	This function gets the current date and timestamp based on the time zone you specify. If no time zone is provided, it uses your computer's local time zone. Returned value: Returns a timestamp value. Example: The following function will return the current timestamp according to the New York time zone, not your local time: currentTimestamp("America/New_York")	timeZone: (optional) This is a java.util.TimeZone string (e.g., "America/Los_Angeles"). This is the time zone you want to use to get the current timestamp. If you don't provide a time zone, your computer's local time zone will be used.
dateTimeValue(field, dateTimeValue, timeZone)	This function is used to extract a specific date or time value from a specified field, adjusting the interpretation of timestamp values according to the time zone that you specify. If no time zone is provided, it uses your computer's local time zone. You cannot extract time from a date value or date from a time of day value. An error will be received. Returned value: Returns a date or time value depending on the specified dateTimeValue parameter. Example: Imagine you have a record with the field `myField`: 2025-02-07T15:30:00Z (a timestamp in UTC). You want to extract the hour and interpret it in the "America/New_York" time zone. dateTimeValue(`myField`, "HOUR_OF_DAY", "America/New_York") The result would be the hour (10) from "2025-02-07T15:30:00Z" in the New York time zone.	• field: Name of the field containing a date, time of day, or time stamp value. The field name should be wrapped in back quotes (e.g., `myField`). • dateTimeValue: The date or time value to extract. This is specified as an enum constant: DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, HOUR_OF_DAY, MINUTE_OF_HOUR, MONTH, SECOND_OF_MINUTE, WEEK_OF_YEAR, YEAR. • timeZone: (optional) This is a java.util.TimeZone string (e.g., "America/Los_Angeles"). If you don't provide a time zone, your computer's local time zone will be used.
dateTimeTextValue(field, dateTimeValue, timeZone)	This function is used to extract a specific date or time value as text from a specified field, adjusting the interpretation of timestamp values according to the time zone that you specify. If no time zone is provided, it uses your computer's local time zone. You cannot extract time from a date value or date from a time of day value. An error will be received. Returned value: Returns a date or time value as a text, depending on the specified dateTimeValue parameter. Example: Imagine you have a record with the field `myField`: 2025-02-07T15:30:00Z (a timestamp in UTC). You want to extract the hour and interpret it in the "America/New_York" time zone. dateTimeTextValue(`myField`, "HOUR_OF_DAY", "America/New_York") The result would be the hour (10) from "2025-02-07T15:30:00Z" in the New York time zone, as a human readable text.	• field: Name of the field containing a date, time of day, or time stamp value. The field name should be wrapped in back quotes (e.g., `myField`). • dateTimeValue: The date or time value to extract. This is specified as an enum constant: DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, HOUR_OF_DAY, MINUTE_OF_HOUR, MONTH, SECOND_OF_MINUTE, WEEK_OF_YEAR, YEAR. • timeZone: (optional) This is a java.util.TimeZone string (e.g., "America/Los_Angeles"). If you don't provide a time zone, your computer's local time zone will be used.
subtractTime(field, deltaField, granularity, timeZone)	This function is used to subtract a time period from a date/time value from a specified field, adjusting the interpretation of timestamp values according to the time zone that you specify. If no time zone is provided, it uses your computer's local time zone. You cannot subtract time from a date value or date from a time of day value. An error will be received. Returned value: Returns a date or time value. Example: Let's say we have a timestamp value in a field called orderDate (the date/time of an order being placed), and we want to subtract 5 days from it. The orderDate field is of type timestamp, and we want to subtract this time period considering a particular timezone (e.g., "America/New_York"). subtractTime(`orderDate`, 5, "DAY", "America/New_York") If the orderDate is "2025-02-07T10:00:00Z", subtracting 5 days would result in "2025-02-02T10:00:00Z" considering the time zone adjustments.	• field: Name of the field containing a date, time of day, or time stamp value. The field name should be wrapped in back quotes (e.g., `myField`). • deltaField: A constant or a field. This tells us how much time to subtract. For example, it could be "2" or "5". • granularity: This is the unit in which the deltaField value is measured. This is specified as an enum constant: CENTURY, DAY, HOUR, MILLISECOND, MINUTE, MONTH, SECOND, WEEK, YEAR. • timeZone: (optional) This is a java.util.TimeZone string (e.g., "America/Los_Angeles"). If you don't provide a time zone, your computer's local time zone will be used.
timeDiff(startDtField, endDtField, granularity, scale)	This function calculates the time difference between two date/time fields, with options for customizing the unit of measurement and precision. Both values must be of the same type. When comparing time fields the precision is millisecond based whereas when comparing date fields the precision is day based. This means that when using date fields the difference will be based on calendar intervals versus time fields which will use the more precise astronomical time intervals. Returned value: Returns the time difference in unit specified by granularity. Example: timeDiff(`startDate`, `endDate`, "DAY", 1) In the above example, the difference is measured in days. The scale is set to 1, meaning you want the result to be precise to whole days. If startDate is "2025-02-01" and endDate is "2025-02-07", the function would return 6 days as the difference.	• startDtField: The field that represents the starting point of the interval. The field name should be wrapped in back quotes (e.g., `myField`). • endDtField: The field that represents the ending point of the interval. The field name should be wrapped in back quotes (e.g., `myField`). • granularity: The time unit in which to measure the interval length. This is specified as an enum constant: CENTURY, DAY, HOUR, MILLISECOND, MINUTE, MONTH, SECOND, WEEK, YEAR. • scale: (optional) This adjusts the precision or "scale" of the time difference. For example, you might want the difference to be calculated with more or fewer decimal places.
toMilliseconds(field, timeZone)	This function converts the value of a date time field into the number of milliseconds that have passed since the "epoch", which is January 1, 1970, at 00:00:00 GMT. If the input field is null, the function will return a null value. As time of day values have no date portion, this will return the milliseconds since midnight. For date values, the result represents midnight of the associated day. Returned value: Returns the time in milliseconds. Example: toMilliseconds(`myDate`) In the above example, if myDate is "2025-02-10 12:00:00", the function will calculate how many milliseconds have passed between January 1, 1970 and 2025-02-10 12:00:00 GMT.	• field: Name of the field containing a date, time of day, or time stamp value. The field name should be wrapped in back quotes (e.g., `myField`). • timeZone: (optional) This is a java.util.TimeZone string (e.g., "America/Los_Angeles"). If you don't provide a time zone, your computer's local time zone will be used.
useDefaultTimeZone( field)	Adjusts the time zone of a timestamp value to match the default time zone without changing the actual point in time the timestamp represents. The key thing to note is that it doesn't change the actual point in time. Rather, it changes the time zone offset associated with that timestamp. Returned value: Returns the timestamp adjusted to the local machine’s time zone, without altering the actual time value itself.	• field: Name of the field containing a date, time of day, or time stamp value. The field name should be wrapped in back quotes (e.g., `myField`).
useTimeZone(field, timeZone)	This function adjusts the time zone of a timestamp value to match the specified time zone without changing the actual point in time the timestamp represents. The key thing to note is that it doesn't change the actual point in time. Rather, it changes the time zone offset associated with that timestamp. Returned value: Returns the timestamp adjusted to the specified time zone, without altering the actual time value itself.	• field: Name of the field containing a date, time of day, or time stamp value. The field name should be wrapped in back quotes (e.g., `myField`). • timeZone: (optional) This is a java.util.TimeZone string (e.g., "America/Los_Angeles"). If you don't provide a time zone, your computer's local time zone will be used.

Math Functions

The Math category provides implementations of common math functions. Trigonometric functions use radians by default.

Function	Description	Parameters
abs(field)	This function returns the absolute value of the given numeric field, which is simply the number without any negative sign. The number can be an int, long, float, or a double). If the number is already positive, it stays the same. If the number is negative, it changes to positive. If the argument is NaN, the result is NaN.	field: A numeric field. The field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a number) can also be used.
exp(field)	This function returns e raised to power of the given field. Example: Let's say you have a field called num in a record, and it contains a number “1”. exp(`num`) will evaluate to exp(1) returning e raised to the power of 1, which is just e (about 2.718). exp(2) would give e raised to the power of 2 (about 7.389). If the argument is NaN, the result is NaN.	field: A numeric field. The field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a number) can also be used.
log(field)	This function returns natural logarithm of the given field. Example: Let's say you have a field called num in a record, and it contains “7.389”. log(`num`) will evaluate to log(7.389) returning 2, because e raised to the power of 2 is about 7.389. If the argument is NaN or less than zero, the result is NaN.	field: A numeric field. The field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a number) can also be used.
log10(field)	This function returns base 10 logarithm of the given field. This means it tells you what power you need to raise 10 to in order to get the field number. Example: Let's say you have a field called num in a record, and it contains “1000”. log10(`num`) will evaluate to log10(1000) returning 3, because 10 raised to the power of 3 equals 1000. If the argument is NaN or less than zero, the result is NaN.	field: A numeric field. The field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a number) can also be used.
random()	This function returns a random value greater than or equal to 0 and less than 1. It is generated using a pseudorandom-number generator, meaning the values aren't completely random but are generated based on an algorithm.	None
round(field)	This function returns field rounded to the nearest whole number. Examples: round(4.3) will return 4 round(4.5) will return 5 If the argument is NaN, the result is 0. Returned value: Long if the argument is a Double. Integer if the argument is a Float.	field: A numeric field. The field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a number) can also be used.
scale(field)	This function returns scale of the field. Scale measures the precision after the decimal point. Example: If you have a number like 123.456, the scale of this number is 3 because there are 3 digits after the decimal point. If you have a number 45.6789, the scale is 4 because there are four digits after the decimal. Returned value: Integer	field: A numeric field. The field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a number) can also be used.
sin(field)	This function returns sine of the given field using radians. Sine is a mathematical function that takes an angle (in radians) and returns the sine value of that angle. Radians are a way of measuring angles, similar to degrees, but using a different scale. Suppose we have an angle of 30 degrees. To use the sin function, you first need to convert 30 degrees into radians, then find the sine value. Example: Let’s say the field is a number representing an angle in radians, like 1.0 (which is approximately 57.3 degrees). The sin function would return the sine of 1.0, which is approximately 0.841.	field: A numeric field. The field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a number) can also be used.

String Functions

The String category provides common String manipulation functions.

Function	Description	Parameters
concat(field1, field2, ...)	This function combines multiple values or results of multiple expressions into one string. It automatically converts non-string values to strings. By default, it ignores any null values, but you can change this behavior if needed. Optionally, you can specify how null values should be handled. Example: If you use concat("Hello", 123, null, "world"), the result will be: "Hello123world" This is because, "Hello" and "world" are strings, so they stay as they are. 123 is a number, so it gets converted to "123". null is ignored by default. If you set NullConcatenation to treat null, you could get a different result: For example, concat(NullConcatenation.NULL_STRING, "Value: ", null) would return "Value: NULL" NullConcatenation can be set to different options like: • NullConcatenation.EMPTY_STRING (default, treating null as an empty string) • NullConcatenation.NULL_STRING (treating null as "NULL") • NullConcatenation.SPECIAL_FORMAT (defining a custom string format) The exact implementation might depend on the platform you're using. Returned value: String	field: A string constant or a string field. The string field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a string) can also be used.
find(field, pattern)	This function searches for a pattern in a given string. It returns the position of the first match of the pattern in the string. If no match is found, it returns -1. Example: If you use find("Hello, world!", "world"), the result will be: 7. This is because the substring "world" starts at position 7 in the string "Hello, world!". If the pattern is not found, like in this example: field("Hello, world!", "earth"), the function will return -1 because "earth" is not found in the string "Hello, world!". Returned value: Integer	• field: A string constant or a string field you want to search in. The string field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a string) can also be used. • pattern: The pattern (regular expression) you want to find.
length(field)	This function calculates the length (number of characters) of a given string. If the string is null, the result will also be null. Example: If you use length("Hello"), the result will be: 5. This is because the string "Hello" has 5 characters. If the string is null, as in length(null), the result will also be null. This function is useful when you want to find out how long a string is or check if it's empty (length 0). Returned value: Integer	field: A string constant or a string field. The string field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a string) can also be used.
replaceAll(field, pattern, replacement)	This function replaces all occurrences of a given pattern (or text) in a given string. If the pattern is found, it gets replaced with a new value. If the pattern isn't found, the original value stays the same. Example: Let’s say we have a sentence: "I like apples, apples are great." We can use the replaceAll function to replace the word "apples" with "oranges." replaceAll("I like apples, apples are great.", "apples", "oranges") The result is: "I like oranges, oranges are great." In this case: The function searches for every occurrence of the word "apples" and replaces it with "oranges." Since the pattern matched twice in the string, both occurrences of "apples" get replaced. If "apples" wasn't found in the sentence, the original sentence would have stayed unchanged. Returned value: String	• field: A string constant or a string field where you want to make replacements. The string field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a string) can also be used. • pattern: The text or rule (using a regular expression) you're looking for and want to replace. • replacement: The value or string expression with which to replace any matches.
replaceFirst(field, pattern, replacement)	Replaces the first occurrence of a given pattern (or text) in a given string or field value or the result of a specified expression. If the pattern isn't found, the original value stays the same. Example: We can use the replaceFirst function to replace the word "bananas" with "apples." replaceFirst("I love bananas. Bananas are my favorite fruit.", "bananas", "apples") The result is: "I love apples. Bananas are my favorite fruit." Here, the function looks for the first occurrence of the word "bananas" and replaces it with "apples." Notice that the second occurrence of "Bananas" remains unchanged because only the first match is replaced. Returned value: String	• field: A string constant or a string field where you want to make the replacement. The string field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a string) can also be used. • pattern: The text or rule (using a regular expression) you're looking for and want to replace. • replacement: The value or string expression with which to replace the first match.
substr(field, startIDX, endIDX)	This function is used to extract a part of a string, known as a substring, based on the given starting and ending positions. The substring is taken from the string, starting at the start index and ending just before the end index. If the end index is out of bounds, the substring goes up to the end of the string. If the start index is greater than the end index, the result will be an empty string. If the input string is null, the result will also be null. Example 1: Basic Substring Extraction The function substr("Hello, World!", 0, 5) would return "Hello" because: It starts at index 0 (which is the first letter "H"). It goes up to index 5, but doesn't include the character at index 5, so it stops right before the comma. Example 2: Out-of-Bounds Indices The function substr("OpenAI", 3, 10) would return "nAI" because: The string "OpenAI" ends at index 6, so even though 10 is outside the bounds, the function will return the substring up to the end of the string, which is "nAI". Example 3: Empty String When start > end The function substr("Hello, World!", 5, 2) will return an empty string because the start index is greater than the end index, as per the behavior described. Example 4: Null Input String The function substr(null, 0, 5) will return null because the input string is null.	• field: A string constant or a string field from which you want to extract the substring. The string field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a string) can also be used. • startIDX: The starting position (inclusive) of the substring. • endIDX: The ending position (exclusive) of the substring.
toLowerCase(field)	This function takes a string as input and converts it to lowercase. If the input is null, the function returns null. Example: The function toLowerCase("Hello World") would return "hello world". Returned value: String	field: A string constant or a string field. The string field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a string) can also be used.
toUpperCase(field)	This function takes a string as input and converts it to uppercase. If the input is null, the function returns null. Example: The function toUpperCase("Hello World") would return "HELLO WORLD". Returned value: String	field: A string constant or a string field. The string field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a string) can also be used.
trim(field)	This function trims white space from the beginning and end of the input String value. Example: Let's say you have a field called name in a record, and it contains a string with extra spaces before and after the name. Before applying trim: name = " John Doe ". After applying trim: trim(`name`) The output is: "John Doe" In this example, the trim function removes the spaces before and after "John Doe", leaving just the name without the extra spaces. Returned value: String	field: A string constant or a string field. The string field should be enclosed in back quotes (e.g., `myField`). Alternatively, an expression (such as another function or an expression that evaluates to a string) can also be used.

Conditional Constructs

The Conditional category has constructs that help you pick values based on certain conditions.

Function	Description	Parameters
ifNull(field, replacementValue)	This function checks if a field has a value. If the field has a value (it's not null), it returns that value. If the field is null, it returns the replacement value instead. The result will be the type of the larger value (either the field or replacement value). The replacement value is only used if the field is null.	• field: The value you're checking. • replacementValue: The expression or value to use if the field is null.
ifThenElse(condition, (trueValue), (FalseValue))	This function checks a condition. If the condition is true, it returns the trueValue. If the condition is false, it returns the falseValue. The result will be the type of the larger value between trueValue and falseValue. Only the required value (either trueValue or falseValue) is evaluated.	• condition: A true/false expression that decides which value to return. • trueValue: The expression or value returned if the condition is true. • FalseValue: The expression or value returned if the condition is false.
case (baseExpression) when (condition) then (returnValue) when (condition2) then (returnValue2) end	This function works like a switch or case statement. If a base expression is provided, it checks it against each condition. When a match is found, the corresponding returnValue is returned. If no base expression is given, it checks each condition (which must be true/false checks) and returns the value of the first one that is true. If there is no match, it will return null.	• baseExpression: (optional) The value to compare against each case. • condition: Each condition that is checked for a match. • returnValue: The value returned if a condition matches.
case (baseExpression) when (condition) then (value) when (condition2) then (value2) else (defaultValue) end	This function works like a switch or case statement. If a base expression is provided, it checks it against each condition. When a match is found, the corresponding returnValue is returned. If no base expression is given, it checks each condition (which must be true/false checks) and returns the value of the first one that is true. If there is no match, it will return null or a default value (if provided).	• baseExpression: (optional) The value to compare against each case. • condition: Each condition that is checked for a match. • returnValue: The value returned if a condition matches. • defaultValue: (optional) An optional value returned if no match is found.

Remarks

• See DateTime Format

• See Class DecimalFormat

LookupValue

This function rule allows you to perform a lookup (search and retrieve values from another dataset). The values retrieved from the lookup are placed into a new, auto-generated derived field. The returned lookup values can then be used where Data Profiler accepts a derived field (in expressions and other rules, and to write values to a target). For step-by-step instructions, see Adding a lookup source, Creating a LookupValue rule and Writing retrieved lookup values to the target.

A derived field is stored in-memory (RAM). For more information, see Derived Fields.

Once the LookupValue rule is added to a profile, it is categorized under -MultiFieldRules- in the Field/Rule pane (instead of under -OutputFields-).

This rule uses an incore lookup which is stored in memory (RAM). The lookup dataset contains two columns that are of interest for configuring this rule: one column contains the values that reside in the source or derived field (referred to as the lookup key field); the other column stores the desired values (referred to as the return lookup field). When the rule executes, the engine compares the source or derived field value (referred to as the matching key value) with the lookup key field value. When these values match, the return lookup field value is added to the new derived field.

Lookups can also be created in the Source tab (see Creating Lookups in Profile Editor Source Tab).

This topic includes the following:

• Rule Properties

• Rule Parameters

• Adding a lookup source

• Creating a LookupValue rule

• Writing retrieved lookup values to the target

• Editing a lookup source

• Deleting a lookup source

• Example

Rule Properties

This rule has the following properties.

Rule Name	A default rule name (LookupValue or LookupValue_n, where “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name	-MultipleFields- is displayed here.
Rule Type	The type of rule that is applied to the field. That is LookupValue (Function rule). Tip... A Red Cross in a rule icon (for example ) indicates that the rule has a parameter that has not been defined.

Rule Parameters

This rule has the following parameters.

Lookup Name	Click Add to create a lookup in the Lookup Configuration Wizard, or select a lookup from the drop-down list and click to edit or click to delete a lookup. Tip... In the Lookup Configuration Wizard, connect to the lookup as with a source connector. For example, select the ASCII (Delimited) connector and set the Header property to True. When prompted to select the Lookup Key Field, select the lookup column for comparing against the source or derived field.
Lookup Key Field	The lookup field selected in the Lookup Configuration Wizard. This field contains the lookup values to compare against the Matching Key Field values. When the Lookup Key Field and the Matching Key Field values match, the Return Lookup Field (from the lookup) value is added to the derived field. If no match is found, a null or blank value is returned.
Matching Key Field	Select the field in the source (or derived field) to match against the Lookup Key Field values. When the Lookup Key Field and the Matching Key Field values match, the Return Lookup Field value is added to the derived field.
Return Lookup Field	Select the field in the lookup that contains the values to retrieve from the lookup dataset. These values are stored in-memory within the derived field.
Derived Field Name	A default derived field name (d_<FieldName>LookupValue or d_<FieldName>LookupValue_n, where “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. See Derived Fields.

Adding a lookup source

1. In the Rules tab, select any field, then click

>Add New Rule.
The Add Rules dialog opens.

2. In the Data Prep tab, select LookupValue, and Finish.
The Rule Definition pane opens. The Rule Name field displays the rule name which can be edited.

3. In the Rule Definition pane, click

(to the right of the Lookup Name drop-down list) to open the Configuration Wizard, then do the following:

• Enter a name for the lookup and a description (optionally). This name will appear in the Lookup drop-down list.

• Configure the lookup data connection parts and properties click Connect and Next.

• Verify that you are connected to the correct lookup data, note the name of the column to be used for the Lookup Key Field, then click Next.

• In the Field Name column, select the Lookup Key Field (which will be used by the LookupValue rule for matching against the source/derived Matching Key Field), then click Finish.

The lookup is now available in the Lookup Name drop-down list and the Source tab, Source drop-down list.

To configure the LookupValue rule to use the lookup, continue to Creating a LookupValue rule.

Creating a LookupValue rule

1. In the Rules tab, Field/Rule pane, select LookupValue under -MultiFieldRules-.

The Rule Definition pane opens.

2. In the Rule Definition pane define the rule parameters:

• Lookup Name: Select the required lookup.

• Lookup Key Field: The lookup column previously specified for the lookup is displayed.

Tip... To change this value, click the edit button beside the lookup name.

• Matching Key Field: Select the source or derived field to use for matching against the Lookup Key Field.

• Return Lookup Field: Select the lookup field that contains the values to retrieve and store in the new derived field.

• Save all files.

The retrieved lookup values can now be used in other rules or to map to an output field using the new derived field. To create a new output field for the retrieved lookup values, continue to Writing retrieved lookup values to the target.

Writing retrieved lookup values to the target

1. In the Rules tab, Field/Rule pane, create a new output field for the derived values:

• Click

>Add New Field, and enter a descriptive name for the new field.

• In the Expression column for the new field, select the required derived field from the drop-down list.

The new field is now mapped to the derived field containing the retrieved lookup values.

2. Save the profile, then click

to execute the profile.
The Total pass/fail data pie chart is displayed.

3. Click the pie chart to open the Drill Down data tab.
Scroll to the right to see the new output field populated with the data.

Tip... To omit a field from the output, select the field in the Field/Rule pane and click

to delete it. The field is not deleted in the source data.

Editing a lookup source

1. In the Rules tab, click

>Add New Rule.
The Add Rules dialog opens.

2. In the Data Prep tab, select LookupValue, and Finish.

3. Select the lookup you want to edit from the Lookup Name drop-down list.

4. Click

to edit (on the right of the Lookup Name drop-down list) in the Configuration Wizard.

5. Edit fields and properties as needed, click Connect and Next.

6. Verify that you are connected to the correct lookup data, note the column to be used for the Lookup Key Field (the field that contains the replacement value), then click Next.

7. In the Field Name column, select the field to use as the Lookup Key Field and click Finish.

Note: When users change the lookup name, connection, key field, replacement field, or derived field name, all related elements and metrics using the lookup file reflect the changes (for example, the rule name).

Deleting a lookup source

1. In the Rules tab, click

>Add New Rule.
The Add Rules dialog opens.

2. In the Data Prep tab, select LookupValue, and Finish.

3. Select the lookup you want to delete from the Lookup Name drop-down list.

4. Click

to delete (on the right of the Lookup Name drop-down list).
The lookup is removed from the Lookup Name drop-down list and the Source tab Source drop-down list.

Example

Let’s say that your source contains a State column containing full state names (such as California and Texas), and you need the values to be state codes (such as CA and TX). You also have a dataset, StateCodes, that contains a StateName column containing full state names, and an Abbreviation column containing the corresponding state code.

For example, the source State column (below, on the left) and the lookup dataset (below, on the right):

Source (or Derived Field)	Lookup: StateCodes
"State" "Texas" "California" "Ohio" "Ohio" "California"	"State_Name","Abbreviation" "California","CA" "Texas","TX" "Ohio","OH"
Note: The elements displayed in bold are used as the parameter settings in this example.

To use the dataset as the lookup to retrieve state codes you would add StateCodes as a lookup (see Adding a lookup source), and set the LookupValue rule parameters as follows:

• Lookup Name - StateCodes (The name of the lookup file.)

• Lookup Key Field - State_Name (The lookup field to match against the source Matching Key Field.)

• Matching Key Field - State (The source field to match against the Lookup Key Field value. When the Lookup Key Field and the Matching Key Field values match, the Return Lookup Field value is added to the derived field.)

• Return Lookup Field - Abbreviation (The lookup field that contains the values to retrieve from the lookup dataset. These values are stored in-memory within the new derived field.)

• Derived Field Name - d_State_LookupValue (The new, derived field containing the retrieved state code values which can be included in output by mapping an output field Expression property to the derived field.)

For step-by-step instructions, see Adding a lookup source, Creating a LookupValue rule and Writing retrieved lookup values to the target.

Supported Data Types

All data types. However, the Lookup Key Field and Matching Key Field values must be the same data type.

ParseExtract

This function rule enables users to parse and extract parts from a structured or composite field value (such as an URI, Email, Full Name, Address, and Timestamp). Users apply the parsing function that is relevant to the data type in the field. For example, the Parse Name, Parse Email or Parse Time function. The parsing functions generate a derived field for each part. For example, the Parse Email function extracts two parts: username@domain. The username part is to the left of the @ symbol and the domain part is to the right of the @ symbol. Two derived fields are generated - one for usernames and another for domains.

The derived fields are available in-memory and can be used for further profiling and analysis. Users can also include the derived field values with profile output fields. For more information, see Derived Fields.

For a step-by-step example, see Example

Some parsing functions requires additional parameters such as FORMAT or Regex in order to be specified.

Rule Properties

This rule has the following properties.

Rule Name	A default rule name (<FieldName>_ParseExtract) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name	The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Email (String).
Rule Type	The type of rule that is applied to the field. That is ParseExtract (Function rule). Tip... A Red Cross in a rule icon (for example ) indicates that the rule has a parameter that has not been defined.

Rule Parameters

This rule has the following parameters.

Function	Select the parsing function that is relevant to the type of data in the field from the Function drop-down list. See Functions for details.
Format	Select the format desired. Each function has a different set of PartIDs available. See Functions for details.
	Click > and select one of the following options to add a new part: • Add Part - Select to manually add a single part from the field. A single row is added to the table. You can then select the specific part from the Part drop-down list. • Add Available Parts - Select to add all the available parts from the field. A single row is added to the table for each part. The Part column is auto-populated. Tip... To delete a part from the table, select the part and to delete it.
Part	The unique identifier for the part. This field is populated by the Function selected. For some functions, a dropdown list is provided with some or all of the available parts which you can select from. For example, the Parse Full Name function provides all of the available parts, including [first] and [last] which are parts for the first name and the last name. When a dropdown list of parts is provided, select a part from the list. When a dropdown list of parts for the function is not provided, enter a value for an element that is shown for the Format field selection. For example, the Parse Time Date function provides format options which include MM, dd and yyyy. Thus, you can enter MM for month, dd for day and yyyy for year as parts. Likewise, the Parse Timestamp and Parse Time functions provide format options which include HH, mm and ss. Thus, you can enter HH for hour, mm for minute for day and ss for seconds as parts. When the Extract RegX Groups function is used, this value refers to the group numbers, where 0 = entire match, 1 = first group, 2 = 2nd group, and so on. This value can also be a named group. Named groups can also be used if defined in the pattern (e.g., (?<name>...) ? name). For more information, see Functions.
Derived Field Name	A default derived field name (d_<FieldName>ParseExtract or d_<FieldName>ParseExtract_n, where “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. See Derived Fields.
Data Type	The data type of the part.

Functions

The following functions are available.

Function	Description
Parse Full Name	This function is only available for string fields, and requires that you select a format to parse string into. Parses a full name into the following PartIDs: • title - The title part of a full name. For example, Mr and Mrs. • first - The first name part of a full name. • middle - The middle name part of a full name. • last - The last name part of a full name. • suffix - The suffix part of a full name. For example, PhD and Sr.
Parse Address	Parses a postal address into the following PartIDs: • street - The street address. • city - The city name. • postcode - The zip code parsed from the address. • state - The state name. • po_box - The post office box number. • country - The country name.
Parse Email	Parses an email into the following PartIDs: • username - The username part of the email (the part before the @ symbol). • domain - The email domain (the part after the @ symbol).
Parse Timestamp	This function is only available for string fields, and requires that you select a format to parse string into. Parses the parts of a timestamp based on the symbols defined for SimpleDateFormat (Java Platform SE 8). See https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html. One or more symbols can be referenced in a single part. For example, PartID = dd-MM returns 12-09 for a timestamp value 12-09-2002T30:30:34
Parse Date	This function is only available for string fields, and requires that you select a format to parse string into. Parses the parts of a date based on the symbols defined for SimpleDateFormat (Java Platform SE 8). See https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html. One or more symbols can be referenced in a single part. For example, PartID = dd-MM returns 12-09 for a timestamp value 12-09-2002T30:30:34
Parse Time	This function is only available for time fields, and requires that you select a format to parse string into. Parses time parts based on the symbols specified for SimpleDateFormat (Java Platform SE 8). See https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html. One or more symbols can be referenced in a single part. For example, PartID = HH-mm returns 30-30 for a timestamp value 12-09-2002T30:30:34.
Parse URI	Parses an URI into the following PartIDs: • scheme - The scheme part of the URI. For example, http, https, ftp, s3. • username - The username part of the URI, if one is referenced. • host - The hostname part of the URI. • port - The port specified in the URI. • path - The path segment of the URI. • query - The query part of the URI which is typically followed by the ? character.
Extract RegX Groups	Extracts the groups captured from the specified regular expression when it matches the field value. This function requires that you enter a regular expression. The Part identifier in this case refers to the group numbers, where 0 = entire match, 1 = first group, 2 = 2nd group, and so on. The Part identifier can also be a named group. Simple example: (.+?)\s(.+?) - In this case, there are two groups as can be seen in matching parenthesis groups. Suppose the input is ‘Actian Corporation’, then PartId = 1 returns Actian, PartId = 2 will yield Corporation. Named Group example: (<cname>.+?)\s.+ - In this case, there is one named group, cname. PartId = cname returns Actian for the input Actian Corporation.

Supported Data Types

String (input)

String (output)

Remarks

Does not provide pass and fail statistics and generates a derived field. See Derived Fields.

Example

To illustrate how to use the ParseExtract rule, we apply the rule to a Name column which contains a first name, middle name and last name in a single field. We will extract the first and last names into two new output fields.

1. In the Rules tab, Fields section, click

> Add New Rule, then select the Name field from the Select Field Name drop-down list.

2. In the Data Prep tab, select ParseExtract, and Finish.
The rule is displayed under the Name field.

3. In the Rule Definition pane, the default Rule Name, Name_ParseExtract, is displayed (which can be edited). Do the following:

• Select Parse Full Name from the Function drop-down.

• Select [last] [first] from the Format drop-down.

• Click

> Add Available Parts.

• Select unwanted parts (retain first and last) and click

to delete.

4. In the Field/Rule pane, create two new output fields by doing the following:

• Click

> Add New Field, change the default name, field-1 (which appears at the bottom of the Field/Rule list), to match the first part name. In this case, we enter first.

• Click

> Add New Field, change the default name, field-2 (which appears at the bottom of the Field/Rule list), to match the first part name. In this case, we enter last.

5. Map the two new fields to the relevant derived field content by doing the following:

• first field - Click in the Expression field and select d_Name_ParseExtract_1 from the drop-down list.

• last field - Click in the Expression field and select d_Name_ParseExtract_3 from the drop-down list.

6. Save the profile, then click

to execute the profile.
The Total pass/fail data pie chart is displayed.

7. Click the pie chart to open the Drill Down data tab.
Scroll to the right to see the three new fields populated with the data.

Tip... If you no longer need to see the Name field in the output you can remove it: Select it under -Output Fields- and click

to delete it. The Name field is not deleted in the source data.

PhoneticStringEncode

This function rule calculates the most frequently occurring values of a field by phonetically encoding the string field using a specified algorithm.

Rule Properties

This rule has the following properties.

Rule Name	A default rule name (<FieldName>_PhoneticStringEncode) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name	The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, ZIP (String).
Rule Type	The type of rule that is applied to the field. That is PhoneticStringEncode (Function rule).

Rule Parameters

Algorithm

The encoding algorithm:

• Soundex - This is a phonetic algorithm for indexing names by sound, as pronounced in English.

• RefinedSoundex - RefinedSoundex phonetically encodes a string using Apache's refined soundex algorithm which is optimized for spell checking words.

• Caverphone - Coverphone is a phonetic matching algorithm used to identify English names with their sounds.

• Metaphone - Metaphone is a phonetic algorithm, for indexing words by their English pronunciation. It improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation.

• DoubleMetaphone - Double Metaphone algorithm in contrary to the Metaphone algorithm whose application is limited to English only, takes into account spelling peculiarities of a number of other languages.

Derived Field Name

A default derived field name (d_FieldName_PhoneticStringEncode, d_FieldName_PhoneticStringEncode_n, where “FieldName” is the field the rule applies to, “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. See Derived Fields.

Supported Data Types

String (input)

String (output)

Remarks

Provides pass and fail statistics and generates a derived field. See Derived Fields.

StringJoin

This function rule combines multiple fields into a single value, based on a specified delimiter. A derived field is auto-generated for the new value. The derived field can be used for further processing, to build additional rules (such as referencing the derived field in a rule), and to create a new output field.

This rule supports all data types. Default text formatting is used to convert and output to a string format. For example, date, time and timestamp fields are formatted using the ISO 8601 format.

Once the StringJoin rule is added to a profile, it is categorized under -MultiFieldRules- on the Rules tab, instead of under -OutputFields-.

A derived field is stored in-memory (RAM). For more information, see Derived Fields.

Rule Properties

This rule has the following properties.

Rule Name	The default rule name (StringJoin or StringJoin_n, where “n” is 1,2,3, and so on) is provided and displayed here. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name	-MultipleFields- is displayed here.
Rule Type	The type of rule that is applied to the field. That is StringJoin (Function rule). Tip... A Red Cross in a rule icon (for example ) indicates that the rule has a parameter that has not been defined.

Rule Parameters

This rule has the following properties.

Delimiter	(optional) Select or enter the delimiter to use between each string in the joined result. For example, if the delimiter is set to Comma (,) each string in the resulting value will be separated by a comma: <String1>,<String2>,<String3>. By default, this parameter is set to Comma (,). If a delimiter is not required, the delimiter can be set to an empty string. Options are: • COMMA (default) • SEMICOLON • TAB • SPACE • PIPE • COLON • CR_LF • LF • CR
Skip Null/Empty Values	(required) Specifies whether null or empty values are included in the joined result. • Select to exclude (True) null or empty strings from the joined result. • Deselect to include (False) null or empty strings in the joined result. By default, this parameter is set to include (False).
Null Substitute	(optional) When the Skip Null/Empty Values parameter is set to include (False), this parameter specifies the string for representing null values in the final joined result. For example, if the null substitute is set to N/A, all null values are replaced with N/A in the joined result string: <String1>,<N/A>,<String3>. This parameter is only available when the Skip Null/Empty Values parameter is set to include (False).
Derived Field Name	A default derived field name (d_<FieldName>StringJoin or d_<FieldName>StringJoin_n, where “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. The name can be edited. See Derived Fields.
Available	Lists fields that are available for joining. Select one or more fields that you want to include in the join, then click > to move them to the Selected list. Click >> to add all the fields.
Selected	Lists fields that will be joined. Select one or more fields and click < to move them back into the Available list. Click << to remove all the fields. Click the up and down icons () to reorder the fields in the output.

Supported Data Types

All data types (input)

String (output)

Remarks

Does not provide pass and fail statistics and generates a derived field. See Derived Fields.

Example

To illustrate how to use the StringJoin rule, we join two columns: an Account column and a Balance column, using the pipe delimiter: <Account>|<Balance>.

After the StringJoin rule generates a derived field for the new, joined values, we create a new output field called Account&Balance, map it to the derived field, and view the new, joined values in the output. The Account&Balance output field will have, for example, the following output: 01-6063|262.98.

1. In the Rules tab, Fields section, click

> Add New Rule, then select the Account Number field from the Select Field Name drop-down list.

2. In the Data Prep tab, select StringJoin, and Finish.

3. In the Rule Definition pane, select PIPE from the Delimiter drop-down, then select null from the Null Substitute drop-down.

4. In the Available list, select Balance and click > to add it to the Selected list. Use the up/down arrows to reorder the fields.

5. In the Field/Rule pane, create a new output field by doing the following:

• Click

> Add New Field, change the default name, field-1 (which appears at the bottom of the Field/Rule list), to a descriptive name for the derived content. In this case, we name it Account&Balance.

• Map the new field to the derived field content by clicking the Account&Balance Expression field and selecting d_Account_Number_StringJoin from the drop-down list.

6. Save the profile, then click

to execute the profile.
The Total pass/fail data pie chart is displayed.

7. Click the pie chart to open the Drill Down data tab.
Scroll to the right to see the three new fields populated with the data.
To edit the rule, return to the Rules tab and select the rule under -MultiFieldRules-.

Tip... If you no longer need to see the Account Number field in the output you can remove it: Select it under -Output Fields- and click

to delete it. The Account Number field is not deleted in the source data.

StringSplit

This function rule splits a string value in a field into separate substrings, or parts, based on a specified delimiter. A derived field is auto-generated for each part. The derived fields can be used for further processing, to build additional rules (such as referencing the derived field in a rule), and to create new output fields.

For example, if your source data has a Name column that contains both first and last names, and you need a column strictly for first names and a column strictly for last names. You can use the StringSplit rule to create the two columns. The columns can then be included in the profile output.

For a step-by-step example, see Example

A derived field is stored in-memory (RAM). For more information, see Derived Fields.

Rule Properties

This rule has the following properties.

Rule Name	The default rule name (<FieldName>_StringSplit) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name	The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, Name (String).
Rule Type	The type of rule that is applied to the field. That is StringSplit (Function rule).

Rule Parameters

This rule has the following properties.

Limit	(optional) Specifies the maximum number of parts to include in the result of the string split. If the limit is greater than zero (0), the number of parts will not exceed the limit specified, and the last part will contain the last portion of the original string. If the limit is zero (0) or less, all substrings produced by the split will be included. The default value is 20. Note: You will still be able to add additional parts that exceed the Limit specified.
Delimiter	(optional) Select or enter the delimiter to be used for splitting the substrings in the field. The delimiter is treated as a literal string unless the Regex checkbox is selected (which makes it suitable for straightforward splitting scenarios). The default value is SPACE. Options are: • COMMA • SEMICOLON • TAB • SPACE • PIPE • COLON • CR_LF • LF • CR
Regex	(optional) Specifies whether the delimiter used is treated as a regular expression. By default, this option is deselected. • Select to treat the delimiter as a regular expression. • Deselect to treat the delimiter as a literal string such as a comma (,).
	(required) Add the parts that you want to derive into memory as a field. You can add the parts one at a time, or add the number specified in the Limit parameter. Click > select one of the following options: • Add Part - Adds a single row for referencing a specific split substring (for example, 0th, 10th, 20th, and so on). • Add Available Parts - Adds the number of parts specified by the Limit, and automatically fills in Part ids with 0 as the index of the first split substring. Note: After using Add Available Parts, you can still add additional parts.
Part	Index of the split substring within the result, starting with 0 as the first element.
Derived Field Name	A default derived field name (d_<FieldName>StringSplit or d_<FieldName>StringSplit_n, where “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. See Derived Fields.
Data Type	The data type in the derived field.

Supported Data Types

String (input)

This rule outputs a list of substrings which are derived as fields into memory.

Example

To illustrate how to use the StringSplit rule we use a case where the source data has a Name column that contains a value comprised of two substrings - a first name and a last name:

We want our output to have a column strictly for first names and a column strictly for last names.

Since the two substrings are separated by a space, we use the SPACE delimiter to split the value into two parts. When the parts are created two derived fields are auto-generated. We will link those derived fields to two new output fields we also create: FirstName and LastName.

1. In the Rules tab, click

> Add New Rule, then select the Name field from the Select Field Name drop-down list.
Rules that are applicable to the data in the selected field are listed.

2. In the Data Prep tab, select StringSplit, and Finish.
In the Rule Definition pane, the default Rule Name, Name_StringSplit, is displayed (which can be edited), and a single derived field placeholder is listed in the Parts table.

3. Set Limit to 2 and select SPACE from the Delimiter drop-down.

4. Click

> Add Part to add a second derived field name, and enter 1 in the Part column.

5. In the Field/Rule pane, create two new output fields by doing the following:

• Click

> Add New Field, change the default name, field-1 (which appears at the bottom of the Field/Rule list), to a descriptive name for the derived content. In this case, we name it First Name.

• Click

> Add New Field, change the default name, field-2 (which appears at the bottom of the Field/Rule list), to a descriptive name for the derived content. In this case, we name it Last Name.

6. Map the two new fields to the corresponding derived field by doing the following:

• First Name field - Click in the Expression field and select d_Name_StringSplit_0 from the drop-down list.

• Last Name field - Click in the Expression field and select d_Name_StringSplit_1 from the drop-down list.

7. Save the profile, then click

to execute the profile.
The Total pass/fail data pie chart is displayed.

8. Click the pie chart to open the Drill Down data tab.
Scroll to the right to see the new fields populated with the data.

StringToConversion

This conversion rule converts a given string field into a specified data type. It creates a new in-memory field (see Derived Fields).

Rule Properties

This rule has the following properties.

Rule Name	A default rule name (<FieldName>_StringToConversion) is provided and displayed here. However, you can edit or overwrite it. Click Reset to restore the default rule name. Note: The underscore (_) character is the only special character allowed in the name. Rule names cannot begin with a digit. If a field or column in the source data starts with a digit, 'r_' will be prepended to any rules created based on that field.
Field Name	The field name to which the rule applies is displayed here, along with the data type in parentheses. For example, ZIP (String).
Rule Type	The type of rule that is applied to the field. That is StringToConversion (Conversion rule).

Rule Parameters

This rule has the following parameters.

Convert To	The data type of the derived in-memory field: • Boolean • Date • Double • Float • Int • Long • Numeric • Time • TimeStamp
Format	The following formats are required when converting to Boolean derived field: • True.False • Yes.No • 1.0 • T.F
	The following formats are required when converting to Date derived field. You can select from the available options or specify your own pattern in the text box. See DateTime Format. • yyyy-M-dd • yyyy-MM-dd • MM/dd/yy • M/dd/yyyy • MM/dd/yyyy • M-dd-yyyy • MM-dd-yyyy • MMM-yy • MMM dd, yyyy
	The following formats are required when converting to Time derived field. You can select from the available options or specify your own pattern in the text box. See DateTime Format. • HH:mm:ss • HH:mm • hh:mm:ss • hh:mm aa
	The following formats are required when converting to TimeStamp derived field. You can select from the available options or specify your own pattern in the text box. See DateTime Format. • yyyy-MM-dd HH:mm:ss • MM/dd/yyyy HH:mm • MMM dd, yyyy HH:mm aa • EEE, dd MMM yyyy HH:mm:ss Z Note: Note: ISO 8601 format should be mentioned as yyyy-MM-dd'T'HH:mm:ss+hh:mm. T in this format must be enclosed within single quotes, 'T'. • Time Zone - Select your time zone from the available options.
	When converting to Numeric, Double, Float, Long, and Int derived field, there are no parameters to specify.
Derived Field Name	A default derived field name (d_FieldName_StringToConversion, d_FieldName_StringToConversion_n, where “FieldName” is the field the rule applies to, “n” is 1,2,3, and so on) is provided and displayed here. However, you can edit or overwrite it. See Derived Fields.

Supported Data Types

String (input)

Date, Time, TimeStamp, Boolean, Numeric, Double, Float, Long, Int (output)

Remarks

Provides pass and fail statistics and generates a derived field. See Derived Fields.

Last modified date: 01/08/2026