Manipulating Records
DataFlow Operators for Manipulating Records
The DataFlow operator library includes several operators for manipulating the fields contained within record ports. These operators are covered in this section.
Covered Field Operations
Using the DeriveFields Operator to Compute New Fields
Using the DiscoverEnums Operator to Discover Enumerated Data Types
Using the MergeFields Operator to Merge Fields
Using the RemoveFields Operator to Remove Fields
Using the RetainFields Operator to Retain Fields
Using the SelectFields Operator to Select Fields
Using the RemapFields Operator to Rename Fields
Using the SplitField Operator to Split Fields
Using the RowsToColumns Operator to Convert Rows to Columns (Pivot)
Using the ColumnsToRows Operator to Convert Columns to Rows (Unpivot)
Using the DeriveFields Operator to Compute New Fields
The DeriveFields operator is used to derive one or more new fields from existing fields in the input data. New output fields are derived by applying predefined functions to fields within the input record data. One new output field is generated per function applied to the input data. The result is an output record flow that contains the input data plus the function results.
It is possible to overwrite existing fields with new derived values. It is also possible to omit input fields in the result, effectively applying a complete transformation to the record.
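For example, the following minimal sketch overwrites an existing field and drops all underived input fields; the setDropUnderivedFields setter name is assumed here from the dropUnderivedFields property documented under Properties below, and the input is assumed to contain a numeric "price" field.
// Overwrite the existing "price" field with a derived value
DeriveFields transform = graph.add(new DeriveFields());
transform.setDerivedFields("price * 1.08 as price");
// Keep only the derived fields in the output (setter name assumed from the dropUnderivedFields property)
transform.setDropUnderivedFields(true);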
Applying multiple functions to an input record flow within a single DataFlow process is usually more efficient than applying each function in its own process. This is due to several factors, chiefly reduced processor cache thrashing, fewer data copies, and less thread context switching.
Multiple derivations can be applied to the input data flow. Each derivation specifies two things: the name of the output field and the function to apply. The names of the input fields to the function are provided as part of the function specification.
Alternatively, a function can be passed as the input to another function. This type of function chaining lets you specify complex operations to perform on the input data. The output of each function in the chain is passed as input to the next function.
When applying multiple derivations in a single operator, the resulting fields from preceding functions can be used in subsequent functions. For example, if your input type contains the fields a, b, and c, you could apply the following derivations in a single operator:
a + b as d,
c * d as e
Even though d is not in the input type, it can be referenced in the second function since it is defined in a preceding function. However, the following would not be valid:
c * d as e,
a + b as d
since d has not yet been defined when the first function is evaluated.
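As a minimal sketch (assuming numeric input fields a, b, and c), both derivations can be applied in a single operator using the expression form described later in this section:
// d is derived first, so the second derivation may reference it
DeriveFields derive = graph.add(new DeriveFields());
derive.setDerivedFields("a + b as d, c * d as e");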
A list of all available functions is in Available Functions.
Using Expressions to Derive Fields
Field derivations can also be specified by using the expression language. Individual FieldDerivation instances can be created from a scalar-valued function expression and then supplied to a DeriveFields operator as usual.
Additionally, any number of field derivations can be specified by passing a single function derivation expression directly to the DeriveFields operator. The syntax of a field derivation expression is:
<expression 1> as <field name 1>[, <expression 2> as <field name 2>, ...]
For example, you could pass the following expression directly to DeriveFields, assuming your input had a birthdate field:
dateTimeValue(birthdate, "MONTH") as month, dateTimeValue(birthdate, "DAY") as day, dateTimeValue(birthdate, "YEAR") as year
As with field names used elsewhere within expressions, the output field name can be surrounded by back-ticks if it contains non-alphanumeric characters, such as in the expression:
avg(Exam1, Exam2, Exam3) as `Exam Average`
For more information about syntax and available functions, see Expression Language.
Code Example
The following example derives a new field named "discounted_price" from the input fields "l_extendedprice" and "l_discount". An expression is used to express how to calculate the new output field. The expression is set as the "derivedFields" property of the operator.
Using the DeriveFields operator in Java
// Calculate the discounted price
DeriveFields deriveFields = graph.add(new DeriveFields());
deriveFields.setDerivedFields("l_extendedprice * (1 - l_discount) as discounted_price");

or

deriveFields.setDerivedFields(
  new SimpleFieldDerivation(
    "fieldy", new FieldReference("fieldx")),
  new SimpleFieldDerivation("discounted_price",
    mult("l_extendedprice", sub(1, "l_discount"))) );
Using the DeriveFields operator in RushScript
// Calculate the discounted price
var results = dr.deriveFields(data, {derivedFields:'l_extendedprice * (1 - l_discount) as discounted_price'});

or

dr.deriveFields(data, {derivedFields: 'fieldx as fieldy, ' +
  'l_extendedprice * (1 - l_discount) as discounted_price'});
The following code fragment demonstrates chaining functions together. In the example, a field derivation is created that converts an input value into a double and then multiplies the result by the value of Pi. The resultant value is stored in the defined output field.
Using the API directly to chain functions
// Create a field derivation converting an input field into a double and multiplying it by pi
FieldDerivation fd = FieldDerivation.derive("outputField", Arithmetic.mult(Conversions.asDouble("inputField"), 3.14159));

DeriveFields deriveFields = graph.add(new DeriveFields(fd));
See the JavaDoc reference documentation for each function to determine the types of expressions and values that can be passed into the function.
Properties
The DeriveFields operator has the following properties.
Name
Type
Description
derivedFields
FieldDerivation... or String
This property can be set in one of two ways: as a String containing a derivation expression or as a list of FieldDerivation instances. Each derivation consists of two parts: a name to use in the output record for the derived field, and a ScalarValuedFunction to apply to any matching input fields to produce the output field.
If the name used is the same as a field that is already present in the input data, the original field will be overwritten by the new field. If multiple derivations apply to an output field, only the last one defined is used.
dropUnderivedFields
boolean
Set to true to drop the input fields from the output. Use when you want to output only the derived fields.
Ports
The DeriveFields operator provides a single input port.
Name
Type
Get Method
Description
input
getInput()
The data to derive the fields from.
The DeriveFields operator provides a single output port.
Name
Type
Get Method
Description
output
getOutput()
The derived fields plus any included input data.
Using the DiscoverEnums Operator to Discover Enumerated Data Types
Enumerated data types contain categorical data. This data type contains a limited, pre-defined set of values. For example, a field containing gender types can be defined as enumerated. The field contains two possible values: male and female.
The DiscoverEnums operator performs discovery of allowed values for a specified set of string fields, converting the specified fields into enumerated values. This operator works in two passes:
The first pass performs a distributed discovery of allowed values.
The second pass performs the conversions into the enumerated type.
Tip:  Discovering enumerated types requires an additional pass over the input data. If the possible values of the enumerated type are already known, it is more efficient to define the enumerated type when the data is initially sourced.
Code Example
The following example converts a field named "class" into an enumerated type. This is useful when all of the possible values of the enumerated type are not known beforehand. Some operators require that data fields be enumerated types.
Using the DiscoverEnums operator in Java
// Discover the "class" enumerated type
DiscoverEnums discoverEnums = graph.add(new DiscoverEnums());
discoverEnums.setIncludedFields(Arrays.asList("class"));
Using the DiscoverEnums operator in RushScript
var results = dr.discoverEnums(data, {includedFields:'class'});
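If the input may contain many distinct values, the maxDiscoveredValues property (described under Properties below) caps memory usage during discovery. A rough sketch, assuming the setter follows the standard naming for that property:
// Discover values for "class", but leave the field as a string if more than 100 distinct values are found
DiscoverEnums discover = graph.add(new DiscoverEnums());
discover.setIncludedFields(Arrays.asList("class"));
discover.setMaxDiscoveredValues(100);   // setter name assumed from the maxDiscoveredValues property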
Properties
The DiscoverEnums operator supports the following properties.
Name
Type
Description
includedFields
List<String>
The list of fields to include in the discovery. If this list is empty (the default), all fields are included.
maxDiscoveredValues
int
The maximum number of values to discover. This places a cap on the memory usage in the event that the input data contains a large number of distinct values. Fields that exceed the number of discovered values will remain as strings. Default: 1000
Ports
The DiscoverEnums operator provides a single input port.
Name
Type
Get Method
Description
input
getInput()
The input data.
The DiscoverEnums operator provides a single output port.
Name
Type
Get Method
Description
output
getOutput()
The output data with the specified fields converted to enumerated types.
Using the MergeFields Operator to Merge Fields
The MergeFields operator merges two streams of data with an equivalent number of rows into one. The output type contains all the fields from the left input followed by those from the right input. If any names exist in both the left and right input, the collision is resolved by renaming the fields from the right.
This operator is nonparallel because the equivalence of input sizes cannot be guaranteed in a parallel context; using this operator will force a gather of the input to a single machine. Even then, you must take care to ensure the ordering of input records is consistent with the desired result. Generally, this operator only makes sense when used inside a DeferredCompositeOperator where these guarantees can be made.
Code Example
Using the MergeFields operator in Java
// Create the merge fields operator
MergeFields merge = graph.add(new MergeFields());

// Connect the source ports to the inputs of merge fields
graph.connect(leftSource, merge.getLeftInput());
graph.connect(rightSource, merge.getRightInput());

// Connect the output of merge fields to an operator downstream
graph.connect(merge.getOutput(), target);
Using the MergeFields operator in RushScript
// Merge two data sources
var mergedData = dr.mergeFields(leftData, rightData);
Properties
The MergeFields operator has no properties.
Ports
The MergeFields operator provides the following input ports.
Name
Type
Get Method
Description
leftInput
getLeftInput()
The left side data to be merged.
rightInput
getRightInput()
The right side data to be merged.
The MergeFields operator provides a single output port.
Name
Type
Get Method
Description
output
getOutput()
The merged data.
Using the RemoveFields Operator to Remove Fields
The RemoveFields operator removes a subset of fields from the input records. Only those fields not named in the filter are copied to the output; if a field specified in the filter does not exist in the input, it is ignored. The relative order of fields within the input data is unchanged.
Tip:  Removing extraneous fields can provide a performance boost. Always try to keep flows to the minimum set of fields possible. This promotes maximum efficiency of the overall application.
Code Example
The following example uses the RemoveFields operator to remove fields "field1" and "field4" from an input flow. Downstream consumers of the RemoveFields operator access the trimmed flow without the extraneous fields.
Using the RemoveFields operator in Java
// Create the remove fields operator
RemoveFields removeFields = graph.add(new RemoveFields());
removeFields.setFieldNames(Arrays.asList("field1", "field4"));
Using the RemoveFields operator in RushScript
var results = dr.removeFields(data, {fieldNames:['field1','field4']});
Properties
The RemoveFields operator provides the following properties.
Name
Type
Description
fieldNames
List<String>
The names of the fields to remove from the input data.
Ports
The RemoveFields operator provides a single input port.
Name
Type
Get Method
Description
input
getInput()
The input data.
The RemoveFields operator provides a single output port.
Name
Type
Get Method
Description
output
getOutput()
The data with the specified fields removed.
Using the RetainFields Operator to Retain Fields
The RetainFields operator preserves a subset of fields from the input records. Only those fields named in the filter are copied to the output; if a field specified in the filter does not exist in the input, it is ignored. The relative order of fields within the input data is unchanged. To require the existence of certain fields, use the SelectFields operator instead.
Tip:  Removing extraneous fields can provide a performance boost. Always try to keep flows to the minimum set of fields possible. This promotes maximum efficiency of the overall application.
Code Example
The following code example uses the RetainFields operator to retain only "field1" and "field4" from the source data. Downstream operators from RetainFields will only have access to "field1" and "field4" data.
Using the RetainFields operator in Java
// Create the retain fields operator
RetainFields retainer = graph.add(new RetainFields());
retainer.setFieldNames(Arrays.asList("field1", "field4"));

// Connect the source port to the retain fields operator
graph.connect(source, retainer.getInput());

// Connect retain fields output to a downstream operator.
// This data only contains "field1" and "field4".
graph.connect(retainer.getOutput(), target);
Using the RetainFields operator in RushScript
// Retain only fields field1 and field4
var results = dr.retainFields(data, {fieldNames:['field1', 'field4']});
Properties
The RetainFields operator provides one property.
Name
Type
Description
fieldNames
List<String>
The names of the fields to retain from the input data.
Ports
The RetainFields operator provides a single input port.
Name
Type
Get Method
Description
input
getInput()
The input data.
The RetainFields operator provides a single output port.
Name
Type
Get Method
Description
output
getOutput()
The data with the specified fields retained.
Using the SelectFields Operator to Select Fields
The SelectFields operator preserves a subset of fields from the input records. Although similar to the RetainFields operator, there are two significant behavioral differences:
The fields are reordered according to their order in the filter.
All fields specified in the filter must exist in the input data or an error will be issued.
Note:  Removing extraneous fields can provide a performance boost. Always try to keep flows to the minimum set of fields possible. This promotes maximum efficiency of the overall application.
Code Example
The following code example uses the SelectFields operator to select only "field4" and "field1" from the source data. Downstream operators from SelectFields will have access only to "field4" and "field1" data. The order of the fields within the flow is changed to "field4", "field1". This feature is useful when fields are wanted in a specific order (for example, when writing to a file).
Using the SelectFields operator in Java
// Create the select fields operator
SelectFields select = graph.add(new SelectFields());
select.setFieldNames(Arrays.asList("field4", "field1"));

// Connect the source port to the select fields operator
graph.connect(source, select.getInput());

// Connect select fields output to a downstream operator.
// This data contains only "field4" and "field1", in that order.
graph.connect(select.getOutput(), target);
Using the SelectFields operator in RushScript
// Select fields: field4, field1 in that order
var results = dr.selectFields(data, {fieldNames:['field4', 'field1']});
Properties
The SelectFields operator provides one property.
Name
Type
Description
fieldNames
List<String>
The names of the fields to retain from the input data.
Ports
The SelectFields operator provides a single input port.
Name
Type
Get Method
Description
input
getInput()
The input data.
The SelectFields operator provides a single output port.
Name
Type
Get Method
Description
output
getOutput()
The data with the specified fields retained.
Using the RemapFields Operator to Rename Fields
The RemapFields operator rearranges and renames fields in a record. Two options are provided for handling source fields not specifically referenced in the field renaming:
Unmapped fields are removed from the result. This achieves the same effect as combining the renaming operation with the SelectFields operator.
Unmapped fields are kept in the result, with the names of the fields remaining the same as in the source. For example, if the source schema is [A,B,C] and the mapping is {B -> Z}, the resulting schema will be [A,Z,C].
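A rough sketch of the second option follows. The keep-unmapped convenience method shown here (FieldRemappings.rename) is an assumption for illustration; only FieldRemappings.map and FieldRemappings.reorder appear in the documented example below, and the actual method name may differ.
// Rename B to Z while keeping unmapped fields in place: [A,B,C] becomes [A,Z,C]
RemapFields remap = graph.add(new RemapFields());
// FieldRemappings.rename(...) is assumed as the keep-unmapped counterpart of reorder(...)
remap.setFieldRemapping(FieldRemappings.rename(FieldRemappings.map("B", "Z")));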
Code Example
The following code example creates a RemapFields operator and configures it using a convenience method on the FieldRemapping class. The input field named "source1" will be renamed to "target1". All other fields will be left unchanged.
Using the RemapFields operator in Java
// Create the remap fields operator
RemapFields remap = graph.add(new RemapFields());

// Use a FieldRemappings convenience method to specify that the
// given field should be renamed and all others dropped
remap.setFieldRemapping(FieldRemappings.reorder(FieldRemappings.map("source1","target1")));

// Connect the remap operator to a data source
graph.connect(source, remap.getInput());
Using the RemapFields operator in RushScript
// Define and apply the remapping
var remapping = FieldRemappings.reorder(FieldRemappings.map("source1", "target1"));
var dataRemap = dr.remapFields(data, {fieldRemapping:remapping});
Properties
The RemapFields operator provides one property.
Name
Type
Description
fieldRemapping
FieldRemapping
Defines how input fields should be mapped to output fields.
Ports
The RemapFields operator provides a single input port.
Name
Type
Get Method
Description
input
getInput()
The input data.
The RemapFields operator provides a single output port.
Name
Type
Get Method
Description
output
getOutput()
The data with fields renamed and possibly removed.
Using the SplitField Operator to Split Fields
The SplitField operator splits a string field into multiple fields based on a specified pattern.
The SplitField operator has three properties:
splitField—The name of the field to split. The field must be of type String.
splitPattern—A regular expression describing the delimiter used for splitting.
resultMapping—A map of integers to strings, detailed below.
The contents of the split field will be split using the defined split pattern, resulting in an array of substrings. The key of the result mapping corresponds to an index within this array, and the associated value defines the output field in which to place the substring.
For example, if you have a record with a field named "time" containing times in the format 18:30:00, you could use the following SplitField operator to split the time into "hour", "minute", and "second" fields.
HashMap<Integer,String> map = new HashMap<Integer,String>();
map.put(0,"hour");
map.put(1,"minute");
map.put(2,"second");
SplitField splitter = new SplitField("time",":",map);
Code Example
This example uses the SplitField operator to split a String field representing a customer's email address into its local and domain parts.
Using the SplitField operator in Java
// Split an email address field into its components
SplitField split = graph.add(new SplitField());
split.setSplitField("emailAddress");
split.setSplitPattern("@");
HashMap<Integer,String> map = new HashMap<Integer,String>();
map.put(0, "emailUser");
map.put(1, "emailDomain");
split.setResultMapping(map);
Using the SplitField operator in RushScript
// Split an email address field into its components
var results = dr.splitField(data, {splitField:'emailAddress', splitPattern:'@', resultMapping:{0:'emailUser', 1:'emailDomain'}});
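As noted in the resultMapping property description below, not every split index needs to be mapped. A minimal sketch, reusing the time field from the earlier example, that keeps only the hour and minute portions:
// Map only indices 0 and 1; the seconds portion of each split is discarded
SplitField partial = graph.add(new SplitField());
partial.setSplitField("time");
partial.setSplitPattern(":");
HashMap<Integer,String> partialMap = new HashMap<Integer,String>();
partialMap.put(0, "hour");
partialMap.put(1, "minute");
partial.setResultMapping(partialMap);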
Properties
The SplitField operator provides the following properties.
Name
Type
Description
splitField
String
The name of the String field to split. If this field does not exist in the input or is not of type String, an exception will be issued at composition time.
splitPattern
String
The splitting pattern. The pattern should be expressed as a regular expression. The default value matches any whitespace.
resultMapping
Map<Integer,String>
The mapping of split indices to output field names. The key of each entry represents an index in the array resulting from splitting the input string, and the value represents the name of the output field in which to store that substring.
It is not necessary for every array index to be mapped or for every mapped index to exist in each split. If a value does not exist at a mapped index for a particular split, an empty string will be placed in the specified output field.
If an output field already exists in the input, or if a single output field is mapped to multiple indices, an exception will be issued at composition time.
Ports
The SplitField operator provides a single input port.
Name
Type
Get Method
Description
input
getInput()
The input data with the field to split.
The SplitField operator provides a single output port.
Name
Type
Get Method
Description
output
getOutput()
The output data with the original field split.
Using the RowsToColumns Operator to Convert Rows to Columns (Pivot)
The RowsToColumns operator pivots data from a narrow representation (rows) into a wider representation (columns).
The data is first segmented into groups using a defined set of group keys. The ordering of the group keys is important as it defines how the data is partitioned and ordered for the pivot operation. A pivot key field contains the distinct values that will be used as the pivot point. This field must be a string. A column is created in the output for each distinct value of the pivot key. An aggregation is defined which is performed on each data grouping defined by the group keys and the pivot key. The result of the aggregation for each unique value of the pivot key appears in the appropriate output column.
The following table provides an example set of data that we want to pivot by Region. There are only four regions: North, South, East and West. For each item, we want to show the total sales per region. Items can show up multiple times in a region since the data is also organized by store.
ItemID   StoreID   Region   Sales
1        10        North    1000
1        15        North    800
1        20        South    500
1        30        East     700
2        40        West     1200
2        10        North    500
2        15        North    200
To accomplish this pivot, ItemID is used as the group key, Region as the pivot key, and Sales as the pivot value, aggregated by summing the values. The pivot key values are: "North", "South", "East", and "West". The result of the pivot is shown in the following table.
Note that the sales total for the West region for item 1 is empty. Scanning the input data shows that no sales were present in the West region for item 1. Item 1 did have two sales values for the North region. Those values (1000 and 800) are summed and the total (1800) appears in the North region column for item 1. Columns showing '?' indicate a null or missing value.
ItemID   North   South   East   West
1        1800    500     700    ?
2        700     ?       ?      1200
The key concepts to understand in using the RowsToColumns operator are:
The use of a set of columns to segment the data into groups for pivoting. This set of columns is called the group keys. The example used the ItemID as the group key.
A categorical valued column whose distinct values are used as columns in the output. This is the pivot key. The example used the Region as the pivot key.
A column that can be aggregated for each data grouping and within each pivot key. These are the pivot values. The example used the Sales column as the pivot value.
Code Examples
The following code examples demonstrate using the RowsToColumns operator to pivot the data provided in the previous example. Note that the pivot key values must be specified explicitly. Also note that the DataFlow Expression Language can be used to specify the aggregations to calculate on the pivot values.
Using the RowsToColumns operator in Java
RowsToColumns pivot = app.add(new RowsToColumns());
pivot.setGroupKeys("ItemID");
pivot.setPivotKey("Region");
pivot.setPivotKeyValues("North", "South", "East", "West");
pivot.setAggregations("sum(Sales)");
pivot.setPivotColumnPattern("{0}_{1}");
Multiple aggregations can be used on the pivot values. This will result in multiple sets of results for each pivot key value. Also, a limited set of pivot key values can be provided, resulting in fewer output columns. This can be useful if only a subset of the pivot values are needed in the output. Both of these capabilities are demonstrated in this code example.
Using multiple aggregations and limited pivot values in Java
RowsToColumns pivot = app.add(new RowsToColumns());
pivot.setGroupKeys("ItemID");
pivot.setPivotKey("Region");
// Only want the North and South regions sales values pivoted
pivot.setPivotKeyValues("North", "South");
// Generate a sum and average of sales per region
pivot.setAggregations("sum(Sales), avg(Sales)");
Using the RowsToColumns operator in RushScript
var regions = ['North', 'South', 'East', 'West'];
var results = dr.rowsToColumns(data, {groupKeys:'ItemID', pivotKey:'Region', pivotKeyValues:regions, aggregations:'sum(Sales)', pivotColumnPattern:"{0} {1}"});
The following RushScript example demonstrates using limited pivot key values and multiple aggregations.
Using multiple aggregations and limited pivot values in RushScript
var regions = ['North', 'South'];
var results = dr.rowsToColumns(data, {groupKeys:'ItemID', pivotKey:'Region', pivotKeyValues:regions, aggregations:'sum(Sales), avg(Sales)'});
Properties
The RowsToColumns operator provides the following properties.
Name
Type
Description
aggregations
List<Aggregation> or Aggregation[] or String
The aggregations to apply to the grouped data. The aggregated values will be pivoted into each distinct pivot key value.
groupKeys
List<String> or String[]
The ordered list of key fields used to segment the input data into groups for pivoting. The specified aggregations are calculated within the context of each pivot group.
pivotColumnPattern
String
The naming pattern that will be used for new pivot columns. Use the special variable {0} within the string to insert the pivot key and {1} within the string to insert the aggregation expression into the column name. Default: "{0} {1}"
pivotKey
String
The input field to use as the pivot key. This field must be typed as a String. The aggregations specified will be calculated within data groups defined by the group keys and the pivot key. Pivot key field values are matched with the provided pivot values to determine into which output field an aggregated value is set.
pivotKeyValues
List<String> or String[]
The distinct values of the pivot key field. The values must be specified as a property. Only values specified here will be pivoted.
Ports
The RowsToColumns operator provides a single input port.
Name
Type
Get Method
Description
input
getInput()
The input data.
The RowsToColumns operator provides a single output port.
Name
Type
Get Method
Description
output
getOutput()
Contains the result of the pivot operation. The output will contain the group key fields followed by the aggregated pivot values. A set of aggregated pivot values is created for each aggregation specified. Each set will contain an output field per pivot value.
Using the ColumnsToRows Operator to Convert Columns to Rows (Unpivot)
The ColumnsToRows operator transposes values from columns into multiple rows. It is used primarily to unpivot wide records into a narrower form with repeating key values, typically to move data toward a more normalized form.
In the following example, the multiple phone numbers and addresses for each record are unpivoted into a more normalized contact, type, address, and number format.
Example source:
Contact           Home Address      Home Phone   Business Address   Business Phone
John Smith        8743 Elm Dr.      555-1235     null               555-4567
Ricardo Jockamo   1208 Maple Ave.   555-7654     123 Main St.       null
Sally White       null              null         456 Wall St.       null
Unpivoted output:
Contact           type       address           number
John Smith        home       8743 Elm Dr.      555-1235
John Smith        business   null              555-4567
Ricardo Jockamo   home       1208 Maple Ave.   555-7654
Ricardo Jockamo   business   123 Main St.      null
Sally White       home       null              null
Sally White       business   456 Wall St.      null
Users define one or more pivot families to describe an output field and the source fields or constant values that will be mapped to that field in the output. Target field data types are determined by finding the widest common scalar data type of the family’s unpivot elements. An exception can occur if no common type is possible. The number of elements defined in each keyFamily or valueFamily must be the same.
Users may optionally define groupKeyFields that will be the fixed or repeating value portion of the multiple rows. If this property is unset, the group key fields will be determined by the remainder of the source fields not specified as pivot elements.
Note:  If, during the columns-to-rows process, all values mapped to the pivot columns for a given row are null, that row will not be produced in the output. In the example above, if no pivotKeyFamily had been defined, the Sally White "home" record would not have appeared in the output because both of its mapped elements are null. It appears only because a pivotKeyFamily was defined.
Code Example
ColumnsToRows operator in Java
// Create the columns to rows operator
ColumnsToRows c2r = graph.add(new ColumnsToRows());

// Define a pivotKeyFamily of constant values
c2r.definePivotKeyFamily("type", "home", "business");

/*
 * NOTE: the order in which the pivotValueFamilies are defined implies the order of those fields in the output.
 * If a pivotKeyFamily is defined, it will appear after the fixed portion of the output fields but before the pivot family fields.
 */
// Define a pivotValueFamily "address" containing source input fields "Home Address" and "Business Address"
c2r.definePivotValueFamily("address", "Home Address", "Business Address");
// Define another pivotValueFamily "number" containing source input fields "Home Phone" and "Business Phone"
c2r.definePivotValueFamily("number", "Home Phone", "Business Phone");

// Optionally specify the key (fixed) fields
c2r.setGroupKeyFields("Contact");

// Connect the source port to the inputs of ColumnsToRows
graph.connect(sourceFlow, c2r.getInput());

// Connect the output of ColumnsToRows to an operator downstream
graph.connect(c2r.getOutput(), target);
ColumnsToRows operator in RushScript
var flow = dr.columnsToRows(sourceFlow, {groupKeyFields:["Contact"], keyMap:{"type":["home", "business"]}, valueMap:{"address":["Home Address", "Business Address"], "number":["Home Phone", "Business Phone"]}, valueOrder:["address","number"] });
Properties
The ColumnsToRows operator supports the following properties.
Name
Type
Description
groupKeyFields
List<String>
An optional list of fields that comprise the fixed portion of the record. These values are repeated for each non-null value map entry that exists in the source record. If this property is left unset, its value will be inferred: all fields not explicitly listed in the valueMap property will be added to the groupKeyFields.
keyMap
Map<String,List<String>>
An optional "Map" definition that can contain only one entry. The "key", String, portion defines the name of the field that added to the output. The "value", List<String>, portion defines the labels that appear for each repeating segment in the output. These labels should coordinate positionally the the entries defined in the valueMap. If this property is set the number of elements in the "value" List<String>, must match the number of elements in value, List<String> of the valueMap.
valueMap
Map<String,List<String>>
A "Map" definition that may contain multiple entries. The "key", String, portion defines the name of the field that is to be added to the output. The "value", List<String>, portions defines the fields in the row that are to be unpivoted (transferred from columns to rows) in the output. The ordering of the value map entries is not enforced due to limitations by JSON serialization/deserialization. If output order is important, use the valueOrder property. The number of elements listed in the value portion of these properties must be consistent.
valueOrder
List<String>
An optional List that lets users force the ordering of the valueMap entries.
Ports
The ColumnsToRows operator provides a single input port.
Name
Type
Get Method
Description
input
getInput()
The input data.
The ColumnsToRows operator provides a single output port.
Name
Type
Get Method
Description
output
getOutput()
The data with the unpivoted results.