Asserting Conditions
DataFlow Assertion Operators
Several operators in the DataFlow library provide assertions for different data conditions. These are mainly used for testing but can be useful for other purposes. See the following topics for more information on these operators.
Covered Assertion Operations
Using the AssertEqual Operator to Assert Data Equality
Using the AssertEqualTypes Operator to Assert Data Type Equality
Using the AssertPredicate Operator to Assert a Predicate Condition
Using the AssertRowCount Operator to Assert Row Count
Using the AssertSorted Operator to Assert Data Ordering
Using the AssertEqualHash Operator to Assert Hash Equality
Using the AssertEqual Operator to Assert Data Equality
The AssertEqual operator verifies that the actual data input matches the expected data input. An error tolerance can be set to help compare floating point data within a specified error range.
Code Example
The following code fragment demonstrates how to use the AssertEqual operator to verify that the data on the expected and actual input ports match.
Using the AssertEqual operator in Java
// Create a new assert equal operator and add it to a graph
AssertEqual assertEqual = graph.add(new AssertEqual());

// Set the error tolerance and the log frequency on the operator
assertEqual.setErrorTolerance(new RelativeErrorBound(0.01));
assertEqual.setLogFrequency(1000);

// Connect expected input with an upstream operator
graph.connect(expectedSource, assertEqual.getExpectedInput());

// Connect actual input with an upstream operator
graph.connect(actualSource, assertEqual.getActualInput());
Using the AssertEqual operator in RushScript
dr.assertEqual(data1, data2);
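To make the idea of an error tolerance concrete, here is a minimal plain-Java sketch of a relative-error comparison. It does not use the DataFlow library, and the class and method names are hypothetical. The source does not state the formula RelativeErrorBound applies, so the definition used here (absolute difference no larger than the tolerance times the magnitude of the expected value) is an assumption, chosen because it is a common one.

```java
// Illustrative sketch only; not DataFlow's implementation.
final class RelativeToleranceSketch {

    // Assumed definition: |expected - actual| <= tolerance * |expected|.
    // With a tolerance of 0.0 this reduces to an exact match.
    static boolean withinRelativeError(double expected, double actual, double tolerance) {
        if (Double.compare(expected, actual) == 0) {
            return true; // identical values always match
        }
        return Math.abs(expected - actual) <= tolerance * Math.abs(expected);
    }

    public static void main(String[] args) {
        System.out.println(withinRelativeError(100.0, 100.5, 0.01)); // true: off by 0.5%
        System.out.println(withinRelativeError(100.0, 102.0, 0.01)); // false: off by 2%
    }
}
```

A relative bound scales with the magnitude of the expected value, which is usually what you want when comparing floating point results that span several orders of magnitude.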
Properties
The AssertEqual operator provides the following properties.
Name | Type | Description
errorTolerance | | The error tolerance for floating point values. The default is an exact match.
logFrequency | int | Logs the number of rows compared once every logFrequency rows. A value of zero, the default, logs only the total number of comparisons.
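The logging behavior described for logFrequency can be sketched in plain Java as follows. This is only a model of the described semantics, with hypothetical names and message text. It assumes the total is also logged when logFrequency is nonzero, which the source does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and message formats are hypothetical.
final class LogFrequencySketch {

    // Returns the messages that would be logged while comparing rowCount rows.
    static List<String> logMessages(long rowCount, int logFrequency) {
        List<String> messages = new ArrayList<>();
        for (long row = 1; row <= rowCount; row++) {
            // A frequency of zero disables the periodic progress messages.
            if (logFrequency > 0 && row % logFrequency == 0) {
                messages.add("compared " + row + " rows");
            }
        }
        // Assumption: the total is logged in every case.
        messages.add("total rows compared: " + rowCount);
        return messages;
    }

    public static void main(String[] args) {
        logMessages(2500, 1000).forEach(System.out::println);
        logMessages(2500, 0).forEach(System.out::println);
    }
}
```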
Ports
The AssertEqual operator provides the following input ports.
Name | Type | Get Method | Description
expectedInput | | getExpectedInput() | Expected data values.
actualInput | | getActualInput() | Actual data values.
Using the AssertEqualTypes Operator to Assert Data Type Equality
The AssertEqualTypes operator asserts that two input flows have comparable types. As types may not be fully realized until composition time, the check is delayed until the composition phase of the life cycle.
The expected input port is optional. If it is not connected, the type of the actual input port will be compared to the expected type set by the setExpectedType() method.
Code Example
The following code fragment demonstrates how to use the AssertEqualTypes operator to ensure a data port is of an expected type.
Using the AssertEqualTypes operator in Java
// Create an assert equal types operator and add it to a graph
AssertEqualTypes assertTypes = graph.add(new AssertEqualTypes());

// Set the expected type
assertTypes.setExpectedType(TokenTypeConstant.record(TokenTypeConstant.STRING("name")));

// Connect the actual input to an upstream operator
graph.connect(source, assertTypes.getActualInput());
Using the AssertEqualTypes operator in RushScript
dr.assertEqualTypes(data1, data2);
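The fallback rule for the optional expected port can be modeled in a few lines of plain Java. This sketch is not DataFlow code: the class is hypothetical, a record type is represented as an ordered list of "fieldName:TYPE" strings, and "comparable" is simplified to strict equality, which may be stricter than what the operator actually checks.

```java
import java.util.List;
import java.util.Optional;

// Illustrative sketch only; these names are hypothetical, not DataFlow API.
final class ExpectedTypeFallbackSketch {

    // expectedPortType is empty when the expected port is not connected.
    static void assertEqualTypes(Optional<List<String>> expectedPortType,
                                 List<String> configuredExpectedType,
                                 List<String> actualType) {
        // Prefer the type from the expected port; otherwise use the configured type.
        List<String> expected = expectedPortType.orElse(configuredExpectedType);
        if (expected == null) {
            throw new IllegalStateException("no expected type: connect the expected port or set one");
        }
        if (!expected.equals(actualType)) {
            throw new IllegalStateException(
                "type mismatch: expected " + expected + " but was " + actualType);
        }
    }

    public static void main(String[] args) {
        List<String> actual = List.of("name:STRING");
        // Expected port not connected: the configured type is used.
        assertEqualTypes(Optional.empty(), List.of("name:STRING"), actual);
        // Expected port connected: its type is used.
        assertEqualTypes(Optional.of(List.of("name:STRING")), null, actual);
        System.out.println("types match");
    }
}
```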
Properties
The AssertEqualTypes operator has one property.
Name | Type | Description
expectedType | | The expected type of the actual input port. Set this property when the expected input port is not connected; the type of the actual port is then compared against it.
Ports
The AssertEqualTypes operator provides the following input ports.
Name | Type | Get Method | Description
expectedInput | | getExpectedInput() | Expected data types. This port is optional.
actualInput | | getActualInput() | Actual data types.
Using the AssertPredicate Operator to Assert a Predicate Condition
The AssertPredicate operator asserts that the given predicate is true for all input values. If the predicate evaluates to false for any value, an exception is thrown and execution halts. This operator is useful when comparing values within the same data flow.
Code Example
The following code fragment demonstrates using the AssertPredicate operator to assert that for all input rows the values of field1 and field2 are equal.
Using the AssertPredicate operator in Java
// Create an assert predicate operator in a graph
AssertPredicate assertPredicate = graph.add(new AssertPredicate());

// Set the predicate to assert
assertPredicate.setPredicate(Predicates.eq("field1", "field2"));

// Connect assert predicate to an upstream operator
graph.connect(source, assertPredicate.getInput());
Using the AssertPredicate operator in RushScript
var predicate1 = Predicates.eq(FieldReference.value("field1"), ConstantReference.constant(1));
dr.assertPredicate(data1, {predicate:predicate1});
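The fail-fast behavior described for this operator can be illustrated with a plain-Java sketch built on java.util.function.Predicate. It does not use the DataFlow library. Rows are modeled as maps from field name to value, and the class, method names, and exception type are hypothetical choices for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

// Illustrative sketch only; not DataFlow's implementation.
final class AssertPredicateSketch {

    // Checks every row and throws on the first row for which the predicate is false.
    // Returns the number of rows checked.
    static long assertAll(List<Map<String, Object>> rows,
                          Predicate<Map<String, Object>> predicate) {
        long rowNumber = 0;
        for (Map<String, Object> row : rows) {
            rowNumber++;
            if (!predicate.test(row)) {
                throw new IllegalStateException(
                    "predicate failed at row " + rowNumber + ": " + row);
            }
        }
        return rowNumber;
    }

    // Builds a predicate that is true when two fields of a row hold equal values.
    static Predicate<Map<String, Object>> fieldsEqual(String left, String right) {
        return row -> Objects.equals(row.get(left), row.get(right));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
            Map.<String, Object>of("field1", 1, "field2", 1),
            Map.<String, Object>of("field1", 7, "field2", 7));
        System.out.println(assertAll(rows, fieldsEqual("field1", "field2"))); // 2
    }
}
```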
Properties
The AssertPredicate operator has one property.
Name | Type | Description
predicate | | The predicate to use for assertion. It can be provided as a ScalarValuedFunction or as a String based on an expression similar to the "where" clause of a SQL query.
Ports
The AssertPredicate operator provides a single input port.
Name | Type | Get Method | Description
input | | getInput() | The data to which the given predicate condition is applied.
Using the AssertRowCount Operator to Assert Row Count
The AssertRowCount operator verifies that the input flow contains the specified row count. This is a distributed operation if the input is distributed: it counts rows in each partition then sums to get a final count, at which point the assertion is applied.
Code Example
The following code fragment creates an AssertRowCount operator to verify that its input data has exactly the specified number of rows.
Using the AssertRowCount operator in Java
// Create an assert row count operator and add it to a graph
AssertRowCount asserter = graph.add(new AssertRowCount());

// Set the expected number of rows
asserter.setRowCount(10000);

// Connect the asserter to an upstream operator
graph.connect(source, asserter.getInput());
Using the AssertRowCount operator in RushScript
dr.assertRowCount(data, {rowCount:10000});
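The count-per-partition, then sum, then assert sequence described above can be sketched in plain Java. Partitions are modeled here as in-memory lists, which is a simplification of a distributed input, and the class and method names are hypothetical.

```java
import java.util.List;

// Illustrative sketch only; not DataFlow's implementation.
final class RowCountSketch {

    // Counts the rows in each partition independently, then sums the partial counts.
    static long totalRows(List<? extends List<?>> partitions) {
        long total = 0;
        for (List<?> partition : partitions) {
            total += partition.size();
        }
        return total;
    }

    // Applies the assertion only after the final count is known.
    static void assertRowCount(List<? extends List<?>> partitions, long expected) {
        long actual = totalRows(partitions);
        if (actual != expected) {
            throw new IllegalStateException(
                "expected " + expected + " rows but found " + actual);
        }
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
            List.of("a", "b", "c"), List.of("d", "e"), List.<String>of());
        assertRowCount(partitions, 5);
        System.out.println(totalRows(partitions)); // 5
    }
}
```

Because the assertion is applied to the summed count, no single partition needs to hold the expected number of rows.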
Properties
The AssertRowCount operator has the following properties.
Name | Type | Description
logFrequency | int | The frequency with which to log row count.
rowCount | long | The expected number of rows.
Ports
The AssertRowCount operator provides a single input port.
Name | Type | Get Method | Description
input | | getInput() | The input data on which to verify the row count.
Using the AssertSorted Operator to Assert Data Ordering
The AssertSorted operator verifies that the input data is sorted by the given set of keys. This represents a distributed operation if the input data is also distributed. Specifically, if the input data is distributed, this operator verifies that each input partition is sorted. If the input data is not distributed, this operator verifies that the single partition is sorted. Output data will be tagged with appropriate metadata such that downstream operations can leverage the knowledge of input ordering to choose more efficient algorithms for processing.
Tip:  Use the AssertSorted operator on data you know to be sorted to prevent DataFlow from injecting a sort due to metadata requirements. For example, if data within a file is already sorted as needed, use AssertSorted after the data reader.
Code Example
The following code fragment creates an AssertSorted operator specifying the logging frequency and the expected sort order.
Using the AssertSorted operator in Java
// Create an assert sorted operator and add it to a graph
AssertSorted assertSorted = graph.add(new AssertSorted());

// Set the logging frequency and the expected sort order
assertSorted.setLogFrequency(0);
assertSorted.setOrdering(SortKey.asc("field1"), SortKey.desc("field2"));

// Connect the assert sorted input to the output of an upstream operator
graph.connect(source, assertSorted.getInput());
Using the AssertSorted operator in RushScript
dr.assertSorted(data, {ordering:['"field1" asc', '"field2" desc']});
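The per-partition check described above can be sketched in plain Java with a Comparator that mirrors the ordering used in the example (field1 ascending, then field2 descending). The Row class, the partition model, and all names are hypothetical, and the sketch does not use the DataFlow library.

```java
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; not DataFlow's implementation.
final class SortedPartitionsSketch {

    // Hypothetical two-field row used for the demonstration.
    static final class Row {
        final int field1;
        final int field2;
        Row(int field1, int field2) { this.field1 = field1; this.field2 = field2; }
    }

    // field1 ascending, then field2 descending.
    static final Comparator<Row> FIELD1_ASC_FIELD2_DESC =
        Comparator.comparingInt((Row r) -> r.field1)
                  .thenComparing(Comparator.comparingInt((Row r) -> r.field2).reversed());

    // A partition is sorted when no row compares greater than its successor.
    static <T> boolean isSorted(List<T> partition, Comparator<? super T> ordering) {
        for (int i = 1; i < partition.size(); i++) {
            if (ordering.compare(partition.get(i - 1), partition.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    // Each partition is checked on its own; rows are never compared across partitions.
    static <T> boolean allPartitionsSorted(List<? extends List<T>> partitions,
                                           Comparator<? super T> ordering) {
        for (List<T> partition : partitions) {
            if (!isSorted(partition, ordering)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Row>> partitions = List.of(
            List.of(new Row(1, 9), new Row(1, 3), new Row(2, 5)),
            List.of(new Row(0, 1)));
        System.out.println(allPartitionsSorted(partitions, FIELD1_ASC_FIELD2_DESC)); // true
    }
}
```

Note that in this model the second partition starts with a smaller field1 value than the first partition ends with, and the check still passes, since only the order within each partition is verified.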
Properties
The AssertSorted operator has the following properties.
Name | Type | Description
logFrequency | int | The frequency with which to log. A logFrequency of zero means log only the total number of comparisons. This is the default behavior.
ordering | SortKey[] or String[] | The expected ordering of the input keys. If an ordering is not specified, all fields will be treated as keys using the default sort order of ascending.
Ports
The AssertSorted operator provides a single input port.
Name | Type | Get Method | Description
input | | getInput() | The input data to be verified.
The AssertSorted operator provides a single output port.
Name | Type | Get Method | Description
output | | getOutput() | The data that has been verified to be in the correctly sorted order.
Using the AssertEqualHash Operator to Assert Hash Equality
The AssertEqualHash operator verifies that the actual data input matches the expected data input without regard to order.
Code Example
The following code fragment demonstrates how to use the AssertEqualHash operator to verify that the data on the expected and actual input ports hashes to the same value.
Using the AssertEqualHash operator in Java
// Create a new assert equal hash operator and add it to a graph
AssertEqualHash assertEqualHash = graph.add(new AssertEqualHash());

// Connect expected input with an upstream operator
graph.connect(expectedSource, assertEqualHash.getExpectedInput());

// Connect actual input with an upstream operator
graph.connect(actualSource, assertEqualHash.getActualInput());
Using the AssertEqualHash operator in RushScript
dr.assertEqualHash(data1, data2);
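One way to compare two data sets without regard to order is to combine per-row hashes with a commutative operation such as addition. The plain-Java sketch below illustrates that idea. The source does not say how AssertEqualHash computes its hash, so this scheme, the mixing constant, and all names are assumptions made for illustration only.

```java
import java.util.List;

// Illustrative sketch only; not DataFlow's implementation.
final class OrderInsensitiveHashSketch {

    // Spreads a 32-bit row hash over 64 bits so that simple sums collide less often.
    static long mix(int rowHash) {
        long x = rowHash * 0x9E3779B97F4A7C15L;
        return x ^ (x >>> 32);
    }

    // Sums the mixed row hashes. Addition is commutative, so row order does not matter.
    static long combinedHash(List<? extends List<?>> rows) {
        long sum = 0;
        for (List<?> row : rows) {
            sum += mix(row.hashCode());
        }
        return sum;
    }

    public static void main(String[] args) {
        List<List<Object>> expected = List.of(
            List.<Object>of(1, "a"), List.<Object>of(2, "b"));
        List<List<Object>> actual = List.of(
            List.<Object>of(2, "b"), List.<Object>of(1, "a"));
        System.out.println(combinedHash(expected) == combinedHash(actual)); // true
    }
}
```

Matching hashes strongly suggest, but do not prove, that the two inputs hold the same rows, since distinct data sets can collide.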
Properties
The AssertEqualHash operator has no properties.
Ports
The AssertEqualHash operator provides the following input ports.
Name | Type | Get Method | Description
expectedInput | | getExpectedInput() | Expected data values.
actualInput | | getActualInput() | Actual data values.