Asserting Conditions
DataFlow Assertion Operators
Several operators in the DataFlow library provide assertions for different data conditions. These are mainly used for testing but can be useful for other purposes. See the following topics for more information on these operators.
Covered Assertion Operations
Using the AssertEqual Operator to Assert Data Equality
The AssertEqual operator verifies that the actual data input matches the expected data input. An error tolerance can be set to help compare floating point data within a specified error range.
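To illustrate why an error tolerance matters for floating point data, here is a minimal sketch in plain Java that does not use the DataFlow API. The class `ToleranceSketch` and the method `withinRelativeError` are hypothetical names, and the formula is one common definition of relative error; it is an assumption, not necessarily how `RelativeErrorBound` is implemented.

```java
final class ToleranceSketch {
    private ToleranceSketch() {}

    /**
     * Returns true when actual is within a relative error of expected.
     * Assumed definition: |actual - expected| <= tolerance * |expected|.
     */
    static boolean withinRelativeError(double expected, double actual, double tolerance) {
        if (Double.isNaN(expected) || Double.isNaN(actual)) {
            // Assumption: two NaN values match, a NaN never matches a number.
            return Double.isNaN(expected) && Double.isNaN(actual);
        }
        if (expected == actual) {
            return true; // exact match, including equal infinities
        }
        if (Double.isInfinite(expected) || Double.isInfinite(actual)) {
            return false; // an infinity only matches itself
        }
        return Math.abs(actual - expected) <= tolerance * Math.abs(expected);
    }
}
```

With a tolerance of 0.01, `withinRelativeError(0.3, 0.1 + 0.2, 0.01)` accepts the pair even though `0.1 + 0.2 == 0.3` is false in Java, which is the kind of mismatch an exact comparison would report as a failure.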
Code Example
The following code fragment demonstrates how to use the AssertEqual operator to ensure that the data on both input ports matches.
Using the AssertEqual operator in Java
// Create a new assert equal operator and add it to a graph
AssertEqual assertEqual = graph.add(new AssertEqual());
// Set the error tolerance and the log frequency
assertEqual.setErrorTolerance(new RelativeErrorBound(0.01));
assertEqual.setLogFrequency(1000);
// Connect expected input with an upstream operator
graph.connect(expectedSource, assertEqual.getExpectedInput());
// Connect actual input with an upstream operator
graph.connect(actualInput, assertEqual.getActualInput());
Using the AssertEqual operator in RushScript
dr.assertEqual(data1, data2);
Properties
The AssertEqual operator provides the following properties.
Ports
The AssertEqual operator provides the following input ports.
Using the AssertEqualTypes Operator to Assert Data Type Equality
The AssertEqualTypes operator asserts that two input flows have comparable types. As types may not be fully realized until composition time, the check is delayed until the composition phase of the life cycle.
The expected input port is optional. If it is not connected, the type of the actual input port will be compared to the expected type set by the setExpectedType() method.
Code Example
The following code fragment demonstrates how to use the AssertEqualTypes operator to ensure a data port is of an expected type.
Using the AssertEqualTypes operator in Java
// Create an assert equal types operator and add it to a graph
AssertEqualTypes assertTypes = graph.add(new AssertEqualTypes());
// Set the expected type
assertTypes.setExpectedType(TokenTypeConstant.record(TokenTypeConstant.STRING("name")));
// Connect the actual input to an upstream operator
graph.connect(source, assertTypes.getActualInput());
Using the AssertEqualTypes operator in RushScript
dr.assertEqualTypes(data1, data2);
Properties
The AssertEqualTypes operator has one property.
Ports
The AssertEqualTypes operator provides the following input ports.
Using the AssertPredicate Operator to Assert a Predicate Condition
The AssertPredicate operator asserts that the given predicate is true for all input values. If the predicate evaluates to false for any input value, the operator raises an exception and execution halts. This operator is useful when comparing values within the same data flow.
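The fail-fast behavior described above can be sketched in plain Java without the DataFlow API. The class `PredicateAssertionSketch` and its methods are hypothetical names used only for illustration; the exception type the real operator raises is not stated here, so `IllegalStateException` is an assumption.

```java
import java.util.function.Predicate;

final class PredicateAssertionSketch {
    private PredicateAssertionSketch() {}

    /** Returns the zero-based index of the first failing row, or -1 if every row passes. */
    static <T> long firstFailingRow(Iterable<T> rows, Predicate<? super T> predicate) {
        long index = 0;
        for (T row : rows) {
            if (!predicate.test(row)) {
                return index; // stop at the first violation
            }
            index++;
        }
        return -1;
    }

    /** Throws on the first failing row, so no later rows are examined. */
    static <T> void assertAll(Iterable<T> rows, Predicate<? super T> predicate) {
        long failed = firstFailingRow(rows, predicate);
        if (failed >= 0) {
            throw new IllegalStateException("Predicate failed at row " + failed);
        }
    }
}
```

For rows modeled as `Map<String, Integer>`, a predicate such as `row -> row.get("field1").equals(row.get("field2"))` expresses the same "field1 equals field2" condition on a single row of the flow.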
Code Example
The following code fragment demonstrates using the AssertPredicate operator to assert that for all input rows the values of field1 and field2 are equal.
Using the AssertPredicate operator in Java
// Create an assert predicate operator in a graph
AssertPredicate assertPredicate = graph.add(new AssertPredicate());
// Set the predicate to assert
assertPredicate.setPredicate(Predicates.eq("field1", "field2"));
// Connect assert predicate to an upstream operator
graph.connect(source, assertPredicate.getInput());
Using the AssertPredicate operator in RushScript
var predicate1 = Predicates.eq(FieldReference.value("field1"), ConstantReference.constant(1));
dr.assertPredicate(data1, {predicate:predicate1});
Properties
The AssertPredicate operator has one property.
Ports
The AssertPredicate operator provides a single input port.
Using the AssertRowCount Operator to Assert Row Count
The AssertRowCount operator verifies that the input flow contains the specified row count. This is a distributed operation if the input is distributed: the operator counts the rows in each partition, then sums the partial counts to get the final count, at which point the assertion is applied.
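The count-then-sum step can be sketched in plain Java, with each partition modeled as a list of rows. This is only an illustration under that assumption; `RowCountSketch`, `totalRows`, and `matchesRowCount` are hypothetical names and not part of the DataFlow API.

```java
import java.util.List;

final class RowCountSketch {
    private RowCountSketch() {}

    /** Sums the per-partition row counts into one total. */
    static long totalRows(List<? extends List<?>> partitions) {
        long total = 0;
        for (List<?> partition : partitions) {
            total += partition.size(); // partial count for one partition
        }
        return total;
    }

    /** Applies the assertion only to the summed total, never to a single partition. */
    static boolean matchesRowCount(List<? extends List<?>> partitions, long expected) {
        return totalRows(partitions) == expected;
    }
}
```

The point of the sketch is that no individual partition needs to hold the expected number of rows; only the sum across partitions is compared.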
Code Example
The following code fragment creates an AssertRowCount operator to verify that its input data has exactly the specified number of rows.
Using the AssertRowCount operator in Java
// Create an assert row count operator and add it to a graph
AssertRowCount asserter = graph.add(new AssertRowCount());
// Set the expected number of rows
asserter.setRowCount(10000);
// Connect the asserter to an upstream operator
graph.connect(source, asserter.getInput());
Using the AssertRowCount operator in RushScript
dr.assertRowCount(data, {rowCount:10000});
Properties
The AssertRowCount operator has the following properties.
Ports
The AssertRowCount operator provides a single input port.
Using the AssertSorted Operator to Assert Data Ordering
The AssertSorted operator verifies that the input data is sorted by the given set of keys. This is a distributed operation if the input data is distributed: the operator verifies that each input partition is sorted. If the input data is not distributed, the operator verifies that the single partition is sorted. Output data is tagged with metadata so that downstream operations can use the known input ordering to choose more efficient processing algorithms.
Tip: Use the AssertSorted operator on data you know to be sorted to prevent DataFlow from injecting a sort due to metadata requirements. For example, if data within a file is already sorted as needed, use AssertSorted after the data reader.
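The per-partition check described above can be sketched in plain Java. Rows are modeled as `int[]` arrays where index 0 stands for field1 and index 1 for field2; that layout, the class `SortedCheckSketch`, and its members are assumptions made for illustration and are not part of the DataFlow API.

```java
import java.util.Comparator;
import java.util.List;

final class SortedCheckSketch {
    private SortedCheckSketch() {}

    /** Ordering by field1 ascending, then field2 descending. */
    static final Comparator<int[]> FIELD1_ASC_FIELD2_DESC =
            Comparator.<int[]>comparingInt(row -> row[0])
                    .thenComparing(Comparator.<int[]>comparingInt(row -> row[1]).reversed());

    /** Returns true when no row sorts before its predecessor. */
    static <T> boolean isSorted(List<T> partition, Comparator<? super T> ordering) {
        for (int i = 1; i < partition.size(); i++) {
            if (ordering.compare(partition.get(i - 1), partition.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    /** Checks every partition on its own; rows are never compared across partitions. */
    static <T> boolean eachPartitionSorted(List<? extends List<T>> partitions,
                                           Comparator<? super T> ordering) {
        for (List<T> partition : partitions) {
            if (!isSorted(partition, ordering)) {
                return false;
            }
        }
        return true;
    }
}
```

Note that in this sketch two partitions can each pass the check even though their concatenation would not be globally sorted, which matches the statement that the verification applies to each input partition.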
Code Example
The following code fragment creates an AssertSorted operator specifying the logging frequency and the expected sort order.
Using the AssertSorted operator in Java
// Create an assert sorted operator and add it to a graph
AssertSorted assertSorted = graph.add(new AssertSorted());
// Set the logging frequency and the expected sort order
assertSorted.setLogFrequency(0);
assertSorted.setOrdering(SortKey.asc("field1"), SortKey.desc("field2"));
// Connect the assert sorted input to the output of an upstream operator
graph.connect(source, assertSorted.getInput());
Using the AssertSorted operator in RushScript
dr.assertSorted(data, {ordering:['"field1" asc', '"field2" desc']});
Properties
The AssertSorted operator has the following properties.
Ports
The AssertSorted operator provides a single input port.
The AssertSorted operator provides a single output port.
Using the AssertEqualHash Operator to Assert Hash Equality
The AssertEqualHash operator verifies that the actual data input matches the expected data input without regard to order.
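One way to compare two data sets without regard to order is to combine per-row hashes with a commutative operation. The following plain Java sketch shows that idea; how AssertEqualHash actually computes its hash is not described here, so the summing scheme, the class `OrderInsensitiveHashSketch`, and its methods are assumptions for illustration only.

```java
import java.util.List;

final class OrderInsensitiveHashSketch {
    private OrderInsensitiveHashSketch() {}

    /** Scrambles a row hash so that similar rows contribute very different values. */
    static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        return h;
    }

    /**
     * Sums the mixed hash of every row. Addition is commutative, so the
     * result does not depend on row order. Unlike XOR, addition also keeps
     * duplicate rows from cancelling each other out.
     */
    static long hashOf(Iterable<? extends List<?>> rows) {
        long sum = 0;
        for (List<?> row : rows) {
            sum += mix(row.hashCode());
        }
        return sum;
    }
}
```

Two inputs with the same rows in a different order produce the same value from `hashOf`. As with any hash comparison, different inputs can collide, so equal hashes indicate a match with high probability rather than with certainty.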
Code Example
The following code fragment demonstrates how to use the AssertEqualHash operator to ensure that the hash of the data matches across both input ports.
Using the AssertEqualHash operator in Java
// Create a new assert equal hash operator and add it to a graph
AssertEqualHash assertEqualHash = graph.add(new AssertEqualHash());
// Connect expected input with an upstream operator
graph.connect(expectedSource, assertEqualHash.getExpectedInput());
// Connect actual input with an upstream operator
graph.connect(actualInput, assertEqualHash.getActualInput());
Using the AssertEqualHash operator in RushScript
dr.assertEqualHash(data1, data2);
Properties
The AssertEqualHash operator has no properties.
Ports
The AssertEqualHash operator provides the following input ports.