Evaluating Functions

The process of taking a function and computing its result for a given record is called evaluating the function. The record provides the context for the evaluation; any fields referenced in the function take on the values of the corresponding fields in the record. Evaluation is done using a FunctionEvaluator object, which is acquired from the ScalarValuedFunction.

Understanding how to evaluate functions is only necessary for writing operators that will use functions. It is also useful for writing unit tests for functions, but not necessary, as the DeriveFields operator can be used instead (see Using the DeriveFields Operator to Compute New Fields).

The evaluation of functions refers to concepts of values and types in DataFlow, in particular the TokenValued and TokenSettable type hierarchies, which are not discussed here. Tokens and Types provides additional information on these topics which may be helpful in further understanding the evaluation process for functions.

The process of evaluating a function can be broken down as follows:

While this process appears somewhat complex for performing a single evaluation, functions were designed to be invoked repeatedly in the body of an operator. In that context, the first three steps are performed once while the last is repeated for each record, so there is not much added complexity overall.

Before using a function, it is typically necessary to verify that it is appropriate for the given context. For example, an arithmetic function cannot be used where a predicate function is expected, because its return type is not appropriate. Likewise, the fields referenced in the function may have the wrong type or may not exist, in which case the input type is not appropriate.

The ScalarValuedFunction interface defines a method validateInputType(), which is useful for checking both of these conditions. Given the type of the input record, it determines the output type, which can then be compared against the expected return type. Invalid input types are detected as part of this processing.

RecordTokenType inputType = ...
ScalarTokenType outputType = f.validateInputType(inputType);
if (!TokenTypeConstant.BOOLEAN.equals(outputType)) {
    throw new RuntimeException("Function doesn't evaluate to boolean!");
}

While validateInputType() signals invalid field references in a function by throwing an exception, it is also possible to verify all referenced fields manually. The getRequiredFields() method returns a list of all fields referenced by a function. This list can be used to determine which references, if any, are invalid and to handle them as appropriate.
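As a rough illustration of the manual check, the following sketch reports which referenced fields are missing from an input record. It does not use the DataFlow API: the class RequiredFieldCheck and the method missingFields() are hypothetical, the list of names stands in for the result of getRequiredFields() (whose exact element type is not shown here), and the set of names stands in for the fields declared by the input record type.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of the DataFlow API.
class RequiredFieldCheck {

    /** Returns the referenced field names that are absent from the available fields. */
    static List<String> missingFields(List<String> requiredFields, Set<String> availableFields) {
        List<String> missing = new ArrayList<>();
        for (String name : requiredFields) {
            if (!availableFields.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: stand-in for the fields reported by f.getRequiredFields().
        List<String> required = List.of("price", "quantity", "discount");
        // Assumption: stand-in for the field names of the input record type.
        Set<String> available = new LinkedHashSet<>(List.of("price", "quantity"));

        System.out.println(missingFields(required, available)); // prints [discount]
    }
}
```

Collecting every missing field before reporting, rather than failing on the first one, tends to produce a more useful error message for whoever supplied the function.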

Sometimes it may be desirable to perform checks before the input type is known, particularly if only functions with certain return types are allowed. Not all invalid functions can be caught in this way, but some can; catching errors as early as possible is generally a good idea.

The getUpperBound() method can be used to get an “estimate” of the return type of the function. The function guarantees that its actual return type will be assignable to the type returned by this method. For some functions this bound may be exact; for others it may not provide any useful information at all. Use caution when performing checks based on the upper bound so as not to reject functions that are in fact valid.
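One cautious way to use an upper bound is to reject a function only when the bound rules out the required type, and to defer the decision when the bound is merely too broad to tell. The sketch below shows that three-way decision under loud assumptions: Java Class objects stand in for DataFlow token types, Class.isAssignableFrom() stands in for the assignability relation between those types, and the UpperBoundCheck class and Verdict enum are hypothetical names, not part of the DataFlow API.

```java
// Hypothetical helper; Java classes stand in for DataFlow token types.
class UpperBoundCheck {

    enum Verdict { ACCEPT, DEFER, REJECT }

    /**
     * Decides what an upper bound says about a required return type.
     * ACCEPT: every type within the bound is assignable to the required type.
     * DEFER:  the bound is broader than the required type, so nothing is proven yet.
     * REJECT: the bound and the required type are unrelated, so no valid result exists.
     */
    static Verdict check(Class<?> upperBound, Class<?> required) {
        if (required.isAssignableFrom(upperBound)) {
            return Verdict.ACCEPT;
        }
        if (upperBound.isAssignableFrom(required)) {
            return Verdict.DEFER;
        }
        return Verdict.REJECT;
    }

    public static void main(String[] args) {
        // A predicate is required; Boolean plays the role of the boolean token type.
        System.out.println(check(Boolean.class, Boolean.class)); // prints ACCEPT
        System.out.println(check(Object.class, Boolean.class));  // prints DEFER
        System.out.println(check(Number.class, Boolean.class));  // prints REJECT
    }
}
```

In the DEFER case the function may still be valid, so the final decision has to wait until the input type is known and validateInputType() can be called.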

To evaluate a function, it is first necessary to bind the input and output, producing a FunctionEvaluator object. By binding the input and output, type checking can be done just once up front, streamlining the implementation and execution of the evaluation. Type checking may be a small cost per evaluation, but when it is repeated millions of times in an operator, that cost can quickly add up.

The getEvaluator() method is used to acquire an evaluator bound to a specific input and output. Both the input and output are defined generically, using the RecordValued and ScalarSettable interfaces respectively. This abstraction provides flexibility in how functions are used. The resulting evaluator will compute the function whenever evaluate() is called, using the current value found in the input and storing the result in the output container.

RecordInput input = ...
RecordOutput output = ...
...
FunctionEvaluator evaluator = f.getEvaluator(input, output.getField("result"));
while (input.stepNext()) {
    ...
    evaluator.evaluate();
    // Computed value is now stored in "result" field of output
    ...
}
...

Note that the previous example binds the input port and a field of the output port directly to the evaluator. Each time stepNext() is invoked, the value of the input port changes, and therefore evaluate() operates on the new value. Using the output field to store the result means that no separate result buffer has to be allocated and no value has to be copied to the output afterward.