Operators as Functions
Many common processing tasks follow the same pattern: for each incoming record, process some or all of its fields and output a new record. Viewed abstractly, an operator that follows this pattern can be thought of as a function that takes a record and produces a record as its result. Pairing such a function with a generic operator that applies it to every input record yields an operator with the same behavior as the original.
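The pattern described above can be sketched in plain Java. This is only an illustration of the idea, not DataFlow's API: the record representation (a `Map` of field names to values) and the names `applyToEach` and `withTotal` are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class RecordFunctionSketch {
    // Hypothetical generic "operator": applies a record-to-record
    // function to every input record and collects the results.
    static List<Map<String, Object>> applyToEach(
            List<Map<String, Object>> input,
            Function<Map<String, Object>, Map<String, Object>> fn) {
        List<Map<String, Object>> output = new ArrayList<>();
        for (Map<String, Object> record : input) {
            output.add(fn.apply(record));
        }
        return output;
    }

    // Hypothetical record function: reads some input fields and
    // produces a new record containing "id" and a computed "total".
    static Map<String, Object> withTotal(Map<String, Object> record) {
        int price = (Integer) record.get("price");
        int quantity = (Integer) record.get("quantity");
        Map<String, Object> result = new HashMap<>();
        result.put("id", record.get("id"));
        result.put("total", price * quantity);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("id", 1);
        record.put("price", 5);
        record.put("quantity", 3);

        List<Map<String, Object>> input = new ArrayList<>();
        input.add(record);

        System.out.println(applyToEach(input, RecordFunctionSketch::withTotal));
    }
}
```

Here the per-record logic lives entirely in `withTotal`, while `applyToEach` knows nothing about the fields involved; swapping in a different function changes the behavior without changing the generic part.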
DataFlow supports modeling operators as functions (or collections of functions) through the concept of scalar-valued functions, more commonly referred to simply as functions. A scalar-valued function takes a record as input and returns a single scalar value as output. Although DataFlow does not have functions that return a record, such a function can be represented as a collection of scalar-valued functions, one per output field, all applied to the same input record.
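As a rough sketch of how a record-valued result can be assembled from scalar-valued functions, the plain Java below evaluates one function per output field against the same input record. The `Map`-based record and the names `evaluate`, `fullName`, and `birthDecade` are hypothetical and chosen for illustration; they are not taken from the DataFlow library.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class ScalarFunctionsSketch {
    // Hypothetical helper: each entry maps an output field name to a
    // scalar-valued function. Every function sees the same input record
    // and contributes exactly one field to the output record.
    static Map<String, Object> evaluate(
            Map<String, Object> record,
            Map<String, Function<Map<String, Object>, Object>> fieldFunctions) {
        Map<String, Object> output = new LinkedHashMap<>();
        for (Map.Entry<String, Function<Map<String, Object>, Object>> entry
                : fieldFunctions.entrySet()) {
            output.put(entry.getKey(), entry.getValue().apply(record));
        }
        return output;
    }

    public static void main(String[] args) {
        // Two scalar-valued functions, one per output field.
        Map<String, Function<Map<String, Object>, Object>> functions = new LinkedHashMap<>();
        functions.put("fullName", r -> r.get("first") + " " + r.get("last"));
        functions.put("birthDecade", r -> ((Integer) r.get("birthYear") / 10) * 10);

        Map<String, Object> input = new HashMap<>();
        input.put("first", "Ada");
        input.put("last", "Lovelace");
        input.put("birthYear", 1815);

        System.out.println(evaluate(input, functions));
    }
}
```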
Modeling operations as functions has several advantages. Like operators, functions can be composed to construct more complex functions. Unlike operators, however, functions allow fine-grained reuse without incurring the communication overhead that operators do. Functions can also be simpler to write than operators, because they deal with a single input value rather than a stream of input values.
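Composition of this kind might look like the following plain Java sketch, in which small scalar-valued functions are combined into a larger one that is still evaluated against a single record. The helper names `field`, `multiply`, and `addConstant` are assumptions made for this example and do not correspond to DataFlow's own function library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ComposedFunctionsSketch {
    // Hypothetical building block: reads a numeric field from the record.
    static Function<Map<String, Object>, Double> field(String name) {
        return record -> ((Number) record.get(name)).doubleValue();
    }

    // Hypothetical combinator: multiplies the results of two functions,
    // both evaluated against the same record.
    static Function<Map<String, Object>, Double> multiply(
            Function<Map<String, Object>, Double> left,
            Function<Map<String, Object>, Double> right) {
        return record -> left.apply(record) * right.apply(record);
    }

    // Hypothetical combinator: adds a constant to another function's result.
    static Function<Map<String, Object>, Double> addConstant(
            Function<Map<String, Object>, Double> inner, double amount) {
        return record -> inner.apply(record) + amount;
    }

    public static void main(String[] args) {
        // price * quantity + 2.5, built from reusable pieces.
        Function<Map<String, Object>, Double> orderTotal =
                addConstant(multiply(field("price"), field("quantity")), 2.5);

        Map<String, Object> record = new HashMap<>();
        record.put("price", 4.0);
        record.put("quantity", 3);

        System.out.println(orderTotal.apply(record));
    }
}
```

The composed function is still just a function of one record, so the pieces can be reused in other combinations without any intermediate stream of records passing between them.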
Last modified date: 01/06/2023