Overriding the Operator Parallelism

Each operator uses its metadata model to determine whether it supports parallelism. In the previous example, both the GenerateRandom and LogRows operators support parallelism, and this is visible in the output from the application.

The output contains eight sets of LogRows output, each showing the input data type and the number of rows visible to that instance. Each set was written by one parallel instance of the LogRows operator.

In a few cases, you may need to override an operator's default parallelism and make it non-parallel. To do this, call the disableParallelism() method, which is available on every operator instance. For example, to disable parallelism on the LogRows operator from the previous example, use the following code:

logger.disableParallelism();

With parallelism disabled on the LogRows operator, the sample application produces the following output. The operator writes only a single set of log entries, which indicates that it ran with parallelism disabled.

INFO com.pervasive.datarush.graphs.internal.LogicalGraphInstanceImpl execute Executing phase 0 graph: {[generateRandom, logRows]}
INFO SimpleApp.logRows execute Input type is {"type":"record","representation":"DENSE_BASE_NULL","fields":[{"dblField":{"type":"double"}},{"stringField":{"type":"string"}}]}
INFO SimpleApp.logRows execute Counted 1000 rows
INFO com.pervasive.datarush.script.javascript.DataflowFactory execute script execution time: 2.55 secs

Warning! Disabling parallelism on an operator can cause data fan-in or fan-out, which can degrade the performance of the application. The effect is most pronounced when running DataFlow in a distributed environment.
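To make the fan-in effect concrete, the following is a minimal toy model in plain Java. It does not use the DataFlow API: the class FanInSketch and its methods partition and loggedCounts are hypothetical names introduced only for illustration, and the even split of rows across partitions is an assumption. It contrasts eight parallel logger instances, each counting its own partition, with a single non-parallel instance that must receive the rows from every upstream partition.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: this is NOT the DataFlow API. All names here are hypothetical.
final class FanInSketch {

    // Splits totalRows as evenly as possible across the given number of partitions
    // (an assumed distribution; a real engine may partition differently).
    static List<Integer> partition(int totalRows, int partitions) {
        List<Integer> sizes = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            int extra = i < totalRows % partitions ? 1 : 0;
            sizes.add(totalRows / partitions + extra);
        }
        return sizes;
    }

    // Returns the row count that each logger instance would report.
    static List<Integer> loggedCounts(List<Integer> upstream, boolean parallel) {
        if (parallel) {
            // One logger instance per upstream partition: no data movement needed.
            return new ArrayList<>(upstream);
        }
        // Parallelism disabled: a single instance gathers every partition (fan-in).
        int total = 0;
        for (int size : upstream) {
            total += size;
        }
        return Collections.singletonList(total);
    }

    public static void main(String[] args) {
        List<Integer> upstream = partition(1000, 8);
        System.out.println("parallel: " + loggedCounts(upstream, true));
        System.out.println("disabled: " + loggedCounts(upstream, false));
    }
}
```

In this model the parallel case reports eight counts and the non-parallel case reports one count of 1000, which mirrors the single "Counted 1000 rows" entry in the log above. The gathering step is where the extra cost arises: in a distributed run, the rows from each partition would have to travel to the one node that hosts the non-parallel instance.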