Execution Modes
Overview
DataFlow provides two modes of execution for graphs:
• Local mode, which executes a graph in the same JVM as the invoking code
• Distributed mode, which executes a graph remotely on a DataFlow cluster
The mode to use is specified when executing a graph. Regardless of whether a graph is intended for local or distributed execution, it is composed in the same way. The DataFlow model makes no distinctions with regard to the location of nodes; it is concerned only with how data is communicated between them.
This consistency in the composition model gives graphs some useful properties. For one, the implementation does not need to change as data scales change. If the amount of input data grows, the graph does not need to be rewritten for distributed execution. All that is required to scale out is to install a DataFlow cluster and change the requested execution mode.
Conversely, a distributed program can be run locally to perform debugging, greatly simplifying the task by removing the complexity introduced by running on several machines.
For examples of graph execution in the two modes, see
Example: Data Processing in DataFlow.