Dataflow
Most traditional programming models mirror the von Neumann architecture on which modern computer hardware is based. In this view, programs are simply a sequence of operations on data stored at known locations. Control flows from one operation to the next, executing in turn, operating on a shared data space.
The dataflow model, however, takes a more data-centric approach. Instead of control passing from one instruction to the next, it is the data that moves between operations. An operation executes when its inputs are available and passes its outputs to waiting consumers when it completes. Operations work on local data, the data that arrives on their inputs, not on shared data.
Instead of representing a program as a list of operations to be executed one at a time, a dataflow program can be represented as a directed graph of nodes—the operations—with the edges indicating the movement of data.
Expressing computation as a graph in this way has a number of advantages. For one, parallelism is made explicit: the only dependencies between operations are the edges of the graph, so any nodes not connected by a path of edges can execute concurrently. Additionally, because operations do not modify shared data, they need no synchronization mechanisms such as locks to manage concurrent access.
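To make the graph picture concrete, the following is a minimal, self-contained sketch in plain Java, built from threads and blocking queues rather than any dataflow framework (it is not the Actian DataFlow API; the names TinyDataflow, source, transform, and sink are purely illustrative). Each node fires when data is available on its input edges, operates only on the data it receives, and passes results downstream; the two transform nodes share no edge, so they can run concurrently.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.IntUnaryOperator;

public class TinyDataflow {

    // Sentinel value marking end-of-stream on an edge.
    private static final int EOS = Integer.MIN_VALUE;

    // Source node: emits the values 1..n on its output edge, then EOS.
    static Thread source(int n, BlockingQueue<Integer> out) {
        return new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) out.put(i);
                out.put(EOS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // Transform node: fires whenever a value is available on its input edge,
    // applies op to it, and forwards the result. It touches only local data.
    static Thread transform(IntUnaryOperator op,
                            BlockingQueue<Integer> in,
                            BlockingQueue<Integer> out) {
        return new Thread(() -> {
            try {
                for (int v = in.take(); v != EOS; v = in.take()) {
                    out.put(op.applyAsInt(v));
                }
                out.put(EOS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // Sink node with two inputs: it can only fire once both inputs have data.
    static Thread sink(BlockingQueue<Integer> left, BlockingQueue<Integer> right) {
        return new Thread(() -> {
            try {
                while (true) {
                    int a = left.take();
                    int b = right.take();
                    if (a == EOS || b == EOS) break;
                    System.out.println(a + " + " + b + " = " + (a + b));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        // Edges of the graph: bounded queues carrying data between nodes.
        BlockingQueue<Integer> e1 = new ArrayBlockingQueue<>(4);
        BlockingQueue<Integer> e2 = new ArrayBlockingQueue<>(4);
        BlockingQueue<Integer> e3 = new ArrayBlockingQueue<>(4);
        BlockingQueue<Integer> e4 = new ArrayBlockingQueue<>(4);

        // Graph:  source --e1--> double --e3--\
        //                                      sink (adds each pair)
        //         source --e2--> square --e4--/
        // The double and square nodes share no edge, so they may run
        // concurrently; the only coordination between nodes is the edges.
        Thread[] nodes = {
            source(5, e1),
            source(5, e2),
            transform(x -> x * 2, e1, e3),
            transform(x -> x * x, e2, e4),
            sink(e3, e4)
        };
        for (Thread t : nodes) t.start();
        for (Thread t : nodes) t.join();
    }
}
```

Running the sketch prints one line per pair, starting with "2 + 1 = 3". A dataflow engine takes care of the scheduling, buffering, and distribution that this sketch hard-codes with threads and queues.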
You can find dataflow and dataflow graphs in many places:
Optimizing compilers build dataflow graphs of the code they are compiling to help determine register allocation and instruction ordering.
Database query plans are dataflow graphs.
Command pipelines in the UNIX shell are simple dataflows.
At a high level, Actian DataFlow is a framework for building and executing dataflow graphs. It takes advantage of the properties of dataflow graphs to automatically parallelize execution where possible. Because dataflow graphs place no inherent requirements on where the operators run, execution can be either local or distributed.