Distributed Mode
Distributed mode executes a graph across a collection of machines, collectively referred to as a cluster. Running on multiple machines, distributed mode is able to scale out as well as up.
Because different partitions in distributed mode may not reside in the same JVM, much less on the same machine, data exchange between partitions is performed through temporary files. These exchanges are done in between phases in the execution plan. The framework automatically adds operators to handle this process, making it transparent to the rest of the graph.
A number of processes are involved in the distributed execution of a dataflow graph. There are long-running daemon processes executing on the machines comprising the cluster. These in turn provide the framework for launching processes that are responsible for doing the actual processing of the graph.
Last modified date: 06/14/2024