Executor
When a logical graph is executed, DataFlow allocates nodes from the cluster for executing it. Each node creates a process, called the executor, which is responsible for running the physical graphs for processing the partitions of data assigned to the node. This would be, for example, the physical graph corresponding to a phase of the plan produced by compiling a logical graph.
Only one executor per node is created for a given graph. An executor may process multiple partitions of data; each partition is processed independently and in parallel. Having partitions share an executor allows read-only sharing of common data, potentially improving performance. The executor terminates only when the logical graph is complete; the same executor is used for all phases of the same graph.
The executor runs in a separate JVM from the node manager. This division serves two important purposes. First, it isolates the node manager from errors that occur during the execution of the graph; a failure will not affect future operations on the node. Second, it allows the executor to be started with job-specific JVM parameters. The required settings may differ between graphs; in this way the node manager need not be restarted each time these values are changed.
Last modified date: 06/14/2024