Executor Settings for Each Node
Node settings control the configuration of JVMs launched by the node manager to execute distributed graphs. These properties represent settings that are dependent on host configuration and may therefore vary from machine to machine.
| Property | Default | Description |
| --- | --- | --- |
| node.executor.directory | %dr/work/%n | The working directory to use for spawned executors. This directory will be created if it does not exist. If relative, it is interpreted relative to the node manager’s working directory. The sequence %n is a special variable that will be expanded to the node’s registered name. The sequence %dr is a special variable that will be expanded to the DataFlow installation root. |
| node.executor.scratch.directory | node.executor.directory/scratch | The directory to use for temporary storage on the node. By default, this is the "scratch" subdirectory of the configured node.executor.directory. Multiple directories can be provided, separated by commas. If this is the case, allocation of temporary files is on a round-robin basis among the listed directories. |
| node.java.home | Java home used to launch Cluster Manager. | The JDK to use for launching executors. This also determines the JDK to use when launching node managers from the admin GUI. Node managers can also be launched from the clustermgr CLI, in which case they use the Java set in JAVA_HOME. |
| node.dataflow.home | DataFlow home used to launch Cluster Manager. | The root directory of the DataFlow installation. This location is needed to start a node manager from the Nodes page. If not set, it is assumed to be the same as the installation directory for Cluster Manager. |
| node.hadoop.home | /usr/lib/hadoop | The root directory of the Hadoop installation. This location is needed to find the Hadoop configuration on each node in the cluster. |
| node.heartbeat.interval | 30 seconds | The number of seconds that node managers should wait between sending heartbeats to Cluster Manager. In the event that Cluster Manager terminates abnormally and restarts, the node manager heartbeat reregisters the node manager with Cluster Manager. Therefore, this property determines the maximum time after a Cluster Manager restart that will elapse before Cluster Manager rediscovers all of the node managers. |
| node.resources.cpu | The number of processors as reported by Runtime.availableProcessors() | The number of CPUs available for use on the node. Job scheduling will take this capacity into account when deciding which machines to use. A value of 0 uses the default. This setting usually corresponds 1-to-1 to physical cores. However, this can be treated as virtual cores, allowing fractional scheduling or leveling between heterogeneous nodes. |
| node.resources.memory | 512 * node.resources.cpu | The amount of memory, in MB, available for use on the node. Job scheduling will take this capacity into account when deciding which machines to use. |
| node.executor.base.cgroup | /dataflow | The cgroup to use for managing the resource usage of job workers on the node. This cgroup must be attached to at least the cpu and memory subsystems and allow the dataflow user to administer it. This setting is used only if job.resource.usage.control is set to CGROUP. |
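The defaulting rules described above can be illustrated with a short Python sketch. This is not DataFlow code: the function names (expand_directory, scratch_directories, round_robin, node_resources) are hypothetical, os.cpu_count() stands in for Java's Runtime.availableProcessors(), and details such as trimming whitespace around comma-separated entries are assumptions rather than documented behavior.

```python
import itertools
import os
from pathlib import PurePosixPath


def expand_directory(template, node_name, dataflow_root, manager_cwd):
    """Expand %dr and %n, then resolve a relative result against the
    node manager's working directory (all names here are illustrative)."""
    expanded = template.replace("%dr", dataflow_root).replace("%n", node_name)
    path = PurePosixPath(expanded)
    if not path.is_absolute():
        path = PurePosixPath(manager_cwd) / path
    return str(path)


def scratch_directories(setting, executor_directory):
    """Return the list of scratch directories.

    An empty setting falls back to <executor_directory>/scratch.
    Trimming whitespace around entries is an assumption.
    """
    if not setting:
        return [str(PurePosixPath(executor_directory) / "scratch")]
    return [part.strip() for part in setting.split(",") if part.strip()]


def round_robin(directories):
    """Yield directories in turn, forever, for temporary-file allocation."""
    return itertools.cycle(directories)


def node_resources(cpu=0, memory_mb=None):
    """Resolve CPU and memory capacity for scheduling.

    A cpu value of 0 means "use the detected processor count".
    Memory defaults to 512 MB per resolved CPU.
    """
    if cpu == 0:
        cpu = os.cpu_count() or 1
    if memory_mb is None:
        memory_mb = 512 * cpu
    return cpu, memory_mb
```

For example, with a hypothetical installation root of /opt/dataflow and a node registered as node01, expand_directory("%dr/work/%n", "node01", "/opt/dataflow", "/var/run/nm") yields /opt/dataflow/work/node01, and node_resources(cpu=4) yields 4 CPUs with 2048 MB of memory.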
Last modified date: 12/09/2024