Property | Default | Description |
node.executor.directory | %dr/work/%n | The working directory to use for spawned executors. This directory will be created if it does not exist. If relative, it is interpreted relative to the node manager’s working directory. The sequence %n is a special variable that will be expanded to the node’s registered name. The sequence %dr is a special variable that will be expanded to the DataFlow installation root. |
node.executor.scratch.directory | node.executor.directory/scratch | The directory to use for temporary storage on the node. By default, this is the "scratch" subdirectory of the configured node.executor.directory. Multiple directories can be provided, separated by commas. In that case, temporary files are allocated round-robin among the listed directories. |
node.java.home | Java home used to launch Cluster Manager. | The JDK to use for launching executors. This also determines the JDK to use when launching node managers from the admin GUI. Node managers can also be launched from the clustermgr CLI, in which case they use the Java set in JAVA_HOME. |
node.dataflow.home | DataFlow home used to launch Cluster Manager. | The root directory of the DataFlow installation. This location is needed to start a node manager from the Nodes page. If not set, it is assumed to be the same as the installation directory for Cluster Manager. |
node.hadoop.home | /usr/lib/hadoop | The root directory of the Hadoop installation. This location is needed to find the Hadoop configuration on each node in the cluster. |
node.heartbeat.interval | 30 seconds | The number of seconds that node managers wait between sending heartbeats to Cluster Manager. If Cluster Manager terminates abnormally and restarts, the next heartbeat from each node manager reregisters it with Cluster Manager. This property therefore determines the maximum time that can elapse after a Cluster Manager restart before Cluster Manager rediscovers all of the node managers. |
node.resources.cpu | The number of processors as reported by Runtime.availableProcessors() | The number of CPUs available for use on the node. Job scheduling will take this capacity into account when deciding which machines to use. A value of 0 uses the default. This setting usually corresponds 1-to-1 to physical cores. However, this can be treated as virtual cores, allowing fractional scheduling or leveling between heterogeneous nodes. |
node.resources.memory | 512 * node.resources.cpu | The amount of memory, in MB, available for use on the node. Job scheduling will take this capacity into account when deciding which machines to use. |
node.executor.base.cgroup | /dataflow | The cgroup to use for managing the resource usage of job workers on the node. This cgroup must be attached to at least the cpu and memory subsystems and allow the dataflow user to administer it. This setting is used only if job.resource.usage.control is set to CGROUP. |
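
The defaults described in the table can be sketched in Java to make their interaction concrete. This is a minimal illustration, not DataFlow code: the class NodeDefaultsSketch, its method names, and the sample values /opt/dataflow and node01 are hypothetical, and the expansion and allocation logic is assumed from the descriptions above rather than taken from the product.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helpers that mirror the documented defaults; not part of any DataFlow API. */
final class NodeDefaultsSketch {

    /** Expands %dr (installation root) and %n (registered node name) in a single pass. */
    static String expandDirectory(String template, String installRoot, String nodeName) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            if (template.startsWith("%dr", i)) {
                out.append(installRoot);
                i += 3;
            } else if (template.startsWith("%n", i)) {
                out.append(nodeName);
                i += 2;
            } else {
                out.append(template.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    /** Splits a comma-separated scratch setting; falls back to <executorDirectory>/scratch. */
    static List<String> scratchDirectories(String configured, String executorDirectory) {
        List<String> dirs = new ArrayList<>();
        if (configured != null) {
            for (String part : configured.split(",")) {
                String dir = part.trim();
                if (!dir.isEmpty()) {
                    dirs.add(dir);
                }
            }
        }
        if (dirs.isEmpty()) {
            dirs.add(executorDirectory + "/scratch");
        }
        return dirs;
    }

    /** Picks the next scratch directory in round-robin order. */
    static String nextScratchDirectory(List<String> dirs, AtomicInteger counter) {
        return dirs.get(Math.floorMod(counter.getAndIncrement(), dirs.size()));
    }

    /** A configured value of 0 means "use the processor count reported by the JVM". */
    static int effectiveCpus(int configured) {
        return configured == 0 ? Runtime.getRuntime().availableProcessors() : configured;
    }

    /** Default memory capacity in MB: 512 per CPU. */
    static long defaultMemoryMb(int cpus) {
        return 512L * cpus;
    }

    public static void main(String[] args) {
        // Sample values below are assumptions chosen for illustration.
        String workDir = expandDirectory("%dr/work/%n", "/opt/dataflow", "node01");
        System.out.println(workDir); // prints /opt/dataflow/work/node01

        List<String> scratch = scratchDirectories("/data1/tmp, /data2/tmp", workDir);
        AtomicInteger counter = new AtomicInteger();
        for (int i = 0; i < 3; i++) {
            System.out.println(nextScratchDirectory(scratch, counter));
        }

        int cpus = effectiveCpus(0);
        System.out.println(cpus + " CPUs, " + defaultMemoryMb(cpus) + " MB");
    }
}
```

With the sample scratch setting, the loop alternates between /data1/tmp and /data2/tmp. The last line depends on the machine, because a configured CPU value of 0 defers to the processor count reported by the JVM.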