Logging the Cluster Processes
All processes involved in a clustered DataFlow environment produce logs for troubleshooting purposes. Both the daemons and the executor processes can be configured to generate log records for tracing execution. Logging for the cluster daemons is built on log4j, and logging levels can be configured for the cluster. For more information about logging levels, see the log4j documentation.
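For example, daemon logging verbosity can be adjusted through standard log4j level settings. The following is a minimal sketch in log4j 1.x properties syntax; the logger name com.example.dataflow.cluster and the appender layout are hypothetical placeholders, as the actual logger names and configuration file location depend on your DataFlow installation:
   # Default level and appender for all loggers (illustrative)
   log4j.rootLogger=INFO, stdout
   log4j.appender.stdout=org.apache.log4j.ConsoleAppender
   log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
   log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n
   # Raise verbosity for one (hypothetical) package to DEBUG
   log4j.logger.com.example.dataflow.cluster=DEBUG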
Daemon Logs
Cluster Manager and the node managers can be configured to use a log rotation policy that limits the total size of their log files. By default, the daemons use a single log file of unlimited size. To enable rotation, specify the number of log files and the size threshold using the daemon.logging.count and daemon.logging.limit properties, respectively.
The daemon.logging.directory property determines where the daemon log file set is located. The supplied directory path may use a special variable, %dr, which represents the DataFlow installation home. By default, the directory path is %dr/daemonlogs.
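As an illustration, the following sketch enables a rotation set of five files of roughly 10 MB each and keeps the log set under the installation home. The values are examples only, and the byte-based unit for daemon.logging.limit is an assumption; consult the property reference for the actual unit:
   # Keep up to 5 log files in the rotation set (example value)
   daemon.logging.count=5
   # Size threshold for rotation; 10 MB here, assuming the unit is bytes
   daemon.logging.limit=10485760
   # Store the daemon log file set under the DataFlow installation home
   daemon.logging.directory=%dr/daemonlogs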
The names of the log files in the set are not configurable; they follow a fixed format. The Cluster Manager log file is named clustermgr.log. The node manager log files are named nodemgr_<node>.log, where <node> is the name of the node registered with Cluster Manager. Archived copies of these log files use names derived from these base names.
Changes to the daemon logging parameters take effect when the page is saved.
Executor Working Directory and Logs
Each executor has a private working directory where its logs and other working files are saved. All of these working directories are rooted at the directory specified by the node.executor.directory property.
The directory path may use a special variable, %n, which expands to the name of the node. Each job is assigned a globally unique ID (GUID) when it is executed, and its working directory is <node.executor.directory>/<jobName>_<jobGUID>.
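For example, the following sketch places each node's working directories under a shared data path; the /data/dataflow prefix and the node name node01 are hypothetical:
   # Root for executor working directories; %n expands to the node name
   node.executor.directory=/data/dataflow/work/%n
   # A job named "wordcount" running on node01 would then use a
   # directory such as /data/dataflow/work/node01/wordcount_<jobGUID>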
Each executor creates a log file of unlimited size on the machine where it runs. This file is always named job.log and is located at the root of the job working directory.
In addition to the log file, the job working directory stores temporary index files associated with the job. When running in OSGI mode, the job working directory also contains an osgi directory that is used by the OSGI framework for internal purposes.
The directory structure looks like this:
<node.executor.directory>/
   <jobName>_<jobGUID>/ (job working directory)
       job.log (job log file)
       index/* (temporary index files for the job)
       osgi/* (OSGI working directory for the job)
Changes to the executor logging parameters take effect for any subsequent distributed graph execution.
Executor Log Retention
By default, log files and their associated working directories are retained indefinitely in the executor's private working directory. This behavior can be changed by setting the clustermgr.job.logging.retain.maxjobsize and/or clustermgr.job.logging.retain.time properties.
clustermgr.job.logging.retain.maxjobsize
Determines the maximum space used to store working directories and their associated logs. Cluster Manager periodically checks the total size of the working directories on each executor and deletes working directories, starting with the oldest, until the total size falls below the retention threshold. If the retention time has not been set, the size check occurs once per day.
clustermgr.job.logging.retain.time
Determines the maximum allowed age of a working directory and its associated logs before they are deleted. If set, this property also determines how often the cluster checks the total size of the working directories. Because the check runs on an asynchronous timer, logs are not necessarily deleted the moment they reach the maximum retention age; however, any logs over the specified age are deleted when the age check occurs.
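For example, the following sketch caps retained working directories at roughly 1 GB and ages them out after 7 days. The values and their units (bytes and days) are assumptions for illustration only; consult the property reference for the actual units:
   # Delete oldest working directories once the total exceeds ~1 GB
   # (assuming the value is expressed in bytes)
   clustermgr.job.logging.retain.maxjobsize=1073741824
   # Delete working directories older than 7 days
   # (assuming the value is expressed in days)
   clustermgr.job.logging.retain.time=7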