Setting Class Cache
When a job is submitted, the DataFlow client scans its class path and computes a checksum for each entry. Entries that are out of date on the cluster nodes are automatically synchronized to those nodes.
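The checksum comparison can be pictured with a minimal sketch. This is not the DataFlow client's actual implementation: the choice of SHA-256 and the function names `checksum_entry` and `stale_entries` are assumptions made purely for illustration.

```python
import hashlib
from pathlib import Path


def checksum_entry(entry: Path) -> str:
    """Return a SHA-256 hex digest for one class path entry (a jar file or a directory).

    SHA-256 is an assumption; the documentation does not say which checksum is used.
    """
    digest = hashlib.sha256()
    if entry.is_file():
        digest.update(entry.read_bytes())
    else:
        # For a directory, hash relative file names and contents in sorted order
        # so the result does not depend on file system traversal order.
        for child in sorted(p for p in entry.rglob("*") if p.is_file()):
            digest.update(child.relative_to(entry).as_posix().encode("utf-8"))
            digest.update(child.read_bytes())
    return digest.hexdigest()


def stale_entries(local: dict[str, str], cached: dict[str, str]) -> list[str]:
    """Return the names of entries that are missing from the cache or whose checksum differs."""
    return sorted(name for name, checksum in local.items() if cached.get(name) != checksum)
```

In this sketch, only the names returned by `stale_entries` would need to be copied to a node; entries whose checksums already match are left alone.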
The class cache has two cache scopes:
• Node scope: Entries in the node-scoped cache are shared among all jobs that use the same node.
• Job scope: Entries in the job-scoped cache are private to the job.
The two caches are stored in a fixed directory structure under the directory specified by the configuration property node.executor.directory. Here is the directory structure:
<node.executor.directory>/
    classcache/                (node-scoped cache)
    <jobName>_<jobGUID>/       (job working directory)
        classcache/            (job-scoped cache)
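As a rough illustration of this layout, the following sketch builds the two cache paths from the value of node.executor.directory. The helper names `node_cache_dir` and `job_cache_dir`, and the example directory, job name, and GUID, are hypothetical and are not part of DataFlow.

```python
from pathlib import PurePosixPath


def node_cache_dir(executor_dir: str) -> PurePosixPath:
    """Path of the node-scoped cache, shared by all jobs that use the node."""
    return PurePosixPath(executor_dir) / "classcache"


def job_cache_dir(executor_dir: str, job_name: str, job_guid: str) -> PurePosixPath:
    """Path of the job-scoped cache inside the job working directory."""
    return PurePosixPath(executor_dir) / f"{job_name}_{job_guid}" / "classcache"


# Hypothetical values, for illustration only.
print(node_cache_dir("/opt/dataflow/executor"))
print(job_cache_dir("/opt/dataflow/executor", "wordcount", "1234"))
```

Because the job-scoped cache sits inside the job working directory, it shares nothing with the node-scoped cache.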
The job-scoped cache is automatically deleted at the end of every job. The node-scoped cache, by contrast, should be deleted manually from time to time. To do this, use the Flush button.
Last modified date: 06/14/2024