Worker Configuration
The final step in preparing Integration Manager to run DataFlow is setting up the worker. This is straightforward if you already have a working knowledge of DataFlow and know the engine and cluster configurations you want for this instance. This section does not cover how to configure or optimize your DataFlow instance; it describes how to pass those cluster and engine configurations to the worker.
Assumptions:
• DataFlow is installed on the worker machine (embedded or remote).
• You have access to a current DataFlow license on the worker machine (embedded or remote).
First, open the application.properties file. All worker-level configuration is done here.
application.properties
# Dataflow Configuration
dataflow.enabled=true
dataflow.licensePath=${sharedDataPath}/license/df6.slc
dataflow.localEngineInstallPath=${sharedDataPath}/actian-dataflow-6.6.1-17/bin
dataflow.localEngineListenerPort=4998
dataflow.localEngineRemoteJvmArgs=-Xms64m -Xmx1g -XX:PermSize=64m -XX:MaxPermSize=256m -Dsun.client.defaultConnectTimeout=60000 -Dsun.client.defaultReadTimeout=180000
dataflow.allowAllExecutables=false
#dataflow.executableWhiteList=
dataflow.charset=utf-8
dataflow.strictMode=disabled
#dataflow.clusterConf=yarn://datarush-head.datarush.local
#dataflow.engineConf=moduleConfiguration=datarush-hadoop-cdh5,name2=value,name3=value
The worker interfaces with DataFlow through the RushScript command-line interface (dr and dr.bat).
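Because the worker depends on the RushScript launcher, it can be useful to confirm that the launcher is present before starting the worker. The following is a minimal sketch of such a check, not part of Integration Manager. It assumes that dr (or dr.bat on Windows) sits directly in the directory named by dataflow.localEngineInstallPath, which the /bin suffix in the sample path suggests but this section does not state. The function names are hypothetical.

```python
import os
from pathlib import Path


def rushscript_launcher(install_path: str, windows: bool = (os.name == "nt")) -> Path:
    """Return the expected RushScript launcher path under install_path.

    Assumption: dr (Unix-like systems) or dr.bat (Windows) is located
    directly in the directory given by dataflow.localEngineInstallPath.
    """
    return Path(install_path) / ("dr.bat" if windows else "dr")


def launcher_exists(install_path: str) -> bool:
    """Return True if the launcher for the current platform is a regular file."""
    return rushscript_launcher(install_path).is_file()
```

Pass the fully resolved install path (with ${sharedDataPath} already substituted), since this sketch does not expand property placeholders.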
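Next, provide any specific engine configurations. The dataflow.engineConf property is a series of comma-delimited key-value pairs; it is commented out in the sample above, so remove the leading # to set it. The dataflow.strictMode property is set to disabled, warning, or error.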
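To catch typos before the worker starts, the two values described above can be checked with a short script. This is an illustrative sketch only, not how the worker itself reads the file. It assumes that strictMode values are matched exactly as written (lowercase) and that engineConf values never contain a comma or need escaping; neither point is confirmed by this section. The function names are hypothetical.

```python
from typing import Dict

# The three strictMode values named in this section.
ALLOWED_STRICT_MODES = {"disabled", "warning", "error"}


def parse_engine_conf(value: str) -> Dict[str, str]:
    """Split 'k1=v1,k2=v2' into a dict; raise ValueError on a malformed pair."""
    pairs: Dict[str, str] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma or an empty value
        key, sep, val = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed engineConf entry: {item!r}")
        pairs[key.strip()] = val.strip()
    return pairs


def check_strict_mode(value: str) -> str:
    """Return the value unchanged if it is one of the allowed modes."""
    if value not in ALLOWED_STRICT_MODES:
        raise ValueError(f"strictMode must be one of {sorted(ALLOWED_STRICT_MODES)}")
    return value
```

For example, parse_engine_conf applied to the sample value moduleConfiguration=datarush-hadoop-cdh5,name2=value,name3=value yields three key-value pairs, while an entry with no = sign raises an error.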
You should now be able to start the worker, which will begin listening on the queue for DataFlow jobs.