DataFlow Invoker
The DataFlow Invoker allows you to execute DataFlow scripts and JSON graphs within a DataConnect process. The DataFlow Invoker uses the DataFlow command line to call and run an existing DataFlow application. The supported DataFlow Invoker version is 6.2.0.
Note:  Actian DataConnect supports DataFlow 6.6.
The DataFlow Invoker allows you to run workflows that use the DataFlow libraries in the following ways:
Using RushScript to make calls to the DataFlow library JavaScript API (see the RushScript topic in the DataFlow documentation.)
Using a custom Java class to make calls to the DataFlow library Java API (see the DataFlow API Usage topic in the DataFlow documentation.)
Exporting a workflow from RushAnalytics and executing that workflow with the DataFlow Invoker as a JSON file (see the Enabling Workflows to Execute with DataFlow topic in the DataFlow documentation.)
For more details about building workflows, see the Building DataFlow Applications in Java topic in the DataFlow documentation.
For example, the following process contains a DataFlow Invoker step that uses RushScript to make calls to the DataFlow library JavaScript API:
1. An organization has customer data in a delimited ASCII file that they want to profile to automatically discover the values that appear frequently. The DataFlow Invoker step calls the DataFlow API using a custom script that uses the Summary Statistics operator (see the Using the Summary Statistics Operator topic in the DataFlow documentation). The output of the DataFlow Invoker step generates an XML file that contains the statistics.
2. The XML file is converted to delimited ASCII format using a Transformation step.
3. The Transformation step generates an Excel file that contains the most frequent values in every column, with the count of the occurrence.
Preparing to Use DataFlow Invoker
You must be familiar with the following concepts before using the DataFlow Invoker.
Concept
Description
DataFlow technology
The framework on which workflows are built is called DataFlow. For more information about this technology and its associated terminology, see the DataFlow documentation.
DataFlow workflow
You use the DataFlow Invoker to run a DataFlow workflow, also called a graph.
Building a process
A DataFlow workflow is run as a step in a process. For more information, see Building Processes.
Adding a license
To create workflows using the DataFlow Invoker, the integration platform license must include the Engine DataFlow feature. See Uploading License File.
Using attachments
You can use a RushScript, Java class, or JSON file project attachment as the file to run.
Using the DataFlow command line
The DataFlow Invoker works by calling the DataFlow command line tool. See the Running from Command Line topic in the DataFlow documentation.
Prerequisites
Before using the DataFlow Invoker, ensure the following:
Install DataFlow: Make sure you have contacted your account executive to obtain the items required to set up Actian DataFlow.
Access Job Files: Make sure you have JSON, Java class files, or JavaScript files to run the workflow.
Note:  If you run the workflow by executing an exported JSON file, note that an exported RushAnalytics file (.dr) is a JSON file.
Access Data Files: Make sure you have the source data files you want to use in the process.
Test the command line: The DataFlow Invoker works by calling the command line tool. When testing the command line, make sure that you are logged in to the system as a user that has the required permissions to read the DataFlow Job Files and source data, and write the target data.
Installing DataFlow and Specifying DR_LOCATION Macro
To install DataFlow, see the DataFlow documentation available at docs.actian.com.
After installing DataFlow, create the DR_LOCATION macro that points to the DataFlow installation location. For example, C:\Actian\dataflow-6.x-x. More information about the DR_LOCATION macro is provided later in this topic (see the DataFlow Install Directory property description.) This macro must be added to one of the following:
Macro set that is included in all projects that use the invoker
GLOBAL macro set: Can be used by all processes that call the DataFlow Invoker
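For illustration, the macro is a simple name-to-value pair along these lines (shown in a generic name=value form; in DataConnect you define it in the appropriate macro set, and the path comes from your actual installation):

```text
DR_LOCATION=C:\Actian\dataflow-6.x-x
```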
Setting up DataFlow Command Line to Run DataFlow Invoker
To set up the DataFlow command line:
1. Extract the contents of the DataFlow 6.x SDK to the Actian installation directory (or a preferred location). The extracted folder contains the required files. For more information, see the Installing and Configuring DataFlow section in the DataFlow documentation.
2. (Optional) Obtain the DataFlow license (DataRush.slc) and add it at the root of the DataFlow installation directory. If you do not have the license, contact Actian Support.
Note:  This is required if you do not have the DataFlow feature included in the Actian DataConnect license.
3. (Optional) To use DataFlow from the command line, add <installdir>\dataflow-6.x\bin to the PATH environment variable.
Note:  To verify that you have correctly installed the DataFlow command line, run dr -v on the command line. The installed DataFlow version number is displayed. Also, make sure that Actian DataConnect has all required DataFlow environment variables set (including JAVA_HOME). See the Configuring the Installation section of the Installing DataFlow for Use with Java topic in the DataFlow documentation.
DataFlow Licensing
If a license file was not installed when DataFlow was installed, define the $DR_LICENSE_LOCATION macro and set it to the directory containing one or more license files. DataFlow reads all the license files in that directory.
Configuring DataFlow Invoker
You can specify the following properties when you create an instance of this invoker component in the process file > Configuration tab > Message Components section.
 
Property
Description
DataFlow Install Directory
Indicates the DataFlow installation location. By default, $(DR_LOCATION) is displayed.
If this macro does not exist, the following error message is displayed:
$(DR_LOCATION) macro is not defined. $(DR_LOCATION) macro should be defined with location of DataFlow install.
Note:  It is possible to have multiple DataFlow installations on the same system. In this case, you may need to use macro sets or overrides in the run-time configuration to set $(DR_LOCATION) to the appropriate DataFlow installation for your process.
Supported Actions
Action
Description
Execute
Executes the workflow using the supported properties.
Supported Action Parameters
Not applicable
Supported Action Properties
The following table provides the properties that the Execute action supports.
Property
Description
DataFlow Job File
The JavaScript file to execute. To specify multiple files, separate the values with the system default path separator (a colon (:) on Linux; a semicolon (;) on Windows).
If executing a Java class, then specify the fully qualified name of the Java class file to be executed. For example, com.example.MyCustomClass.
You can also specify the class as an attachment. For details, see the Class Path property description.
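For example, on Windows two RushScript job files would be joined with a semicolon (the file names below are hypothetical):

```text
C:\jobs\profile_customers.js;C:\jobs\write_stats.js
```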
Run a JSON Graph
Enable JSON file execution. By default, it is Disable.
Execute a Java Class
Enable Java DataFlow class execution. By default, it is Disable.
Working Directory
Directory against which relative paths in the invoker configuration or in the workflow to run are resolved.
Note:  UNC paths are not supported.
Class Path
File paths of additional .jar files to be loaded by the Java Virtual Machine before execution. Multiple values are separated with the system default path separator (a colon (:) on Linux; a semicolon (;) on Windows).
Java Arguments
A string containing the JVM arguments that DataFlow invoker must use.
Character Set
Character set required to read JavaScript files.
Note:  This property is not displayed if Run a JSON Graph or Execute a Java Class is enabled.
Cluster Execution
Cluster configuration required to run the workflow. Specify the value as a macro or in the following format:
dr://host:port
where:
host is the host name or IP address of the server running the cluster manager.
port is the port number of the cluster manager.
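For example (the host name and port number below are placeholders for illustration, not defaults):

```text
dr://cluster-host:1099
```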
Engine Configuration
Engine configuration type and settings that you can pass to DataFlow.
Note:  This property is not displayed if Execute a Java Class is enabled.
Include Directories
Comma-separated list of folders containing scripts to load before job execution.
Note:  This property is not displayed if Run a JSON Graph or Execute a Java Class is enabled.
JavaScript Environment Variables
List of variables that are used in the RushScript job files. Specify the variables in the following format:
variable1=value1[, variable2=value2]
Note:  This property is not displayed if Run a JSON Graph or Execute a Java Class is enabled.
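For example, to pass two variables to a RushScript job (the variable names and paths below are hypothetical):

```text
inputFile=C:\data\customers.txt, outputDir=C:\data\stats
```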
Import Macros
Enable importing of all Actian DataConnect macro definitions. By default, it is Disable.
Note:  This property is not displayed if Run a JSON Graph or Execute a Java Class is enabled.
Strict Mode
Set the JavaScript strict checking mode to one of the following:
Disabled
Warning (default)
Error
Note:  This property is not displayed if Run a JSON Graph or Execute a Java Class is enabled.
Properties Override File
Specify the properties file that contains the operator overrides.
Note:  This property is displayed only when Run a JSON Graph is enabled.
Override Operator Properties
Comma-separated string (no spaces) of source and target properties to override in the JSON graph. Node names must not contain spaces.
Note:  This property is displayed only when Run a JSON Graph is enabled.
Consider using this feature when the paths on the RushAnalytics development machine do not match the paths for the job files. When the paths do not match, you can either use this property to adjust the locations or manually edit the exported .dr file.
The following are examples of supported syntax for this property:
SourceNodeName.source=path/file
TargetNodeName.target=path/file
 
Example:
SourceNodeName.source=C:\DR_SHARED_STORAGE\Input.txt, TargetNode1Name.target=C:\DR_SHARED_STORAGE\Output1.txt, TargetNode2Name.target=C:\DR_SHARED_STORAGE\Output2.txt
Note:  For formatting purposes, spaces are included after the commas, but in actual use, no spaces are allowed.
Tip:
Use consistent naming conventions for the sources and targets.
Add the source and target entries to the string in the order in which they appear in the JSON (.dr) file.
Do not use spaces in source and target names or paths.
Use Cases
A few use cases are:
Create a workflow that calls the DataFlow API using custom JavaScript (the Run a JSON Graph and Execute a Java Class properties must be set to Disable).
Execute a workflow with a JSON file that has been exported from RushAnalytics (the Run a JSON Graph property is set to Enable).
Note:  Make sure the version of the DataFlow SDK matches the version used by RushAnalytics.
Create a workflow that calls the DataRush API using a custom Java class (the Execute a Java Class property is set to Enable).
Troubleshooting Tips
If you are having trouble executing your job, make sure that:
Actian DataConnect has permission to access all the files and folders used in the properties.
DataFlow is set up properly.
If you want to know the specific errors, check the log.
To debug a process run that uses the DataFlow Invoker step:
1. Set the Logging Level in the run-time configuration to DEBUG.
2. Save and run the configuration.
3. Select the Log tab in the console.
4. In the Find field, search for the word "firing". You should see a single instance of the word, which points to the command run on the command line.
5. Check that this command line works outside the DataFlow Invoker, in a command shell.
Error Codes
Error Code
Name
Description
Possible Reason
33
BADOPTIONVALUE
An invalid option value was used.
An invalid value is used for an option in the session properties.
46
LICENSING
A valid product license was not found.
A valid license is not available for this component and DataFlow is not available.
50
UNSPECIFIED
An unknown error occurred while loading or executing the component.
For details, see the process log file.
Last modified date: 02/01/2024