DataFlow Invoker
The DataFlow Invoker allows you to execute DataFlow scripts and JSON graphs within a DataConnect process. The DataFlow Invoker uses the DataFlow command line to call and run an existing DataFlow application. The DataFlow Invoker supports the datarush-library API 8.x.
Note:  Actian DataConnect supports DataFlow 8.0, and the recommended Java version for DataFlow 8.0 is Java 11.
You can access the DataFlow libraries in any of the following ways:
Using RushScript to make calls to the DataFlow library JavaScript API (see the RushScript topic in the DataFlow documentation).
Using a custom Java class to make calls to the DataFlow library Java API (see the DataFlow API Usage topic in the DataFlow documentation).
Exporting a workflow from RushAnalytics and executing that workflow with the DataFlow Invoker as a JSON file (see the Enabling Workflows to Execute with DataFlow topic in the DataFlow documentation).
For more details about building workflows, see the Building DataFlow Applications in Java topic in the DataFlow documentation.
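For illustration, the following is a minimal sketch of the kind of Java DataFlow application the invoker can execute. It follows the general pattern described in the Building DataFlow Applications in Java topic; the class and method names shown (LogicalGraph, ReadDelimitedText, WriteDelimitedText) and the file paths are assumptions that you should verify against your DataFlow version:

import com.pervasive.datarush.graphs.LogicalGraph;
import com.pervasive.datarush.graphs.LogicalGraphFactory;
import com.pervasive.datarush.operators.io.WriteMode;
import com.pervasive.datarush.operators.io.textfile.ReadDelimitedText;
import com.pervasive.datarush.operators.io.textfile.WriteDelimitedText;

public class CopyCustomers {
    public static void main(String[] args) {
        // Compose the logical graph (the workflow)
        LogicalGraph graph = LogicalGraphFactory.newLogicalGraph("CopyCustomers");

        // Read a delimited ASCII source file (hypothetical path)
        ReadDelimitedText reader = graph.add(new ReadDelimitedText("C:/data/customers.txt"));
        reader.setHeader(true);

        // Write the records to a delimited target file (hypothetical path)
        WriteDelimitedText writer = graph.add(
            new WriteDelimitedText("C:/data/customers_out.txt", WriteMode.OVERWRITE));
        writer.setHeader(true);

        // Connect the reader output to the writer input and execute the graph
        graph.connect(reader.getOutput(), writer.getInput());
        graph.run();
    }
}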
For example, the following process contains a DataFlow Invoker step that uses RushScript to make calls to the DataFlow library JavaScript API:
1. An organization has customer data in a delimited ASCII file that they want to profile to automatically discover the values that appear frequently. The DataFlow Invoker step calls the DataFlow API using a custom script that uses the Summary Statistics operator (see the Using the Summary Statistics Operator topic in the DataFlow documentation). The output of the DataFlow Invoker step generates an XML file that contains the statistics.
2. The XML file is converted to delimited ASCII format using a Transformation step.
3. The Transformation step generates an Excel file that contains the most frequent values in every column, with the count of the occurrence.
Preparing to Use DataFlow Invoker
You must be familiar with the following concepts before using the DataFlow Invoker.
DataFlow technology: The framework on which workflows are built is called DataFlow. For more information about this technology and its associated terminology, see the DataFlow documentation.
DataFlow workflow: You use the DataFlow Invoker to run a DataFlow workflow, also called a graph.
Building a process: A DataFlow workflow runs as a step in a process. For more information, see Building Processes.
Adding a license: To create workflows using the DataFlow Invoker, the integration platform license must include the Engine DataFlow feature. See Uploading License File.
Using attachments: You can use a RushScript, Java class, or JSON file project attachment as the file to run.
Using the DataFlow command line: The DataFlow Invoker works by calling the DataFlow command line tool. See the Running from Command Line topic in the DataFlow documentation.
Prerequisites
Before using the DataFlow Invoker, perform the following:
Install DataFlow: Make sure you have contacted your account executive to obtain the items required to set up Actian DataFlow.
Access Job Files: Make sure you have JSON, Java class files, or JavaScript files to run the workflow.
Note:  If you access the DataFlow libraries by executing an exported JSON file, note that a workflow exported from RushAnalytics (a .dr file) is a JSON file.
Access Data Files: Make sure you have the source data files you want to use in the process.
Test the command line: The DataFlow Invoker works by calling the command line tool. When testing the command line, make sure that you are logged in to the system as a user that has the required permissions to read the DataFlow Job Files and source data, and write the target data.
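For example, a manual test might run the dr launcher directly against the job file, assuming the launcher accepts a script file as an argument as described in the Running from Command Line topic (the script path is illustrative):
dr C:\project\scripts\profile.js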
Installing DataFlow and Specifying DR_LOCATION Macro
To install DataFlow, see the DataFlow documentation available at docs.actian.com.
After installing DataFlow, create the DR_LOCATION macro that points to the DataFlow installation location, for example, C:\Actian\dataflow-8.x-x. For more information about the DR_LOCATION macro, see DataFlow Invoker Properties. This macro must be added to one of the following:
Macro set that is included in all projects that use the invoker
GLOBAL macro set: Can be used by all processes that call the DataFlow Invoker
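For example, the macro definition in the macro set might look like this (the path is illustrative):
DR_LOCATION = C:\Actian\dataflow-8.x-x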
Setting up DataFlow Command Line to Run DataFlow Invoker
To set up the DataFlow command line:
1. Extract the contents of the DataFlow 8.x SDK to the Actian installation directory (or a preferred location). The extracted folder contains the required files. For more information, see the Installing and Configuring DataFlow section in the DataFlow documentation.
2. (Optional) Obtain the DataFlow license (DataRush.slc) and add it at the root of the DataFlow installation directory. If you do not have the license, contact Actian Support.
Note:  This is required if you do not have the DataFlow feature included in the Actian DataConnect license.
3. (Optional) To use DataFlow from the command line, add <installdir>\dataflow-8.x\bin to the PATH environment variable.
Note:  To verify that you have correctly installed the DataFlow command line, run dr -v on the command line. The installed DataFlow version number is displayed. Also, make sure that Actian DataConnect has all the environment variables required by DataFlow set (including JAVA_HOME). See the Configuring the Installation section of the Installing DataFlow for Use with Java topic in the DataFlow documentation.
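For example, a quick check in a Windows command shell might look like the following (the JDK and install paths are illustrative):
set JAVA_HOME=C:\Program Files\Java\jdk-11
set PATH=%PATH%;C:\Actian\dataflow-8.x\bin
dr -v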
DataFlow Licensing
If a license file was not installed when DataFlow was installed, use the $(DR_LICENSE_LOCATION) macro and set it to the directory containing one or more license files. In this case, DataFlow reads all the license files in the directory.
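For example, the macro might be defined as follows (the path is illustrative):
DR_LICENSE_LOCATION = C:\Actian\licenses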
DataFlow Invoker Properties
DataFlow Install Directory: Indicates the DataFlow installation location. By default, $(DR_LOCATION) is displayed.
If this macro does not exist, the following error message is displayed:
$(DR_LOCATION) macro is not defined. $(DR_LOCATION) macro should be defined with location of DataFlow install.
Note:  It is possible to have multiple DataFlow installations on the same system. In this case, you may need to use macro sets or overrides in the run-time configuration to set $(DR_LOCATION) to the appropriate DataFlow installation for your process.
Supported Actions
Execute: Runs the specified DataFlow job using the action properties described below.
Supported Action Parameters
Not applicable
Supported Action Properties
The following table provides the properties that the Execute action supports.
DataFlow Job File: The DataFlow script or graph to execute. Multiple scripts or graphs can be specified, separated by commas (,).
If executing a Java class, then specify the fully qualified name of the Java class file to be executed. For example, com.example.MyCustomClass.
You can also specify the class as an attachment. For details, see the Class Path property description.
Run a JSON Graph: Enables or disables the execution of a JSON graph. Default value is Disable.
Note:  This property is not displayed if Execute a Java Class is enabled.
Execute a Java Class: Enables or disables the execution of a Java DataRush class. Default value is Disable.
Note:  This property is not displayed if Run a JSON Graph is enabled.
Working Directory: Specifies the base working directory. Job files are relative to this directory.
Note:  UNC paths are not supported.
Class Path: Path to the project jar or directory.
Java Arguments: Java Virtual Machine arguments.
Character Set: Uses the specified character set when reading the script files to execute, for example, ASCII or UTF-8. Default value is UTF-8.
Note:  This property is not displayed if the Execute a Java Class property is enabled.
Cluster Execution: DataFlow can execute either locally or in a cluster. If you are running within a cluster, you must specify the location of the master host. There are two ways to do this:
1. Specify the cluster using the URL format dr://host:port, where:
host is the host name or IP address of the server running the cluster manager.
port is the port number of the cluster manager.
2. Integrate with a Hadoop cluster using the format yarn://host:port.
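For example (host names and port numbers are illustrative):
dr://dfcluster-head:1099
yarn://resourcemanager.example.com:8032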
Engine Configuration: Sets the engine configuration properties. You can provide a list of comma-separated values. Any property defined at execution overrides the embedded property.
Example: parallelism=1
Note:  This property is not displayed if Execute a Java Class is enabled.
Include Directories: Comma-separated list of folders containing scripts to be included before job execution. When set, this property ensures that the JavaScript file can find all of its dependent scripts.
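Example (illustrative paths): C:\dev\scripts\common,C:\dev\scripts\lib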
Note:  This property is not displayed if Run a JSON Graph or Execute a Java Class is enabled.
JavaScript Environment Variables: Sets variables in the JavaScript environment. Specify the variables in the following format:
variable1=value1[, variable2=value2]
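Example (illustrative variable names): inputDir=C:\data\in,outputDir=C:\data\out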
Note:  This property is not displayed if Run a JSON Graph or Execute a Java Class is enabled.
Import Macros: When enabled, macros embedded in your JavaScript file are automatically substituted. Default value is Disable.
Note:  This property is not displayed if the Run a JSON Graph or Execute a Java Class property is enabled.
Strict Mode: Sets the JavaScript strict checking mode to one of the following:
Disabled
Warning (default)
Error
Note:  This property is not displayed if Run a JSON Graph or Execute a Java Class is enabled.
Properties Override File: Specifies the properties file containing the operator overrides.
Note:  This property is displayed only when Run a JSON Graph is enabled.
Override Operator Properties: Comma-separated string (no spaces) of source and target outputs that will be overridden in the JSON graph. Node names must not contain spaces.
Note:  This property is displayed only when Run a JSON Graph is enabled.
Consider using this feature when the paths on the RushAnalytics development machine do not match the paths for the job files. In that case, you can either use override operator properties to adjust the locations or manually edit the exported .dr file.
The following are examples of supported syntax for this property:
SourceNodeName.source=path/file
TargetNodeName.target=path/file
Example:
SourceNodeName.source=C:\DR_SHARED_STORAGE\Input.txt, TargetNode1Name.target=C:\DR_SHARED_STORAGE\Output1.txt, TargetNode2Name.target=C:\DR_SHARED_STORAGE\Output2.txt
Note:  For formatting purposes, spaces are included after the commas, but in actual use, no spaces are allowed.
Tip...  
- Follow the naming conventions for the sources and targets in the JSON (.dr) file.
- Add the source and target entries to the string in the order in which they appear in the JSON (.dr) file.
- Do not use spaces in source and target names or paths.
Use Cases
A few use cases are:
Create a workflow that calls the DataFlow API using custom JavaScript (the Run a JSON Graph and Execute a Java Class properties must be set to Disable).
Execute a workflow with a JSON file that has been exported from RushAnalytics (the Run a JSON Graph property is set to Enable).
Note:  Make sure the version of the DataFlow SDK matches the version used by RushAnalytics.
Create a workflow that calls the DataRush API using a custom Java class (the Execute a Java Class property is set to Enable).
Troubleshooting Tips
If you are having trouble executing your job, make sure that:
Actian DataConnect has permission to access all the files and folders used in the properties.
DataFlow is set up properly.
To see the specific errors, check the log.
To debug a process run that uses the DataFlow Invoker step:
1. Set the Logging Level in the run-time configuration to DEBUG.
2. Save and run the configuration.
3. Select the Log tab in the console.
4. In the Find field, search for the word "firing". You should see a single instance of the word, which identifies the command that was run on the command line.
5. Check that this command line works outside the DataFlow Invoker, in a command shell.
Error Codes
Error Code 33, BADOPTIONVALUE: An invalid option value was used. Possible reason: an invalid value is used for an option in the session properties.
Error Code 46, LICENSING: A valid product license was not found. Possible reason: a valid license is not available for this component and DataFlow is not available.
Error Code 50, UNSPECIFIED: An unknown error occurred while loading or executing the component. For details, see the process log file.