Executing Jobs with the Java API
To execute DataFlow jobs on a cluster, provide a cluster specifier as an engine setting to launch a job.
For more information about cluster specifiers and other engine settings, see
Engine Configuration Settings.
To execute a job on a Hadoop cluster using YARN, the scheme of the cluster specifier URL should be set to yarn.
The following code fragment shows how to set the cluster specifier for a YARN-based cluster.
Note: The port number used in the cluster specifier references the --cluster.port number used to start the DataFlow Cluster Manager.
Launching a Dataflow Job using YARN
// Create a DataFlow logical graph
LogicalGraph graph = LogicalGraphFactory.newLogicalGraph("Yarn job ");
// Compose the graph with operators ... (not shown)
// Specify the Hadoop module to load. Corresponds to the version of Hadoop running.
ModuleConfiguration moduleConfig = ModuleConfiguration.modules(new String[] {"datarush-hadoop-apache2"});
// Create an engine configuration with the wanted module configuration
// and specifying the target cluster. Note the use of "yarn:".
EngineConfig engConfig = EngineConfig.engine().moduleConfiguration(moduleConfig).cluster("yarn://devcluster-head.datarush.local:47000").monitored(true);
// Compile the graph with the specified configuration and run it on the target cluster.
graph.compile(engConfig).run();
Last modified date: 06/14/2024