Executing Jobs with the Java API

To execute a DataFlow job on a cluster, provide a cluster specifier as an engine setting when you launch the job.

For more information about cluster specifiers and other engine settings, see Engine Configuration Settings.

To execute a job on a Hadoop cluster using YARN, set the scheme of the cluster specifier URL to yarn.

The following code fragment shows how to set the cluster specifier for a YARN-based cluster.

Note: The port number in the cluster specifier is the --cluster.port value that was used to start the DataFlow Cluster Manager.

Launching a DataFlow Job Using YARN

// Create a DataFlow logical graph
LogicalGraph graph = LogicalGraphFactory.newLogicalGraph("Yarn job");

// Compose the graph with operators ... (not shown)

// Specify the Hadoop module to load. It must correspond to the version of Hadoop running on the cluster.
ModuleConfiguration moduleConfig = ModuleConfiguration.modules(new String[] {"datarush-hadoop-apache2"});

// Create an engine configuration that uses the module configuration
// and names the target cluster. Note the "yarn:" scheme in the cluster specifier.
EngineConfig engConfig = EngineConfig.engine()
    .moduleConfiguration(moduleConfig)
    .cluster("yarn://devcluster-head.datarush.local:47000")
    .monitored(true);

// Compile the graph with the specified configuration and run it on the target cluster.
graph.compile(engConfig).run();
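
The cluster specifier in the fragment above is a URL made up of the yarn scheme, the Cluster Manager host, and the port. When the host and port come from configuration rather than a literal, it can help to assemble and sanity-check the specifier before handing it to the engine configuration. The following is a minimal sketch using only the Java standard library. The class ClusterSpecifiers and the method yarnSpecifier are hypothetical names introduced for illustration and are not part of the DataFlow API.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper (not part of the DataFlow API) for building YARN cluster specifiers. */
final class ClusterSpecifiers {

    private ClusterSpecifiers() {
    }

    /**
     * Builds a specifier of the form yarn://host:port.
     * The port is assumed to be the --cluster.port value used to start the Cluster Manager.
     */
    static String yarnSpecifier(String host, int port) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        String specifier = "yarn://" + host + ":" + port;
        try {
            // Parse the result to confirm that the host and port survive as URL components.
            URI parsed = new URI(specifier);
            if (parsed.getHost() == null || parsed.getPort() != port) {
                throw new IllegalArgumentException("not a valid host: " + host);
            }
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("malformed cluster specifier: " + specifier, e);
        }
        return specifier;
    }
}
```

The returned string could then be passed to the cluster(...) call shown in the fragment above in place of the literal URL.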

Last modified date: 01/03/2025