Executing Jobs with the Java API
To execute a DataFlow job on a cluster, provide a cluster specifier as an engine setting when you launch the job.
To learn more about cluster specifiers and other engine settings, see Engine Configuration Settings.
To execute a job on a Hadoop cluster using YARN, set the scheme of the cluster specifier URL to yarn.
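The URL shape described above can be checked with plain java.net.URI before the string is handed to the engine configuration. The sketch below is a minimal illustration, assuming a specifier of the form yarn://host:port; the class ClusterSpecifierCheck and its isYarnSpecifier method are hypothetical helpers, not part of the DataFlow API.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not part of DataFlow): checks the shape of a
// YARN cluster specifier such as "yarn://devcluster-head.datarush.local:47000".
public final class ClusterSpecifierCheck {

    private ClusterSpecifierCheck() {}

    // Returns true if the specifier parses as a URI whose scheme is exactly
    // "yarn" and which names both a host and an explicit port.
    public static boolean isYarnSpecifier(String specifier) {
        try {
            URI uri = new URI(specifier);
            return "yarn".equals(uri.getScheme())
                    && uri.getHost() != null
                    && uri.getPort() != -1;
        } catch (URISyntaxException e) {
            // Not a syntactically valid URI at all.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isYarnSpecifier("yarn://devcluster-head.datarush.local:47000")); // true
        System.out.println(isYarnSpecifier("http://devcluster-head.datarush.local:47000")); // false: wrong scheme
        System.out.println(isYarnSpecifier("yarn://devcluster-head.datarush.local"));       // false: no port
    }
}
```

This only validates the string's form; whether a Cluster Manager is actually listening at that host and port is determined when the job is launched.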
The following code fragment shows how to set the cluster specifier for a YARN-based cluster.
Note: The port number in the cluster specifier is the same as the --cluster.port value used to start the DataFlow Cluster Manager.
Launching a DataFlow Job Using YARN
// Create a DataFlow logical graph
LogicalGraph graph = LogicalGraphFactory.newLogicalGraph("Yarn job");

// Compose the graph with operators ... (not shown)

// Specify the Hadoop module to load. It must correspond to the version of Hadoop running on the cluster.
ModuleConfiguration moduleConfig = ModuleConfiguration.modules(new String[] {"datarush-hadoop-apache2"});

// Create an engine configuration with the desired module configuration,
// targeting the YARN cluster. Note the "yarn://" scheme in the cluster specifier.
EngineConfig engConfig = EngineConfig.engine()
        .moduleConfiguration(moduleConfig)
        .cluster("yarn://devcluster-head.datarush.local:47000")
        .monitored(true);

// Compile the graph with the specified configuration and run it on the target cluster.
graph.compile(engConfig).run();
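The fragment above hard-codes the cluster specifier. As a sketch, assuming you know the host running the DataFlow Cluster Manager and the --cluster.port value it was started with, the specifier string could instead be assembled with the java.net.URI constructor. YarnSpecifierBuilder and build are hypothetical names, not DataFlow API; the resulting string would be passed to cluster(...) exactly as in the fragment.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not part of DataFlow): builds a YARN cluster specifier
// from the Cluster Manager host and the port it was started with (--cluster.port).
public final class YarnSpecifierBuilder {

    private YarnSpecifierBuilder() {}

    public static String build(String host, int clusterPort) {
        if (clusterPort < 1 || clusterPort > 65535) {
            throw new IllegalArgumentException("port out of range: " + clusterPort);
        }
        try {
            // URI(scheme, userInfo, host, port, path, query, fragment);
            // the constructor also parses the authority as host:port.
            return new URI("yarn", null, host, clusterPort, null, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("invalid host: " + host, e);
        }
    }

    public static void main(String[] args) {
        // Illustrative values: assumes the Cluster Manager was started with --cluster.port 47000.
        System.out.println(build("devcluster-head.datarush.local", 47000));
        // prints yarn://devcluster-head.datarush.local:47000
    }
}
```

Going through the URI constructor rejects some malformed inputs (for example, a host containing spaces or an out-of-range port) before the job is compiled, rather than leaving the error to surface at launch.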