Compiling an Application
After a DataFlow application is composed, the LogicalGraph is ready for compiling.
Compiling a LogicalGraph creates an execution (or physical) plan that determines how the graph is executed in the DataFlow engine environment. The execution plan depends on the environment where the application will run, for example, whether it runs on a single system or in a cluster.
The following provides details on compiling an application:
• Configuration settings for the DataFlow engine affect how the execution plan is generated. For example, the parallelism setting of the engine determines the number of parallel streams of execution that must be created. For information about the different engine configuration settings and their effect on compiling and executing the graph, see Engine Configuration Settings.
The following code sample shows how to build an engine configuration by setting the desired parallelism. The configuration is passed in at compile time and directly affects how the execution plan is generated. Compiling the application returns a LogicalGraphInstance. A LogicalGraphInstance is a representation of the LogicalGraph with additional runtime elements that can be used to monitor the application when it is executed.
EngineConfig config = EngineConfig.engine().parallelism(2).monitored(true);
LogicalGraphInstance instance = graph.compile(config);
instance.run();
The first line of code in the above sample creates a default engine configuration using the static EngineConfig.engine() method. After a configuration is created, different engine settings such as parallelism and monitoring can be applied.
As shown in the example, each method that applies an engine setting returns a new engine configuration, so the calls can be chained.
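The chaining behavior described above can be illustrated with a small, self-contained sketch. The SketchConfig class below is a hypothetical stand-in written only for this illustration; it is not the DataFlow EngineConfig class, and its default values are assumptions. It shows one common way a configuration object can return a new instance from each setting method so that calls can be chained while the starting configuration stays unchanged.

```java
// Hypothetical illustration only: SketchConfig is NOT the DataFlow EngineConfig class.
public class ChainingSketch {

    /** Immutable settings holder: each setting method returns a new instance. */
    static final class SketchConfig {
        private final int parallelism;
        private final boolean monitored;

        private SketchConfig(int parallelism, boolean monitored) {
            this.parallelism = parallelism;
            this.monitored = monitored;
        }

        /** Static factory, similar in spirit to EngineConfig.engine(). Defaults are assumed. */
        static SketchConfig defaults() {
            return new SketchConfig(1, false);
        }

        /** Returns a copy with a different parallelism; the receiver is not modified. */
        SketchConfig parallelism(int value) {
            return new SketchConfig(value, this.monitored);
        }

        /** Returns a copy with a different monitoring flag; the receiver is not modified. */
        SketchConfig monitored(boolean value) {
            return new SketchConfig(this.parallelism, value);
        }

        int getParallelism() {
            return parallelism;
        }

        boolean isMonitored() {
            return monitored;
        }
    }

    public static void main(String[] args) {
        SketchConfig base = SketchConfig.defaults();
        // Each call returns a new object, so the calls can be chained.
        SketchConfig tuned = base.parallelism(2).monitored(true);

        System.out.println("base:  parallelism=" + base.getParallelism()
                + ", monitored=" + base.isMonitored());
        System.out.println("tuned: parallelism=" + tuned.getParallelism()
                + ", monitored=" + tuned.isMonitored());
    }
}
```

With this design, a base configuration can be shared and specialized in several places without one caller's changes affecting another.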
• The engine configuration can be passed to the compile() method of the LogicalGraph. This method takes the engine configuration into account when compiling the LogicalGraph into a physical plan. Compiling returns a LogicalGraphInstance. This is not the physical plan itself, but another representation of the logical application: the operators as they were added to the LogicalGraph and connected together. During compiling, high-level operators may be divided into components. Without a logical representation, the original design of the application would be lost, which would make profiling and debugging difficult. The LogicalGraphInstance maintains the mapping from the logical space of the original design to the physical space in which the application is executed.
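To make the idea of a logical-to-physical mapping more concrete, the following is a minimal sketch under stated assumptions. The PlanMappingSketch class, its methods, and the operator and component names are all invented for illustration; they do not describe how LogicalGraphInstance is implemented. The sketch only shows why keeping such a mapping helps: a problem observed in a physical component can be traced back to the operator in the original design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: this is NOT the DataFlow LogicalGraphInstance API.
public class PlanMappingSketch {

    // Logical operator name -> physical components it was divided into.
    private final Map<String, List<String>> logicalToPhysical = new LinkedHashMap<>();
    // Physical component name -> logical operator it came from.
    private final Map<String, String> physicalToLogical = new LinkedHashMap<>();

    /** Records that a physical component was produced from a logical operator. */
    void record(String logicalOperator, String physicalComponent) {
        logicalToPhysical
                .computeIfAbsent(logicalOperator, key -> new ArrayList<>())
                .add(physicalComponent);
        physicalToLogical.put(physicalComponent, logicalOperator);
    }

    /** Returns the physical components of a logical operator, or an empty list. */
    List<String> componentsOf(String logicalOperator) {
        return logicalToPhysical.getOrDefault(logicalOperator, Collections.emptyList());
    }

    /** Returns the logical operator a component came from, or null if unknown. */
    String originOf(String physicalComponent) {
        return physicalToLogical.get(physicalComponent);
    }

    public static void main(String[] args) {
        PlanMappingSketch mapping = new PlanMappingSketch();
        // Invented names: one high-level operator divided into two components.
        mapping.record("sortRows", "sortRows.partition");
        mapping.record("sortRows", "sortRows.merge");
        mapping.record("writeOutput", "writeOutput.sink");

        // Trace a physical component back to the original design.
        System.out.println("Origin of sortRows.merge: " + mapping.originOf("sortRows.merge"));
        System.out.println("Components of sortRows: " + mapping.componentsOf("sortRows"));
    }
}
```

A profiler or debugger built on such a mapping could report results per logical operator, even though the measurements are taken on physical components.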