Compilation Output
The following sample illustrates what happens during the compilation phase. It also shows that the physical execution plan is not a direct representation of the logical graph.
Note the following about the sample:
• A logical graph defines what is done with the source data, not how it is done. As the sample shows, the physical plan can include operators that were not part of the original design, such as the Sort operator. An operator can declare through its metadata that it requires sorted input, and the framework inserts a Sort operator wherever one is needed.
• The execution plan parallelizes both computation and I/O. Input files are divided into large blocks that are read in parallel, and the data parsed from these blocks flows downstream in parallel.
• Operators that support parallel execution are replicated to create parallel streams of execution.
• The DataFlow compiler may divide an application into multiple phases of execution. The sample shows two phases (Phase 1 and Phase 2). At the end of a phase, the compiler adds operators that stage the parallel streams of data to disk and prepare them for the next phase. At the start of the next phase, it adds operators that read the data back from staging before processing continues. A graph is divided into phases when an operator requests, through its metadata, that its input data be partitioned.
• The same concepts apply whether a graph is executed locally on a single system or distributed across a cluster.
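The points above can be sketched in a few lines of Java. This is a toy model, not the DataFlow compiler: PlanSketch, LogicalOp, toPhysicalPlan, and splitIntoBlocks are hypothetical names introduced here for illustration, and the operator names are plain strings. It assumes a linear pipeline, inserts a Sort step ahead of any operator whose metadata asks for sorted input, replicates every step once per parallel stream, and divides an input file into fixed-size blocks that could be read in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; none of these types belong to the DataFlow API.
final class PlanSketch {

    /** A logical operator: a name plus metadata saying whether it needs sorted input. */
    static final class LogicalOp {
        final String name;
        final boolean requiresSortedInput;

        LogicalOp(String name, boolean requiresSortedInput) {
            this.name = name;
            this.requiresSortedInput = requiresSortedInput;
        }
    }

    /**
     * Expands a linear logical pipeline into a list of physical step names.
     * A Sort step is inserted before each operator that requires sorted input,
     * and every step is replicated once per parallel stream.
     */
    static List<String> toPhysicalPlan(List<LogicalOp> logical, int parallelism) {
        List<String> physical = new ArrayList<>();
        for (LogicalOp op : logical) {
            if (op.requiresSortedInput) {
                for (int i = 0; i < parallelism; i++) {
                    physical.add("Sort#" + i);
                }
            }
            for (int i = 0; i < parallelism; i++) {
                physical.add(op.name + "#" + i);
            }
        }
        return physical;
    }

    /** Divides a file of fileSize bytes into {offset, length} blocks of at most blockSize bytes. */
    static List<long[]> splitIntoBlocks(long fileSize, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<long[]> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            blocks.add(new long[] {offset, Math.min(blockSize, fileSize - offset)});
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<LogicalOp> logical = Arrays.asList(
                new LogicalOp("Read", false),
                new LogicalOp("Group", true),   // declares that it needs sorted input
                new LogicalOp("Write", false));

        // The physical plan has more steps than the logical design.
        System.out.println(toPhysicalPlan(logical, 2));

        // A 250-byte file with 100-byte blocks yields three blocks.
        for (long[] block : splitIntoBlocks(250, 100)) {
            System.out.println("offset=" + block[0] + " length=" + block[1]);
        }
    }
}
```

The sketch leaves out phases and staging to disk. A real planner would also avoid adding a Sort when the incoming data is already sorted as required.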
Note: You do not have to compile a LogicalGraph explicitly. If you do not need to set engine properties or access the LogicalGraphInstance, you can execute the LogicalGraph directly using the run() method. The run() method compiles the graph before executing it.
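The relationship between the two paths in the note can be modeled with a small toy. ToyGraph and ToyInstance are hypothetical classes written for this illustration and are not the DataFlow LogicalGraph or LogicalGraphInstance. The parallelism setting and its default of 1 are assumptions of the toy, chosen only to show an engine property being set at compile time.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: ToyGraph and ToyInstance are hypothetical, not DataFlow classes.
final class ToyGraph {
    private final List<String> log = new ArrayList<>();

    /** The compiled form of the graph; its settings can be inspected before it runs. */
    static final class ToyInstance {
        private final List<String> log;
        final int parallelism;

        ToyInstance(List<String> log, int parallelism) {
            this.log = log;
            this.parallelism = parallelism;
        }

        void run() {
            log.add("run(parallelism=" + parallelism + ")");
        }
    }

    /** Explicit path: compile with chosen settings and get the instance back. */
    ToyInstance compile(int parallelism) {
        log.add("compile");
        return new ToyInstance(log, parallelism);
    }

    /** Direct path: compiles with the toy's default setting, then executes. */
    void run() {
        compile(1).run();
    }

    /** Returns the recorded sequence of steps, in order. */
    List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        ToyGraph direct = new ToyGraph();
        direct.run();                       // compilation happens inside run()
        System.out.println(direct.log());

        ToyGraph explicit = new ToyGraph();
        ToyInstance instance = explicit.compile(4);  // set a property, keep the instance
        instance.run();
        System.out.println(explicit.log());
    }
}
```

In both cases the log records a compile step followed by a run step. The explicit path differs only in that the caller chooses the settings and holds the compiled instance.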