DataFlow Preferences in KNIME

Property	Default Value	Description
Parallelism	0	Specifies the parallelism setting.
Minimum parallelism	0	Specifies the minimum parallelism required to execute a job. This property is used when executing a job on a busy cluster. If the requested parallelism cannot be reached with the current resources, the minimum parallelism setting determines the lowest value that is acceptable. To ensure that an exact number of partitions is allocated, set minimum parallelism and parallelism to the same value.
Execute in cluster	false	Specifies whether the workflow must be executed on a cluster or locally. Set the value to true to execute on a cluster or false to execute it locally.
Cluster URL	dr://localhost:1099	Specifies the URL, such as dr://hostname:1099, of the cluster, which is used to execute the workflow. When integrating with Apache YARN, use yarn as the scheme of the cluster URL. For example: yarn://hostname:1099.
Scheduler queue	default	Specifies the name of the scheduler queue to use when scheduling jobs. The queue name applies only when a cluster is used for job execution. Currently, scheduler queue names are supported only when YARN is used to execute a job.
User extension paths (comma delimited)	not applicable	Specifies a comma-delimited list of archive files used by a workflow that runs on a YARN cluster. This is required when execution needs additional files, such as custom components, JAR files, and scripts, that must be distributed to all the nodes used to run the DataFlow job.
Use socks proxy	false	Specifies whether to use a SOCKS proxy to contact the cluster. When enabled, the socket provider is configured with the specified proxy host and port. Set the value to true to use the SOCKS proxy.
Socks proxy host	localhost	Specifies the SOCKS proxy host name.
Socks proxy port	1080	Specifies the SOCKS proxy port number.
Maximum number of retries	0	Specifies the maximum number of retries for network communication. The default value (0) does not allow retries.
Writeahead	2	Specifies the number of unread batches that a port can publish before blocking.
Spooling threshold	0	Specifies the threshold for the queue. If the queue grows beyond the threshold, its contents are spooled to disk until the queue shrinks, which saves memory. The default value (0) indicates that the system should set a valid threshold based on the available resources.
Batch size	1024	Specifies the batch size that operator ports use to publish the data tokens being written. Data is not made available to readers until a full batch is ready or the end of data is reached.
Collect engine statistics	true	Specifies whether to collect the engine statistics. Set the value to true for the workflow to collect the engine statistics.
Autosize writeahead (based on number of readers)	true	Specifies whether to size the writeahead automatically based on the number of readers. Set the value to true to auto-size the writeahead.
Subgraph history size	10	Specifies the maximum number of subgraphs to store in memory while tracking the execution history. By default, the last 10 subgraphs for any given operator are stored.
Storage management path	not applicable	Specifies the path to store the temporary and intermediate files during graph execution.
Visualization sampling percentage	0.0	Specifies the percentage of total data in the workflow that is sampled for use with monitoring plugins.
Visualization sampling seed	37	Specifies the seed that should be used when determining the data for sampling. The seed has no effect when the sampling percentage is set to 100.
Sort buffer	10M	Specifies the size of the buffer used to sort the data in memory.
Sort IO Buffer	0	Specifies the buffer size, in bytes, used for all I/O operations initiated by the Sort operator. A suffix of k, m, or g can be added to express the value in kilobytes, megabytes, or gigabytes. The default size (0) indicates that the framework should set the buffer size at run time based on the current resource availability.
Sort Max Merge	0	Specifies the maximum number of intermediate sort segments to merge concurrently. Adjust this setting to control the resources required to merge sorted segments and create the final sorted result.
Dump file path	not applicable	Sets the dumpFilePath engine configuration to the specified value.
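
The following Python sketch illustrates two of the rules described in the table: how a size value with a k, m, or g suffix (as used by Sort buffer and Sort IO Buffer) could be converted to bytes, and how Parallelism and Minimum parallelism might interact on a busy cluster. It is not DataFlow code. The function names parse_size and resolve_parallelism are hypothetical, the suffixes are assumed to be binary multiples (1k = 1024 bytes), and the sketch assumes that when the requested parallelism is not available the job runs with whatever is available, as long as that meets the minimum. It also assumes explicit positive parallelism values and does not model the default setting of 0.

```python
def parse_size(text: str) -> int:
    """Convert a size such as '10M' or '512k' to a byte count.

    Assumption: k, m, and g are binary multiples (1024-based).
    A value without a suffix is taken as a plain number of bytes.
    """
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def resolve_parallelism(requested: int, minimum: int, available: int) -> int:
    """Return the parallelism a job would run with, or raise if the
    cluster cannot satisfy even the minimum.

    Assumption: the job takes everything available when it cannot get
    the full requested value.
    """
    if minimum > requested:
        raise ValueError("minimum parallelism cannot exceed parallelism")
    if available >= requested:
        return requested
    if available >= minimum:
        return available
    raise RuntimeError("not enough resources to reach minimum parallelism")


if __name__ == "__main__":
    print(parse_size("10M"))              # 10485760
    print(resolve_parallelism(8, 4, 6))   # 6
```

Setting both values to the same number, for example resolve_parallelism(8, 8, 6), raises an error in this sketch instead of running with fewer partitions. This mirrors the advice in the Minimum parallelism row for guaranteeing an exact partition count.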