Cohabitation of Vector and Spark-Vector under YARN
If VectorH is running with YARN enabled, we recommend the following when using the Spark-Vector Provider:
• Use two queues under the same parent queue, for example, queue Actian with two children ActianVector and ActianSpark. (These names are used below.)
• Give the ActianVector queue a low capacity, but set its user limit factor high enough that both WSetAppMaster and WorkloadAppMaster (preferably) fit completely within the maximum capacity configured for the Actian queue.
• Give the ActianSpark queue a higher capacity than ActianVector. When setting its user limit factor, consider the maximum parallelism you eventually get from the Spark job, and keep in mind that WSetAppMaster can be killed if the VectorH external query job is too eager. That is, make sure that, with the chosen user limit factor, the ActianSpark queue cannot overlap the VectorH resources held by WSetAppMaster.
• Set "spark.dynamicAllocation.enabled true" and "spark.shuffle.service.enabled true" so that a spark-shell will not take static resources during initialization.
Note: These options must be set; otherwise Vector will not ignore the incoming preemption requests because no external query would be running in the system at that time.
• Because dynamic allocation is enabled (see above), do not set any num-executors, initial-executors, or min-executors type of options. Use only --executor-memory Xg --executor-cores Y if you need to increase the resources of one container/executor for big workloads. (Otherwise, you might get HeapOverflows.)
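The Spark options recommended above can be assembled as in the following minimal Python sketch. The helper name spark_submit_args and the sample values (8 GB, 4 cores) are hypothetical and only illustrate the shape of the command line; --master, --queue, --conf, --executor-memory, and --executor-cores are standard spark-submit/spark-shell options. Adjust the values to your own cluster.

```python
def spark_submit_args(queue, executor_memory_gb=None, executor_cores=None):
    """Build spark-shell/spark-submit arguments for the two-queue setup.

    Hypothetical helper: it enables dynamic allocation and the shuffle
    service, and deliberately never emits --num-executors or any
    initial/min executor setting.
    """
    args = [
        "--master", "yarn",
        "--queue", queue,
        "--conf", "spark.dynamicAllocation.enabled=true",
        "--conf", "spark.shuffle.service.enabled=true",
    ]
    # Only per-executor sizing is allowed; both values are optional.
    if executor_memory_gb is not None:
        args += ["--executor-memory", f"{executor_memory_gb}g"]
    if executor_cores is not None:
        args += ["--executor-cores", str(executor_cores)]
    return args


if __name__ == "__main__":
    # Sample values (8 GB, 4 cores) are assumptions for illustration only.
    print("spark-shell " + " ".join(spark_submit_args("ActianSpark", 8, 4)))
```

Leaving both sizing arguments unset keeps the Spark defaults for executor memory and cores, so the only settings forced by this sketch are the queue and the two dynamic allocation options.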
This setup should allow Spark jobs running in the ActianSpark queue to request as much as they want (within the user limit factor) from other available queues in the system, such as ActianVector, in which Vector runs with preemption enabled and the [dbagent] parameters set for its required resources. If a Spark job takes resources from ActianVector under these circumstances, Vector can ignore the resulting preemption requests and run out-of-band (with unchanged max-parallelism) for as long as it takes the load or unload job to finish.
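To reason about the user limit factor, it can help to work the arithmetic through. The sketch below uses a simplified model of the YARN Capacity Scheduler, in which one user in a leaf queue can grow to at most capacity × user limit factor, capped by the queue's maximum capacity. It ignores other scheduler settings such as the minimum user limit percent. All percentages, the function names, and the share assumed for WSetAppMaster are hypothetical values chosen for illustration.

```python
def user_ceiling_pct(capacity_pct, user_limit_factor, maximum_capacity_pct):
    """Upper bound (percent of the parent queue) one user can hold in a leaf queue.

    Simplified model: capacity * user-limit-factor, capped by maximum-capacity.
    """
    return min(capacity_pct * user_limit_factor, maximum_capacity_pct)


def spark_leaves_room(spark_capacity_pct, spark_user_limit_factor,
                      spark_maximum_capacity_pct, wset_pct):
    """True if the ActianSpark ceiling plus the WSetAppMaster share fits in the parent."""
    ceiling = user_ceiling_pct(spark_capacity_pct, spark_user_limit_factor,
                               spark_maximum_capacity_pct)
    return ceiling + wset_pct <= 100


if __name__ == "__main__":
    # Hypothetical layout: ActianVector 20%, ActianSpark 80% of the Actian queue,
    # with WSetAppMaster assumed to need 10% of the parent.
    print(user_ceiling_pct(20, 2, 40))        # ActianVector ceiling
    print(spark_leaves_room(80, 1, 90, 10))   # Spark capped at 80%: fits
    print(spark_leaves_room(80, 2, 100, 10))  # Spark can reach 100%: overlaps
```

In this model, a user limit factor of 1 with a maximum capacity of 90% keeps the Spark user at 80% of the parent, leaving room for WSetAppMaster, whereas a factor of 2 with a maximum capacity of 100% lets the Spark user claim the whole parent queue.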
Note: The one-queue scenario (only the Actian queue) does not work in either of the following cases. It fails when Vector (the first job, which gets the queue's full set of resources) and Spark (the second incoming job) run on behalf of different users, although this is not the typical install environment. It also fails when the Actian user is the only user: in this case the Spark job does not even get past the running phase of the ApplicationMaster and therefore hangs indefinitely.