How External Tables Work
To provide external table functionality, Vector leverages Apache Spark's extensive connectivity through the Spark-Vector Connector.
The external tables architecture consists of two main components:
• Spark-Vector Provider
• Vector
Vector receives queries operating on external tables from the user and rewrites them into JSON requests for external data, which it sends to the Spark-Vector Provider. The Spark-Vector Provider is a Spark application that behaves as a multi-threaded Spark server: it receives requests from Vector, translates them into Spark jobs, and launches them. These jobs typically issue SparkSQL queries such as "INSERT INTO vector_table SELECT * FROM external_resource" (to read external data) or "INSERT INTO external_resource SELECT * FROM vector_table" (to write to external systems). Finally, these jobs use the Spark-Vector Connector to push and pull data in and out of Vector.
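As a sketch of how this flow looks from the user's side, the SQL below defines an external table backed by a CSV file and loads it into a native Vector table. All table names, column definitions, paths, and options here are illustrative, and the exact DDL keywords may vary between Vector versions:

```sql
-- Hypothetical external table over a CSV file in HDFS;
-- queries against it are rewritten into JSON requests
-- and executed by the Spark-Vector Provider.
CREATE EXTERNAL TABLE ext_customers (
    id   INTEGER,
    name VARCHAR(100)
) USING SPARK
WITH REFERENCE='hdfs://namenode:8020/data/customers.csv',
     FORMAT='csv',
     OPTIONS=('header'='true');

-- Reading from the external table triggers a Spark job of the form
-- "INSERT INTO vector_table SELECT * FROM external_resource".
INSERT INTO customers SELECT id, name FROM ext_customers;
```

The user only writes ordinary SQL; the rewrite into JSON requests and the launching of Spark jobs happen transparently behind the scenes.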