Load Data with Spark SQL through the Spark-Vector Connector
The Spark-Vector Connector lets you interact with a Vector database using Apache Spark. A functional setup requires a working Spark installation and the Spark-Vector Connector jar file. Contact Actian Support for a copy of the connector jar file and for the Spark installation requirements.
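One common way to make the connector jar available to spark-shell is to pass it with the --jars option when starting the shell. This is only a sketch: the jar file name and path below are placeholders for the file provided by Actian Support.
spark-shell --jars /path/to/spark_vector_connector.jar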
The following examples assume you are connected to a spark-shell session with the Spark-Vector Connector jar on the classpath. Create a new Spark session with the Vector extensions injected, as follows:
import org.apache.spark.sql._
import com.actian.spark_vector.extensions.VectorDataSourceV2Strategy
val spark = SparkSession.builder().withExtensions { extensions =>
  extensions.injectPlannerStrategy(sp => new VectorDataSourceV2Strategy(sp))
}.getOrCreate()
Assuming that you have created a table in Vector as follows:
CREATE TABLE test(col1 int)
You can reference this table in Spark as follows:
spark.sql("""CREATE TEMPORARY VIEW vector_table
USING com.actian.spark_vector.sql.VectorSourceV2
OPTIONS (
host "localhost",
instance "VW",
database "testdb",
table "test",
user "actian",
password "actian"
)""")
You can load data into the table as shown below:
import org.apache.spark.sql.types.IntegerType
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.Row
import scala.jdk.CollectionConverters._

val values: Seq[Row] = Seq(Row(1))
val schema = StructType(Seq(StructField("col1", IntegerType, nullable = true)))
val valuesDF: DataFrame = spark.createDataFrame(values.asJava, schema)
valuesDF.createTempView("spark_table")
 
spark.sql("insert into vector_table select * from spark_table")
To view the inserted data:
val res = spark.sql("select * from vector_table")
res.show()
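With the single row inserted above, the output of show() should look similar to the following sketch:
+----+
|col1|
+----+
|   1|
+----+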
Last modified date: 12/19/2024