How to Add Extra Data Sources
By default, the Spark-Vector Provider supports only the data sources integrated into Spark (such as JDBC, JSON, and Parquet) and CSV (the provider is bundled with spark-csv 1.4.0).
Follow this process to add extra data sources or packages:
1. Modify $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf (as shown in the following examples).
2. Stop and restart the Spark-Vector Provider so the changes take effect:
ingstop -spark_provider
ingstart -spark_provider
Here are examples of modifying $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf to add extra data sources:
• To add extra JARs, add the line:
spark.jars comma-separated-list-of-jars
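For example, to load a locally installed driver (the path and file names here are hypothetical; substitute the location of your own JARs):
spark.jars /opt/drivers/my-datasource.jar,/opt/drivers/my-datasource-deps.jar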
• To add extra packages, add the line:
spark.jars.packages comma-separated-list-of-packages
For example, to enable support for Cassandra (spark-cassandra) and Redshift (spark-redshift), add the line:
spark.jars.packages datastax:spark-cassandra-connector:1.4.4-s_2.10,com.databricks:spark-redshift_2.10:0.6.0
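Each entry is a Maven coordinate in groupId:artifactId:version form; when the provider starts, Spark resolves the packages and their dependencies from the local Ivy cache, Maven Central, or the Spark Packages repository, so the machine may need network access the first time it starts with a new package.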
Note: For Spark 1.5, to preserve the settings from a default Spark configuration file (for example, /etc/spark/conf/spark-defaults.conf), you must copy them into $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf.
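A minimal sketch of carrying those defaults over, assuming the default file exists at the path above (review the combined file afterward for duplicate keys):
cat /etc/spark/conf/spark-defaults.conf >> $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf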
• To add support for reading and writing Avro files with Spark 1, add the line:
spark.jars.packages com.databricks:spark-avro_2.10:2.0.1
If using Spark 2, add the line:
spark.jars.packages com.databricks:spark-avro_2.11:3.1.0
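Putting it together, a spark_provider.conf for a Spark 2 installation that adds both a local driver JAR and the Avro package might look like this (the JAR path is hypothetical):
spark.jars /opt/drivers/my-datasource.jar
spark.jars.packages com.databricks:spark-avro_2.11:3.1.0
Remember to stop and restart the provider (step 2 above) for the new settings to take effect.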