How to Add Extra Data Sources
By default, the Spark-Vector Provider supports only the data sources integrated into Spark (such as JDBC, JSON, and Parquet) and CSV (the provider is bundled with spark-csv 1.4.0).
Follow this process to add extra data sources or packages:
1. Modify $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf (as shown in the following examples).
2. Stop and restart the Spark-Vector Provider so the changes take effect:
ingstop -spark_provider
ingstart -spark_provider
Here are examples of modifying $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf to add extra data sources:
• To add extra JARs, add the line:
spark.jars comma-separated-list-of-jars
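For example, to load a locally installed driver (the path and file names here are hypothetical; substitute the location of your own JARs):
spark.jars /opt/drivers/my-datasource.jar,/opt/drivers/my-datasource-deps.jar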
• To add extra packages, add the line:
spark.jars.packages comma-separated-list-of-packages
For example, to enable support for Cassandra (spark-cassandra) and Redshift (spark-redshift), add the line:
spark.jars.packages datastax:spark-cassandra-connector:1.4.4-s_2.10,com.databricks:spark-redshift_2.10:0.6.0
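Each entry is a Maven coordinate in groupId:artifactId:version form; when the provider starts, Spark resolves the packages and their dependencies from the local Ivy cache, Maven Central, or the Spark Packages repository, so the machine may need network access the first time it starts with a new package.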
Note: For Spark 1.5, to preserve the settings from a default Spark configuration file (for example, /etc/spark/conf/spark-defaults.conf), you must copy them into $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf.
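A minimal sketch of carrying those defaults over, assuming the default file exists at the path above (review the combined file afterward for duplicate keys):
cat /etc/spark/conf/spark-defaults.conf >> $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf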
• To add support for reading and writing Avro files with Spark 1, add the line:
spark.jars.packages com.databricks:spark-avro_2.10:2.0.1
If using Spark 2, add the line:
spark.jars.packages com.databricks:spark-avro_2.11:3.1.0
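Putting it together, a spark_provider.conf for a Spark 2 installation that adds both a local driver JAR and the Avro package might look like this (the JAR path is hypothetical):
spark.jars /opt/drivers/my-datasource.jar
spark.jars.packages com.databricks:spark-avro_2.11:3.1.0
Remember to stop and restart the provider (step 2 above) for the new settings to take effect.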