How to Add Extra Packages
By default, the Spark-Vector Provider supports only Spark's integrated data sources (such as JDBC, JSON, and Parquet) and the CSV data source (the Spark-Vector Provider is bundled with spark-csv 1.4.0).
Follow this process to add extra data sources (packages):
1. Modify $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf (as shown in the following examples).
2. Stop and start the Spark-Vector Provider to put the changes into effect, as follows:
ingstop -spark_provider
ingstart -spark_provider
Here are examples of modifying $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf to add extra data sources:
To add support for reading and writing ORC files or Hive tables, add the line:
spark.vector.provider.hive true
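With Hive support enabled, an ORC file can then be referenced from an external table. The following is a minimal sketch modeled on the Avro example later in this section; the table name, columns, HDFS path, and the short format name 'orc' are illustrative assumptions, not taken from the product documentation:
CREATE EXTERNAL TABLE tweets_orc
(username VARCHAR(20),
tweet VARCHAR(100),
timestamp VARCHAR(50))
USING SPARK
WITH REFERENCE='hdfs://blue/tmp/twitter.orc',
FORMAT='orc'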
To add extra jars, add the line:
spark.jars comma-separated-list-of-jars
To add extra packages, add the line:
spark.jars.packages comma-separated-list-of-packages
For example, to enable support for Cassandra (spark-cassandra) and Redshift (spark-redshift), add the line:
spark.jars.packages datastax:spark-cassandra-connector:1.4.4-s_2.10,com.databricks:spark-redshift_2.10:0.6.0
Note:  For Spark 1.5, to preserve the settings in a default Spark configuration file (for example, /etc/spark/conf/spark-defaults.conf), its contents must be included in $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf.
To add support for reading and writing Avro files with Spark 1, add the line:
spark.jars.packages com.databricks:spark-avro_2.10:2.0.1
If using Spark 2, add the line:
spark.jars.packages com.databricks:spark-avro_2.11:3.1.0
Example external table definition for an Avro data source:
CREATE EXTERNAL TABLE tweets
(username VARCHAR(20),
tweet VARCHAR(100),
timestamp VARCHAR(50))
USING SPARK
WITH REFERENCE='hdfs://blue/tmp/twitter.avro',
FORMAT='com.databricks.spark.avro'
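Once created, the external table can be queried like any native table. A hypothetical query against the table above (the filter value is illustrative):
SELECT username, tweet
FROM tweets
WHERE username = 'actian'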