External Table Requirements
The following requirements must be met to use external tables:
• The Spark container environment must be installed. You can install it by running the iisuspark script or by specifying the -sparkdownload flag with the install.sh command.
Note: When using a response file, you can set the II_DOWNLOAD_SPARK parameter to yes.
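For example, either of the following installs the Spark container environment during installation (a sketch only; the location of install.sh and the parameter=value response file format are assumptions based on a typical installation):
./install.sh -sparkdownload
or, in a response file:
II_DOWNLOAD_SPARK=yes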
To access files stored on the local file system, the folder must be mounted into the container.
This can be configured by setting the configuration variable ii.<host>.spark_provider.user_mount, which is initially set to "none" (no access to the local file system).
• For example, to allow access to the directory /data/external_tables, set the variable using the following command:
iisetres "ii.`iipmhost`.spark_provider.user_mount" /data/external_tables
• To make the directory accessible in read-only mode, use the following:
iisetres "ii.`iipmhost`.spark_provider.user_mount" /data/external_tables:readonly
• To check the current setting, use the following:
iigetres "ii.$.spark_provider.user_mount"
• To disable access to the local file system again, set the configuration back to none:
iisetres "ii.`iipmhost`.spark_provider.user_mount" none
• Restart the instance using ingstop and ingstart as the installation owner or DBA to apply the changes.
The contents of the mounted folder are located under /opt/user_mount in the container file system. To reference a file in your local file system, e.g., testfile.csv, use /opt/user_mount/testfile.csv as the path.
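For illustration, an external table over a mounted CSV file could then be defined as in the following sketch. The table name, column list, and the 'header' option are hypothetical, and the CREATE EXTERNAL TABLE ... USING SPARK clauses should be checked against the external table syntax described elsewhere in this documentation:
CREATE EXTERNAL TABLE test_csv (
    id      INTEGER,
    name    VARCHAR(50)
) USING SPARK
WITH REFERENCE='/opt/user_mount/testfile.csv',
     FORMAT='csv',
     OPTIONS=('header'='true');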
The Spark provider container ships with the storage drivers for AWS, GCS, and Azure, which require configuration. All configuration must be done in the $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf file. There is no need to create the additional files mentioned in other documentation, such as spark-defaults.conf.
To configure s3a (AWS S3) access, see:
https://hadoop.apache.org/docs/r3.3.6/hadoop-aws/tools/hadoop-aws/index.html
To configure GCS access, see:
https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.21/gcs/INSTALL.md
To configure Azure access, see:
https://hadoop.apache.org/docs/r3.3.6/hadoop-azure/index.html
Note: All configuration keys must be prefixed with spark.hadoop, for example: fs.s3a.secret.key -> spark.hadoop.fs.s3a.secret.key
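For example, S3 credentials could be added to spark_provider.conf with entries such as the following (a hypothetical snippet that assumes the standard Spark properties format of key and value separated by whitespace; fs.s3a.access.key and fs.s3a.secret.key are documented in the hadoop-aws link above; replace the placeholders with your own values):
spark.hadoop.fs.s3a.access.key <your-access-key-id>
spark.hadoop.fs.s3a.secret.key <your-secret-access-key>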
Last modified date: 12/19/2024