External Table Requirements
The following requirements must be met to use external tables:
• The Spark container environment must be installed. You can install it by running the iisuspark script or by specifying the -sparkdownload flag with the install.sh command.
Note: When using a response file, you can set the II_DOWNLOAD_SPARK parameter to yes.
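For example, either of the following installs the Spark container environment during installation (a sketch only; the location of install.sh and the parameter=value response file format are assumptions based on a typical installation):
./install.sh -sparkdownload
or, in a response file:
II_DOWNLOAD_SPARK=yes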
To access files stored on the local file system, the folder must be mounted into the container.
This can be configured by setting the configuration variable ii.<host>.spark_provider.user_mount, which is initially set to "none" (no access to the local file system).
• For example, to allow access to the directory /data/external_tables, set the variable using the following command:
iisetres "ii.`iipmhost`.spark_provider.user_mount" /data/external_tables
• To make the directory accessible in read-only mode, use the following:
iisetres "ii.`iipmhost`.spark_provider.user_mount" /data/external_tables:readonly
• To check the current setting, use the following:
iigetres "ii.$.spark_provider.user_mount"
• To disable access to the local file system again, set the configuration back to none:
iisetres "ii.`iipmhost`.spark_provider.user_mount" none
• Restart the instance using ingstop and ingstart as the installation owner or DBA to apply the changes.
The contents of the mounted folder are located under /opt/user_mount in the container file system. To reference a file in your local file system, e.g., testfile.csv, use /opt/user_mount/testfile.csv as the path.
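For illustration, an external table over a mounted CSV file could then be defined as in the following sketch. The table name, column list, and the 'header' option are hypothetical, and the CREATE EXTERNAL TABLE ... USING SPARK clauses should be checked against the external table syntax described elsewhere in this documentation:
CREATE EXTERNAL TABLE test_csv (
    id      INTEGER,
    name    VARCHAR(50)
) USING SPARK
WITH REFERENCE='/opt/user_mount/testfile.csv',
     FORMAT='csv',
     OPTIONS=('header'='true');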
The Spark provider container ships with the storage drivers for AWS, GCS, and Azure, which require configuration. All configuration must be done in the $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf file. There is no need to create the additional files mentioned in other documentation, such as spark-defaults.conf.
To configure s3a (AWS S3) access, see:
https://hadoop.apache.org/docs/r3.3.6/hadoop-aws/tools/hadoop-aws/index.html
To configure GCS access, see:
https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.21/gcs/INSTALL.md
To configure Azure access, see:
https://hadoop.apache.org/docs/r3.3.6/hadoop-azure/index.html
Note: All configuration keys must be prefixed with spark.hadoop, for example: fs.s3a.secret.key -> spark.hadoop.fs.s3a.secret.key
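For example, S3 credentials could be added to spark_provider.conf with entries such as the following (a hypothetical snippet that assumes the standard Spark properties format of key and value separated by whitespace; fs.s3a.access.key and fs.s3a.secret.key are documented in the hadoop-aws link above; replace the placeholders with your own values):
spark.hadoop.fs.s3a.access.key <your-access-key-id>
spark.hadoop.fs.s3a.secret.key <your-secret-access-key>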
Last modified date: 12/19/2024