Vector 6.0 | CREATE EXTERNAL TABLE

SQL Language Guide > SQL Language Guide > SQL Statements > CREATE EXTERNAL TABLE

Was this helpful?

CREATE EXTERNAL TABLE

Valid in: SQL, ESQL, OpenAPI, ODBC, JDBC, .NET

The CREATE EXTERNAL TABLE statement maps the structure of a data file created outside of Vector to the structure of a Vector table. The data can then be queried from its original locations.

This statement has the following format:

CREATE EXTERNAL TABLE [IF NOT EXISTS] [schema.]table_name

(column_name data_type {,column_name data_type})

USING provider_name

WITH REFERENCE='reference'

[,FORMAT='format']

[EXTERNAL_ROLE='external_role']

[,OPTIONS=('key'=value {,'key'='value'})]

IF NOT EXISTS

Creates the table if it does not exist and returns without error if the table already exists.

Note: Use this option with caution. The table definition of a pre-existing table may differ with that of the CREATE EXTERNAL TABLE IF NOT EXISTS statement.

table_name

Defines the name of the external table to be created.

column_name data_type

Specifies the column name and data type of each column. Each column specification must be separated with a comma.

USING provider_name

Specifies the name of the provider. The only valid provider is SPARK.

Provider Name Application

SPARK Spark

WITH REFERENCE='reference'

(Required) Specifies the reference to the external datasource. This corresponds to the parameter passed to the load method of DataFrameReader or save method of DataFrameWriter. For example:

For database systems: The table name

FORMAT='format'

(Optional) Is a WITH clause option that specifies the format of the external data.

Example formats include: csv, avro, parquet, hive, orc, json, jdbc. For other datasources, format corresponds to the class name that defines that external datasource. For example, for Redshift it would be com.databricks.spark.redshift. This corresponds to the parameter passed to the format method of DataFrameReader/Writer.

When FORMAT is not specified, the Spark-Vector Provider tries to recognize the format for files by looking at the file extension.

EXTERNAL_ROLE='external_role'

Note: This option is not implemented.

OPTIONS

(Optional) Is a WITH clause option that specifies user defined options for the datasource read or written to. This corresponds to the options method of the DataFrameReader/Writer. For example, for CSV files you can pass any options supported by spark-csv. For additional options supported by various external datasources, see the DataFrameReader documenation at https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html.

Useful options are:

'HEADER'='TRUE' | 'FALSE'

Indicates whether the data file contains a header row.

Note: When referencing a CSV file, a header is included by default.

'DELIMITER'='character'

Indicates the character used in the data file as the record delimiter. Example: 'delimiter'='|'

'SCHEMA'='schema'

Specifies the table column definitions, which are required if the data file being loaded does not contain a header row.

Last modified date: 11/09/2022