Was this helpful?
CREATE EXTERNAL TABLE Examples
Note:  The column names in the Vector external table must match the column names in the reference source.
1. Define an external table for a CSV data source:
CREATE EXTERNAL TABLE ext_csv (col1 INT4 NOT NULL, col2 VARCHAR(20) NOT NULL)
USING SPARK
WITH REFERENCE='user/mark/file.csv',
  OPTIONS=('DELIMITER' = '|', 'HEADER' = 'TRUE','SCHEMA'='col1 INT4 NOT NULL, col2 VARCHAR(20) NOT NULL');
Notes:
If the CSV file does not have a header, the name of the columns in the external table must match those in spark-csv, that is, C0, C1, .... If the option 'HEADER' = 'TRUE' is not specified and the Vector external table definition has different names, you will get an error like:
E_VW1213 External table provider reported an error
'org.apache.spark.sql.AnalysisException: cannot resolve 'a' given input columns C0, C1; line 1 pos 38'.
If external table columns are defined as NOT NULL, the table schema must be specified with NOT NULL in the SCHEMA option, even if header is set to true. This is because if Spark SQL implicitly determines the schema information from, for example, a CSV file, the nullability of columns typically defaults to true.
2. Define an external table for a CSV data source residing in Amazon S3. The CSV data does not have a header row:
CREATE EXTERNAL TABLE nation_s3 (
   n_nationkey INTEGER NOT NULL,
   n_name CHAR(25) NOT NULL,
   n_regionkey INTEGER NOT NULL
) USING SPARK WITH REFERENCE='s3a://<bucket>/nation.csv',FORMAT='csv',
OPTIONS=('HEADER'='FALSE','DELIMITER'='|','SCHEMA'='n_nationkey INT NOT NULL, n_name STRING NOT NULL, n_regionkey INT NOT NULL');
Note:  The column names in the schema must match the column names of the external table. The Spark data type specified in the schema must be compatible with the Vector data type for the matching column. See “Vector to Spark Data Type Mapping” in the User Guide.
3. Define an external table for a JSON data source:
Example JSON file:
[{
     "symbol": "MSFT",
     "company": "Microsoft Corporation",
     "stock_tstamp": "2020-01-31T21:00:00+00:00",
     "price": 170.777,
     "volume": 36142690,
     "sector": "TECH"
},
{
     "symbol": "AAPL",
     "company": "Apple Inc.",
     "stock_tstamp": "2020-01-31T21:00:00+00:00",
     "price": 309.51,
     "volume": 49897096,
     "sector": "TECH"
},
{
     "symbol": "GOOG",
     "company": "Alphabet Inc.",
     "stock_tstamp": "2020-01-31T21:00:00+00:00",
     "price": 1434.23,
     "volume": 2417214,
     "sector": "TECH"
},
{
     "symbol": "AMZN",
     "company": "Amazon.com, Inc.",
     "stock_tstamp": "2020-01-31T21:00:00+00:00",
     "price": 2008.72,
     "volume": 15567283,
     "sector": "TECH"
}]
Example SQL to create an external table to reference that JSON file stored on an Azure ADL:
CREATE EXTERNAL TABLE techstocks (
     symbol VARCHAR(4),
     company VARCHAR(20),
     stock_tstamp TIMESTAMP,
     price FLOAT,
     volume INTEGER,
     sector CHAR(5)
) USING SPARK WITH REFERENCE='abfs://stockquotes@eastusstockdata.dfs.core.windows.net/tech1.json',
     FORMAT='json',
     OPTIONS=(
     'multiline'='true',
     'SCHEMA'= 'symbol string, company string, stock_tstamp string, price double, volume integer, sector string');
For more examples, see the User Guide.
Last modified date: 11/09/2022