External Table Usage Notes
Note the following when using external tables:
• When you write to an external table, the save mode is SaveMode.Append. Some data sources do not support this mode. For example, you cannot write to existing CSV files because spark-csv does not support appending.
• CREATE EXTERNAL TABLE does not validate the supplied values for REFERENCE, FORMAT, or OPTIONS until the external table is actually used. As a result, and although it may be confusing at first, the following sequence results in an error if the target does not yet exist:
CREATE EXTERNAL TABLE test_table(col INT NOT NULL) USING SPARK
WITH REFERENCE='hdfs://cluster06:8020/user/mark/test_table.json';
SELECT * FROM test_table; \g
Executing . . .
E_VW1213 External table provider reported an error 'java.io.IOException:
No input paths specified in job'.
However, as soon as the VectorH user inserts some data, the target is created at the location named by REFERENCE (for example, files are written to HDFS or a new table is created in Hive), and a subsequent SELECT statement succeeds:
INSERT INTO test_table VALUES (1);
SELECT * FROM test_table; \g
Executing . . .
(1 row)
┌─────────────┐
│col          │
├─────────────┤
│            1│
└─────────────┘
(1 row)
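The two notes above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class ToyExternalTable and the function demo are hypothetical names invented for illustration, they are not part of VectorH or Spark, and a local temporary directory stands in for the HDFS target. The model stores the reference without checking it, fails on a read while the target is missing, and creates the target on the first append-style insert.

```python
import json
import tempfile
import uuid
from pathlib import Path


class ToyExternalTable:
    """Hypothetical stand-in for an external table; not a VectorH or Spark API."""

    def __init__(self, reference):
        # Like CREATE EXTERNAL TABLE: the reference is stored, not checked.
        self.reference = Path(reference)

    def insert(self, rows):
        # Append semantics: create the target if it is missing, then add a
        # new part file without touching files that are already there.
        self.reference.mkdir(parents=True, exist_ok=True)
        part = self.reference / f"part-{uuid.uuid4().hex}.json"
        part.write_text("\n".join(json.dumps(row) for row in rows))

    def select(self):
        # The reference is examined for the first time here, at read time.
        if not self.reference.is_dir():
            raise IOError(f"No input paths found at {self.reference}")
        rows = []
        for part in sorted(self.reference.glob("part-*.json")):
            rows.extend(json.loads(line) for line in part.read_text().splitlines())
        return rows


def demo():
    """Run SELECT, INSERT, SELECT against a target that does not exist yet."""
    with tempfile.TemporaryDirectory() as tmp:
        table = ToyExternalTable(Path(tmp) / "test_table.json")
        try:
            table.select()
            failed_before_insert = False
        except IOError:
            failed_before_insert = True
        table.insert([{"col": 1}])
        return failed_before_insert, table.select()
```

Calling demo() mirrors the session shown above: the first read fails because nothing exists at the reference, and the read after the insert returns the single inserted row. Because every insert only adds a new file, the model also shows why a data source that cannot append to an existing target cannot be written to in this mode.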