External Table Usage Notes
Note the following when using external tables:
• Writes to external tables use the SaveMode.Append mode. Some data sources do not support this mode.
• CREATE EXTERNAL TABLE does not validate the supplied values for REFERENCE, FORMAT, or OPTIONS until the external table is used. As a result, the following sequence, which may be confusing at first, returns an error if the target does not yet exist:
CREATE EXTERNAL TABLE test_table(col INT NOT NULL) USING SPARK
WITH REFERENCE='/opt/user_mout/test_table.json';
SELECT * FROM test_table; \g
Executing . . .
E_VW1213 External table provider reported an error 'java.io.IOException:
No input paths specified in job'.
However, as soon as the Vector user inserts some data, the target is created at the referenced location, and a subsequent SELECT statement succeeds:
INSERT INTO test_table VALUES (1);
SELECT * FROM test_table; \g
Executing . . .
(1 row)
┌─────────────┐
│col          │
├─────────────┤
│            1│
└─────────────┘
(1 row)
IMPORTANT! It is not possible to insert into an external table that references a pre-existing file. This is because Spark creates, or assumes, a folder at the referenced path, in which it stores some metadata along with the actual data. Consequently, if your external table references a nonexistent path and you insert into the table, a folder is created at that path, containing the inserted data and some additional metadata. If you need to add data to an existing file (for example, a CSV file), a workaround is to put the file into a folder and use the folder path as the external table reference string.
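The folder workaround might look like the following minimal sketch. It assumes that an existing file, data.csv, holding a single integer column and no header row, has already been moved into a folder named /opt/user_mout/csv_data. The folder, file, and table names and the FORMAT and OPTIONS values are hypothetical and chosen only for illustration:

```sql
/* Hypothetical layout: the existing file is at /opt/user_mout/csv_data/data.csv. */
/* REFERENCE names the folder, not the file itself. */
CREATE EXTERNAL TABLE csv_table(col INT NOT NULL) USING SPARK
WITH REFERENCE='/opt/user_mout/csv_data',
FORMAT='csv',
OPTIONS=('header'='false');

/* The insert targets the folder, so the pre-existing file is not modified. */
INSERT INTO csv_table VALUES (2);
SELECT * FROM csv_table; \g
```

With this layout, the inserted rows should be written as additional files in the folder next to data.csv, and the SELECT statement should read the contents of the whole folder.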
Last modified date: 12/19/2024