Exporting and Importing Data
SQL Loading and Exporting Methods
 
Method 1
    Loading:   COPY avalanche_table VWLOAD FROM external_files
    Exporting: INSERT INTO EXTERNAL CSV external_files SELECT * FROM avalanche_table
    Notes:     This method uses file-based parallelization that is part of the engine.

Method 2
    Loading:   CREATE EXTERNAL TABLE external_table USING SPARK WITH REFERENCE = external_files
               INSERT INTO avalanche_table SELECT * FROM external_table
    Exporting: CREATE EXTERNAL TABLE external_table USING SPARK WITH REFERENCE = external_files
               INSERT INTO external_table SELECT * FROM avalanche_table
    Notes:     This method uses stream-based parallelization from an outside Spark process.
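As a quick illustration of Method 1, here is a minimal sketch with hypothetical table and bucket names (my_table, gs://my-bucket/my_table); the cloud credential options shown in the full examples later in this section are omitted for brevity:
-- Export: write the table out as CSV files (hypothetical bucket path).
INSERT INTO EXTERNAL CSV 'gs://my-bucket/my_table/my_table.csv'
SELECT * FROM my_table;

-- Load: read the exported files back in with engine-based parallel loading.
COPY my_table() VWLOAD FROM 'gs://my-bucket/my_table/my_table.csv.*' WITH
    FDELIM = ',',
    NULLVALUE = 'null',
    QUOTE = '"';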
Exporting to an External Table and Importing the Data Elsewhere
To export the contents of a table, you can use external tables. If you need to import the same data into another database, the same external table definition that wrote the data can also be used to read it back.
Here is an example with the sample.l_airport table, available with every Actian warehouse:
DROP TABLE IF EXISTS airport_ext;
CREATE EXTERNAL TABLE airport_ext (code char(3), description varchar(100)) USING spark WITH
reference = 'gs://my-bucket/airport_csv',
format = 'csv',
options =(
    'header' = 'true',
    'quote' = '"',
    'nullValue' = 'null',
    'sep' = ','
);
 
INSERT INTO airport_ext SELECT * FROM sample.l_airport;
DROP TABLE IF EXISTS airport;
CREATE TABLE airport (code char(3), description varchar(100));
 
INSERT INTO airport SELECT * FROM airport_ext;
To keep the example simple, the target table named airport is created in the same database. The example also works across two databases if you create the same external table in the target database, as sketched below.
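For instance, assuming the 'gs://my-bucket/airport_csv' location is reachable from the second warehouse, the import side of a two-database copy is a sketch of the statements already shown, run against the target database:
-- On the target database: recreate the same external table definition ...
CREATE EXTERNAL TABLE airport_ext (code char(3), description varchar(100)) USING spark WITH
reference = 'gs://my-bucket/airport_csv',
format = 'csv',
options =(
    'header' = 'true',
    'quote' = '"',
    'nullValue' = 'null',
    'sep' = ','
);

-- ... then create the target table and read the exported data back in.
CREATE TABLE airport (code char(3), description varchar(100));
INSERT INTO airport SELECT * FROM airport_ext;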
Within a single database, external tables are not needed to copy a table; the following command is more straightforward:
INSERT INTO airport SELECT * FROM sample.l_airport;
To keep the INSERT statement simple, the field list is omitted, and the SELECT can use a '*' because both source and target table definitions are the same.
Listing the fields explicitly provides more control over what is copied:
INSERT INTO airport_ext(code, description)
SELECT code, description FROM sample.l_airport;
Using DBeaver to Generate a CREATE TABLE Statement
To ensure that source and target table definitions with many fields are created correctly, tools are available that generate the correct CREATE TABLE statement, as well as the complete field list for use with the INSERT statement. For example, the freely available SQL client tool DBeaver can generate such CREATE and SELECT statements for any table definition.
Connect to an Actian database using DBeaver, then select the table in Database Navigator, right-click to bring up the context menu, and select “Generate SQL - DDL.” This displays the table’s definition. The following example definition, generated by DBeaver, is for the fact table sample.ontime, which has many fields:
CREATE TABLE sample.ontime (
"year" INTEGER NOT NULL,
quarter INTEGER NOT NULL,
"month" INTEGER NOT NULL,
dayofmonth INTEGER NOT NULL,
dayofweek INTEGER NOT NULL,
flightdate ANSIDATE NOT NULL,
reporting_airline CHAR(3) NOT NULL,
dot_id_reporting_airline INTEGER NOT NULL,
iata_code_reporting_airline CHAR(3),
tail_number VARCHAR(5),
flight_number_reporting_airline VARCHAR(10) NOT NULL,
originairportid INTEGER,
originairportseqid INTEGER,
origincitymarketid INTEGER,
origin CHAR(5),
origincityname VARCHAR(35) NOT NULL,
originstate CHAR(2),
originstatefips VARCHAR(10),
...
);
When using this generated table definition to define the external table, it makes sense to remove the 'NOT NULL' constraints from all fields. This keeps the external table definition simpler, because you do not need to provide the SCHEMA option for Spark; that option explicitly specifies the column definitions of the source using Spark SQL types (see Actian Data Platform to Spark Data Type Mapping).
For a plain export/import there is no risk of violating the removed 'NOT NULL' constraints, because the exported data comes from a table that already enforces them.
The CREATE EXTERNAL TABLE documentation explains this simplifying definition change: “If external table columns are defined as NOT NULL, the table schema must be specified with NOT NULL in the SCHEMA option, even if header is set to true. This is because if Spark SQL implicitly determines the schema information from, for example, a CSV file, the nullability of columns typically defaults to true.” See CREATE EXTERNAL TABLE Examples.
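As a sketch (hypothetical bucket path, field list truncated like the generated DDL above), the external table for exporting sample.ontime reuses the generated column definitions with the NOT NULL constraints removed:
CREATE EXTERNAL TABLE ontime_ext (
    "year" INTEGER,
    quarter INTEGER,
    "month" INTEGER,
    dayofmonth INTEGER,
    dayofweek INTEGER,
    flightdate ANSIDATE,
    reporting_airline CHAR(3),
    ...
) USING spark WITH
reference = 'gs://my-bucket/ontime_csv',
format = 'csv',
options =(
    'header' = 'true',
    'quote' = '"',
    'nullValue' = 'null',
    'sep' = ','
);

INSERT INTO ontime_ext SELECT * FROM sample.ontime;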
Using COPY VWLOAD
If you are concerned about load time, use COPY VWLOAD instead of external tables. Loading with COPY VWLOAD is about 30% faster than with external tables.
Required COPY VWLOAD Options to Load Data Written with INSERT INTO EXTERNAL CSV
COPY VWLOAD options are named differently and have defaults different from those of INSERT INTO EXTERNAL CSV.
INSERT INTO EXTERNAL CSV may be used without any options, but COPY VWLOAD requires these options:
FDELIM = ',',
NULLVALUE = 'null',
QUOTE = '"'
 
INSERT INTO EXTERNAL CSV: FIELD_SEPARATOR='field_separator' (default: , (comma))
COPY VWLOAD:              FDELIM='field_delimiter' (default: "|")

INSERT INTO EXTERNAL CSV: NULL_MARKER='null_value' (default: 'null')
COPY VWLOAD:              NULLVALUE='null_value' (default: ' ')

INSERT INTO EXTERNAL CSV: INSERT uses double quotes. This behavior cannot be changed or switched off. Therefore, COPY VWLOAD must use QUOTE='"'.
COPY VWLOAD:              QUOTE='quote_character' (default: none)

INSERT INTO EXTERNAL CSV: RECORD_SEPARATOR='record_separator' (default: \n)
COPY VWLOAD:              RDELIM='record_delimiter' (default: "\n")
Example INSERT INTO EXTERNAL CSV:
The Storage Admin role is required to run this statement.
INSERT INTO EXTERNAL CSV 'gs://mybucket/ontime/ontime.csv'
SELECT * FROM sample.ontime WITH
    GCS_EMAIL = 'my-service-account@my-project.iam.gserviceaccount.com',
    GCS_PRIVATE_KEY_ID = '08154d119815a1e7bd37dfcfe7e628f041034a38',
    GCS_PRIVATE_KEY = '----BEGIN PRIVATE KEY---\nMI...O\n---END PRIVATE KEY----\n';
Example COPY VWLOAD:
COPY ontime() VWLOAD FROM 'gs://mybucket/ontime/ontime.csv.*' WITH
    GCS_EMAIL = 'my-service-account@my-project.iam.gserviceaccount.com',
    GCS_PRIVATE_KEY_ID = '08154d119815a1e7bd37dfcfe7e628f041034a38',
    GCS_PRIVATE_KEY = '----BEGIN PRIVATE KEY---\nMI...O\n---END PRIVATE KEY----\n',
    FDELIM = ',',
    NULLVALUE = 'null',
    QUOTE = '"'
Exporting to CSV Files
For more information about exporting data into comma-separated values files, see INSERT INTO EXTERNAL CSV.
 
Last modified date: 10/30/2024