Loading Google Cloud Storage Data to Avalanche on Google Cloud

Macro Name	Description
$(AVALANCHE_CONNECT_STRING)	ODBC connection string for connecting to the Avalanche database. You can obtain this information from the Avalanche portal. Note: This is required only if you are running Integration Manager on DataCloud.
$(AVALANCHE_USERNAME) $(AVALANCHE_PASSWORD)	Credentials used to connect to the database.
$(AVALANCHE_TABLE)	Name of the table in Avalanche to which the data is written.
$(FILE_LIST)	Comma-separated list of input files in the GCP bucket. All files must have the same schema.
$(HEADER)	Specifies whether the first data row contains column or field names. Used to detect field names when creating a new database table.

Macro Name	Description
$(GCP_BUCKET)	Name of the bucket in Google Cloud Platform storage to use for staging the data.
$(GCP_CLIENT_EMAIL)	Service account client email from GCP console.
$(GCP_PRIVATE_KEY)	Private key object for the service account in PKCS#8 format.
$(GCP_CLIENT_ID)	Service account client ID from GCP console.
$(GCP_PRIVATE_KEY_ID)	Private key identifier for the service account from GCP console.

Macro Name	Description
$(AVALANCHE_DSN)	Name of the ODBC data source to use for connecting to the Avalanche database. Specify this macro if you want to use a pre-configured DSN on your system instead of the connect string.
$(AVALANCHE_CREATE_TABLE_QUERY)	CREATE TABLE statement to use for creating the table. Make sure partitioning is specified. The table name in the query must match the AVALANCHE_TABLE macro value.
$(AVALANCHE_CREATE_TABLE_OPTIONS)	Use this macro when you do not want to build the complete query but only want to specify the options to pass to the WITH clause of the CREATE TABLE query. Note: Make sure partitioning is specified. This macro is ignored if the AVALANCHE_CREATE_TABLE_QUERY macro is defined.
$(FIELD_SEPARATOR)	Delimiter used in the source files to separate the fields.
$(RECORD_SEPARATOR)	Delimiter used in the source files to separate the data records.
$(QUOTE_CHARACTERS)	Character used to quote fields. For example, double quote ". Use two characters if start and end quote characters are different. For example, [].
$(SAMPLE_SIZE)	Number of records that must be sampled for building the source schema.
$(OUTPUT_MODE)	Table operations that must be performed before inserting data. The available operations are: • replace: Drops the existing table and creates a new table. • delete_append: Truncates the table before inserting. • append: Creates the table only if it does not exist and inserts records. Default is append.
$(DEFAULT_TEXT_COL_SIZE)	Default size of the text columns in the table. Set it to a reasonable value based on your data to avoid truncation. This macro is also useful when inserting double-byte characters, such as Japanese or Chinese. The varchar data type, which supports single-byte characters, is used for text columns. To store double-byte characters in varchar columns, double the column size using this macro.
$(UNICODE_CHARS)	Indicates whether the data contains Unicode characters. If set to True, nvarchar data type is used for text columns.
$(SOURCE_FETCH_SIZE)	Size of the data (in bytes) to fetch from the source file in the GCP bucket. Default is 15000000 (15 MB).
$(AVALANCHE_DBADMIN_GROUP_ACCESS)	Grant table access to "dbadmingrp" group. Only applicable when new table is created. Default is True.
$(GCP_REGION)	Cloud storage location for specified bucket.
$(GCP_PROJECT_ID)	ID number for the project from GCP console.
$(TRUNCATION_HANDLING)	Specifies truncation handling for text data. The supported values are: • ignore - Ignores the truncation and continues the execution. This is the default value. • error - Logs an error message and aborts the execution.

Note: You can specify individual column sizes by defining macros in the format COL_SIZE_XXX, where XXX is the source field name in uppercase. This is only applicable for fields of text type.
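As a rough illustration of the naming rule in this note, the following Python sketch derives COL_SIZE_XXX macro names from source field names. The helper name col_size_macros and the sample field names are hypothetical and exist only for illustration. How the resulting macros are supplied to Integration Manager is not shown here.

```python
def col_size_macros(field_sizes):
    """Map {source field name: column size} to COL_SIZE_XXX macro definitions.

    Hypothetical helper: it only illustrates the naming rule and is not
    part of the product.
    """
    macros = {}
    for field, size in field_sizes.items():
        if size <= 0:
            raise ValueError(f"column size for {field!r} must be positive")
        # XXX is the source field name in uppercase.
        macros["COL_SIZE_" + field.upper()] = str(size)
    return macros


if __name__ == "__main__":
    # Sample text fields; the names are made up for this example.
    for name, value in col_size_macros({"customer_name": 50, "city": 30}).items():
        print(f"{name}={value}")
```

Only text fields should be passed to such a helper, because the note applies to fields of text type only.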

If you specify any one of the delimiter macros (FIELD_SEPARATOR, RECORD_SEPARATOR, or QUOTE_CHARACTERS), you must also specify the other two. Otherwise, the unspecified delimiters are not auto-identified and fall back to their default values. For example, if you specify only QUOTE_CHARACTERS, then FIELD_SEPARATOR defaults to "," and RECORD_SEPARATOR defaults to "\r\n" (Windows) or "\n" (Linux).
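The all-or-none rule for the delimiter macros can be checked before a run. The Python sketch below reports which delimiter macros are missing when only some are set, along with the default values stated above. The function names are hypothetical, and the platform check based on sys.platform is an assumption about how Windows and Linux are told apart. No default is listed for QUOTE_CHARACTERS because this section does not state one.

```python
import sys

DELIMITER_MACROS = ("FIELD_SEPARATOR", "RECORD_SEPARATOR", "QUOTE_CHARACTERS")


def documented_defaults(platform=sys.platform):
    """Return the defaults stated in this section for the given platform."""
    return {
        "FIELD_SEPARATOR": ",",
        # "\r\n" on Windows, "\n" on Linux.
        "RECORD_SEPARATOR": "\r\n" if platform.startswith("win") else "\n",
    }


def missing_delimiter_macros(macros):
    """List the delimiter macros left unset when at least one is set."""
    given = [m for m in DELIMITER_MACROS if m in macros]
    if not given:
        return []  # none specified, so nothing is missing
    return [m for m in DELIMITER_MACROS if m not in macros]


if __name__ == "__main__":
    macros = {"QUOTE_CHARACTERS": '"'}
    defaults = documented_defaults()
    for name in missing_delimiter_macros(macros):
        print(f"{name} is not set; it will default to {defaults.get(name)!r}")
```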

Specify VWLOAD options as macros in the VW_XXX format, where XXX can be any of the properties listed in the COPY VWLOAD section of the Avalanche documentation. You can add one macro for each property. For properties such as STRICTNULLS that do not accept a value, the macro value must be the name of the property itself.

Do not use the VW_XXX format for FDELIM, RDELIM, QUOTE, or the GCP credentials. Specify these properties using their dedicated macros listed in the tables above.
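The VW_XXX convention can be sketched in Python as follows. The helper vwload_macros is hypothetical. NULLVALUE is used only as an example of a property that takes a value, so check the COPY VWLOAD section for the actual property list. The sketch rejects FDELIM, RDELIM, and QUOTE because they have their own macros. GCP credential properties are not checked because their VWLOAD property names are not given here.

```python
# Properties that must be set through their own macros, not VW_XXX.
RESERVED_PROPERTIES = {"FDELIM", "RDELIM", "QUOTE"}


def vwload_macros(options):
    """Map {VWLOAD property: value or None} to VW_XXX macro definitions.

    Pass None for a property that does not accept a value.
    """
    macros = {}
    for prop, value in options.items():
        key = prop.upper()
        if key in RESERVED_PROPERTIES:
            raise ValueError(f"{key} must be set using its dedicated macro")
        # A valueless property uses its own name as the macro value.
        macros["VW_" + key] = key if value is None else str(value)
    return macros


if __name__ == "__main__":
    # NULLVALUE is an assumed example of a property that takes a value.
    options = {"STRICTNULLS": None, "NULLVALUE": "NULL"}
    for name, value in vwload_macros(options).items():
        print(f"{name}={value}")
```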