User Guide : Avalanche Data Loading Templates : Loading Delimited Files from Amazon S3
 
Share this page                  
Loading Delimited Files from Amazon S3
The Delimited_S3_to_Avalanche_AWS template is used to load delimited files stored in Amazon S3 buckets to a table in an Actian Avalanche data warehouse hosted on the AWS Cloud.
Using the macros, provide the connection string to your Avalanche data warehouse, credentials for the database and S3 bucket, and whether or not the first row of data contains a header for the column names. The connector used in the templates handles tab, comma, and pipe delimiters by default. If the source file has a different field separator, you can specify the delimiter using a macro.
When you run a configuration with this template, it creates a new table if it does not exist or inserts data to an existing table of the same name if it exists.
Before using the Template to load files from Amazon S3 to Avalanche, you must perform the following:
Confirm that the Avalanche data warehouse is running on AWS
Obtain credentials for connecting to the Avalanche database (user name and password)
If Integration Manager on DataCloud is used:
Obtain the ODBC Connection String for your Avalanche data warehouse. You can obtain this information from Avalanche portal.
If Integration Manager On-premise is used, do one of the following:
Create ODBC DSN (as a System DSN) with the ODBC driver shipped with latest client runtime, which can be downloaded from https://esd.actian.com/ (select Product as Actian Avalanche, Release as Client Runtime, and Platform as Windows/Linux)
Setup ODBC DSN (as a System DSN) pointing to the Avalanche AWS instance
In the connect string, replace the name of the driver based on the operating system:
For Windows, Driver=Actian CR
For Linux, configure a driver entry called Ingres in the odbcinst.ini file (if it is not already present).
Choose a file in an S3 bucket in the same region as the Avalanche data warehouse
Note:  The Avalanche instance and S3 buckets must be in the same region. If they are in different regions, the log shows the "Invalid Credentials" error message after process execution.
Obtain your AWS Access Key and AWS Secret Key for accessing AWS services.
To load delimited data files from Amazon S3 to an Avalanche data warehouse:
1. Select the Delimited_S3_to_Avalanche_AWS configuration.
2. Set the values for the following required macros.
Note:  You must provide values for these macros so that the Template has the necessary information to access your data files, Avalanche warehouse, and find or create your table.
You can encrypt sensitive macro values (such as passwords, secret keys, and tokens for token based authentication) from Integration Manager. For information about encrypting, see Enable Macro Encryption section in Integration Manager documentation available at docs.actian.com.
 
Macro Name
Description
$(AVALANCHE_CONNECT_STRING)
ODBC Connection string for connecting to Avalanche database. You can obtain this information from the Avalanche portal.
Note:  This is required only if you are running Integration Manager on DataCloud.
$(AVALANCHE_USERNAME)
$(AVALANCHE_PASSWORD)
Credentials used to connect to the database.
$(AVALANCHE_TABLE)
Name of the table in Avalanche where the data will be written to.
$(AWS_ACCESS_KEY)
$(AWS_SECRET_KEY)
Credentials used to access AWS services.
$(AWS_BUCKET_NAME)
Name of the AWS storage bucket to pull the data from.
$(AWS_REGION)
Region ID of the location that hosts the specific S3 bucket.
The supported regions are:
US East (N. Virginia)
US East (Ohio)
US West (Oregon)
Europe (Ireland)
Europe (London)
Europe (Frankfurt)
You can specify the region in the above format or in the following format:
us-east-1
us-east-2
us-west-2
eu-west-1
eu-west-2
eu-central-1
Note:  Region information is case and format sensitive. If you do not enter the region value correctly, then the template will fail during execution. Avalanche cluster and the bucket must be in the same region.
$(HEADER)
Specifies whether first data row contains column or field names. Used to detect field names for creating a new database table.
$(S3_FILE_LIST)
Comma separated list of input files present in the S3 bucket. Files must have the same schema.
3. If you want to override any of the default settings, you can add the following macros and set the values. Usually, these are not required unless you have a data file with a typical record or field separators, want to increase the sample size, or want to specify the table operation. These macros are not displayed by default. You can manually add the name and set the value to override the default value.
 
Macro Name
Description
$(AVALANCHE_DSN)
Name of the ODBC data source to use for connecting to Avalanche database. Specify this macro if you want to use a pre-configured DSN on your system instead of the connect string.
$(AVALANCHE_CREATE_TABLE_QUERY)
Create table statement to use for creating the table. Make sure partitioning is specified. Table name in the query must match the AVALANCHE_TABLE macro value.
$(AVALANCHE_CREATE_TABLE_OPTIONS)
Use this option when you do not want to build the complete query but only want to specify options to pass to "with" clause of create table query.
Note:  Make sure partitioning is specified. Also, this macro is ignored if AVALANCHE_CREATE_TABLE_QUERY macro is defined.
$(FIELD_SEPARATOR)
Delimiter used in the source files to separate the fields.
$(RECORD_SEPARATOR)
Delimiter used in the source files to separate the data records.
$(QUOTE_CHARACTERS)
Character used to quote fields. For example, double quote ".
Use two characters if start and end quote characters are different. For example, [].
$(SAMPLE_SIZE)
Number of records that must be sampled for building the source schema.
$(OUTPUT_MODE)
Table operations that must be performed before inserting data. The available operations are:
replace: Drops existing table and creates new table
delete_append: Truncates table before inserting.
append: Creates table only if it does not exist and inserts records.
Default is append.
$(DEFAULT_TEXT_COL_SIZE)
Set the default size of the text columns in the table. Set it to a reasonable value based on your data to avoid truncations.
This property is also useful for inserting double-byte characters like Japanese or Chinese. Varchar is used for text data types that supports single byte characters. To support double-byte characters in varchar data type, the size of the column must be doubled using this macro.
$(UNICODE_CHARS)
Indicates whether the data contains Unicode characters. If set to True, nvarchar data type is used for text columns.
$(VW_XXX)
Specify VWLOAD options as a macro in the VW_XXX format. XXX can be any of the properties listed in the COPY VWLOAD section in the Avalanche documentation. One macro can be added for each property. For properties such as STRICTNULLS that does not accept a value, the macro value must be the name of the property itself.
Note:  FDELIM, RDELIM, QUOTE, AWS credentials must be specified using specific macros for these properties.
For example:
VW_NULLS = NULL
VW_STRICTNULLS = STRICTNULLS
$(SOURCE_FETCH_SIZE)
Size of the data (in bytes) to fetch from the source file in the s3 bucket. Default is 15000000 (15MB).
Note:  
You can specify individual column sizes by defining macros in the format COL_SIZE_XXX, where XXX is the source field name in uppercase. This is only applicable for fields of text type.

If you specify one of the first three macros (FIELD_SEPARATOR, RECORD SEPARATOR, or QUOTE CHARACTERS), then you must also specify the other two macros. Else, other delimiters are not auto-identified and use default values. For example, if you specify only QUOTE CHARACTER, then FIELD_SEPARATOR defaults to "," and RECORD_SEPARATOR defaults to "\r\n"(Windows) and "\n"(Linux).
4. Run the configuration.
If there are any errors during job execution, see Troubleshooting Template Configuration Execution Errors for the troubleshooting information.