AVRO Files
userdata.avro
The Avro file has a header.
Column Details:
column#    column_name          hive_datatype
===============================================================
1          registration_dttm    timestamp
2          id                   int
3          first_name           string
4          last_name            string
5          email                string
6          gender               string
7          ip_address           string
8          cc                   string
9          country              string
10         birthdate            string
11         salary               double
12         title                string
1. Create External table:
CREATE EXTERNAL TABLE avro_ex_test (
     registration_dttm     TIMESTAMP,
     id                    INTEGER,
     first_name            VARCHAR(50),
     last_name             VARCHAR(50),
     email                 VARCHAR(50),
     gender                VARCHAR(50),
     ip_address            VARCHAR(50),
     cc                    VARCHAR(50),
     country               VARCHAR(50),
     birthdate             VARCHAR(50),
     salary                DECIMAL(18,2),
     title                 VARCHAR(50)
) USING SPARK WITH
REFERENCE='/tmp/userdata.avro',
FORMAT='com.databricks.spark.avro'
OPTIONS=('header' = 'true');
2. Create native Vector table:
CREATE TABLE avro_test(
     registration_dttm     TIMESTAMP,
     id                    INTEGER,
     first_name            VARCHAR(50),
     last_name             VARCHAR(50),
     email                 VARCHAR(50),
     gender                VARCHAR(50),
     ip_address            VARCHAR(50),
     cc                    VARCHAR(50),
     country               VARCHAR(50),
     birthdate             VARCHAR(50),
     salary                DECIMAL(18,2),
     title                 VARCHAR(50)
) WITH STRUCTURE=X100;
3. Load Vector table with INSERT command:
INSERT INTO avro_test SELECT * FROM avro_ex_test;
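To confirm the load, one option is to compare row counts between the external table and the native table. The following is a minimal sketch that uses only the two table names defined in the steps above; the column aliases are illustrative. After a successful load, the two counts are expected to match.

```sql
-- Rows visible through the external table (read from /tmp/userdata.avro via Spark)
SELECT COUNT(*) AS source_rows FROM avro_ex_test;

-- Rows now stored in the native Vector table
SELECT COUNT(*) AS loaded_rows FROM avro_test;
```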
Note:  If the file does not have a header, include the schema in OPTIONS, as shown in the external table example for loading an ORC file.
Note:  The following entry must exist in $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf. If necessary, add the entry and restart the Spark-Vector Provider.
spark.jars.packages=com.databricks:spark-avro_2.11:3.1.0
 
Last modified date: 03/21/2024