Vector 6.0 | AVRO Files

AVRO Files

File: userdata.avro

The Avro file has a header.

Column Details:

column#  column_name        hive_datatype
===============================================================
1        registration_dttm  timestamp
2        id                 int
3        first_name         string
4        last_name          string
5        email              string
6        gender             string
7        ip_address         string
8        cc                 string
9        country            string
10       birthdate          string
11       salary             double
12       title              string

1. Create external table:

    CREATE EXTERNAL TABLE avro_ex_test (
        registration_dttm TIMESTAMP,
        id INTEGER,
        first_name VARCHAR(50),
        last_name VARCHAR(50),
        email VARCHAR(50),
        gender VARCHAR(50),
        ip_address VARCHAR(50),
        cc VARCHAR(50),
        country VARCHAR(50),
        birthdate VARCHAR(50),
        salary DECIMAL(18,2),
        title VARCHAR(50)
    ) USING SPARK WITH
        REFERENCE='/tmp/userdata.avro',
        FORMAT='com.databricks.spark.avro',
        OPTIONS=('header' = 'true');

2. Create native Vector table:

    CREATE TABLE avro_test (
        registration_dttm TIMESTAMP,
        id INTEGER,
        first_name VARCHAR(50),
        last_name VARCHAR(50),
        email VARCHAR(50),
        gender VARCHAR(50),
        ip_address VARCHAR(50),
        cc VARCHAR(50),
        country VARCHAR(50),
        birthdate VARCHAR(50),
        salary DECIMAL(18,2),
        title VARCHAR(50)
    ) WITH STRUCTURE=X100;

3. Load Vector table with INSERT command:

    INSERT INTO avro_test SELECT * FROM avro_ex_test;
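
To confirm the load, one option is to compare row counts between the external table and the native table. This is a minimal sketch that assumes both tables from steps 1 to 3 exist; the column aliases external_rows and loaded_rows are illustrative only.

```sql
-- Rows visible through the external table (read from /tmp/userdata.avro via Spark)
SELECT COUNT(*) AS external_rows FROM avro_ex_test;

-- Rows now stored in the native Vector table; the two counts should match
SELECT COUNT(*) AS loaded_rows FROM avro_test;
```

If the counts differ, rechecking the REFERENCE path and the column definitions of the external table is a reasonable first step.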

Note: If the file does not have a header, include the schema in OPTIONS, as shown in the external table example for loading an ORC file.
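
For a file without a header, the external table definition might look like the sketch below. Several details are assumptions and should be checked against the ORC example: the table name avro_ex_test_noheader is hypothetical, 'header' = 'false' is assumed to be the matching setting, and the 'schema' value is assumed to be a comma-separated list of column names with the Hive datatypes from the Column Details above.

```sql
-- Hypothetical table name; same file and column list as avro_ex_test.
CREATE EXTERNAL TABLE avro_ex_test_noheader (
    registration_dttm TIMESTAMP,
    id INTEGER,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(50),
    gender VARCHAR(50),
    ip_address VARCHAR(50),
    cc VARCHAR(50),
    country VARCHAR(50),
    birthdate VARCHAR(50),
    salary DECIMAL(18,2),
    title VARCHAR(50)
) USING SPARK WITH
    REFERENCE='/tmp/userdata.avro',
    FORMAT='com.databricks.spark.avro',
    -- Assumption: 'schema' takes a "name type, ..." list; verify the exact
    -- syntax in the ORC external table example before using this.
    OPTIONS=('header' = 'false', 'schema' = 'registration_dttm timestamp, id int, first_name string, last_name string, email string, gender string, ip_address string, cc string, country string, birthdate string, salary double, title string');
```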

Note: The following entry must exist in $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf. If necessary, add the entry and restart the Spark-Vector Provider.

spark.jars.packages=com.databricks:spark-avro_2.11:3.1.0

Last modified date: 11/09/2022