AVRO Files
userdata.avro
Avro file has a header.
Column Details:
column# column_name hive_datatype
===============================================================
1 registration_dttm timestamp
2 id int
3 first_name string
4 last_name string
5 email string
6 gender string
7 ip_address string
8 cc string
9 country string
10 birthdate string
11 salary double
12 title string
1. Create External table:
CREATE EXTERNAL TABLE avro_ex_test (
registration_dttm TIMESTAMP,
id INTEGER,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(50),
gender VARCHAR(50),
ip_address VARCHAR(50),
cc VARCHAR(50),
country VARCHAR(50),
birthdate VARCHAR(50),
salary DECIMAL(18,2),
title VARCHAR(50)
) USING SPARK WITH
REFERENCE='/tmp/userdata.avro',
FORMAT='com.databricks.spark.avro'
OPTIONS=('header' = 'true');
2. Create native Vector table
CREATE TABLE avro_test(
registration_dttm TIMESTAMP,
id INTEGER,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(50),
gender VARCHAR(50),
ip_address VARCHAR(50),
cc VARCHAR(50),
country VARCHAR(50),
birthdate VARCHAR(50),
salary DECIMAL(18,2),
title VARCHAR(50)
) WITH STRUCTURE=X100;
3. Load Vector table with INSERT command:
INSERT INTO avro_test SELECT * FROM avro_ex_test
Note: If the file does not have a header, include schema in OPTIONS, as shown in the external table to load an ORC file.
Note: The following entry must exist in $II_SYSTEM/ingres/files/spark-provider/spark_provider.conf. If necessary, add the entry and restart the Spark-Vector Provider.
spark.jars.packages=com.databricks:spark-avro_2.11:3.1.0
Last modified date: 06/28/2024