PARQUET Files
userdata.parquet
File containing data in PARQUET format. File has header.
Column details:
column# column_name hive_datatype
===============================================
1 registration_dttm timestamp
2 id int
3 first_name string
4 last_name string
5 email string
6 gender string
7 ip_address string
8 cc string
9 country string
10 birthdate string
11 salary double
12 title string
1. Create external table:
CREATE EXTERNAL TABLE parquet_ex_test (
registration_dttm TIMESTAMP,
id INTEGER,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(50),
gender VARCHAR(50),
ip_address VARCHAR(50),
cc VARCHAR(50),
country VARCHAR(50),
birthdate VARCHAR(50),
salary DECIMAL(18,2),
title VARCHAR(50)
) USING SPARK WITH
REFERENCE='/tmp/userdata.parquet',
FORMAT = 'parquet';
2. Create native Vector table:
CREATE TABLE parquet_test(
registration_dttm TIMESTAMP,
id INTEGER,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(50),
gender VARCHAR(50),
ip_address VARCHAR(50),
cc VARCHAR(50),
country VARCHAR(50),
birthdate VARCHAR(50),
salary DECIMAL(18,2),
title VARCHAR(50)
) WITH STRUCTURE=X100;
3. Load Vector table with INSERT command
INSERT INTO parquet_test SELECT * FROM parquet_ex_test
Note: If the file does not have a header, include schema in OPTIONS, as shown in the external table to load an ORC file.
Last modified date: 12/06/2024