Load Data with Spark Loader
The Spark Loader is a command-line client utility that loads files in formats such as CSV, JSON, Parquet, and ORC through Spark into Actian Ingres using the Spark Connector. The Spark Loader is included in the Spark Connector service container image but can also be used as a standalone tool.
For a functional setup, you need a working Spark installation and the Loader JAR file. Contact Actian support for a copy of the Loader JAR file and the installation requirements for Spark.
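For example, once Spark is installed, the standalone Loader can be invoked with spark-submit in the same way as shown for the container below. The JAR path here is only a placeholder for wherever you have stored the file:
spark-submit --class com.actian.spark_vector.loader.Main /path/to/spark_vector_loader.jar --help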
Using the Loader Included in the Service Container
To load files from local storage, you must mount a folder from the local file system into the container, as described in Setting up service containers. The content of the mounted folder appears under /opt/user_mount/ in the container file system. The Loader JAR is located at /opt/spark/loader/spark_vector_loader.jar in the container.
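As an illustration only (the image name and run options below are assumptions; see Setting up service containers for the actual procedure), mounting a local folder when starting the container makes its files visible under /opt/user_mount/:
docker run -d --name spark -v /home/user/data:/opt/user_mount <spark-connector-image>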
For example:
Assume that the running container is named spark, the Actian Ingres host is localhost, the installation ID is VW, the database is named testdb, and the user and password are both actian.
To view all the command-line options for loading file contents in different formats into Actian Ingres, run the following command:
docker exec spark /bin/bash -c '/opt/spark/bin/spark-submit --class com.actian.spark_vector.loader.Main /opt/spark/loader/spark_vector_loader.jar --help'
Examples of Loading Data with Spark Loader
The following are the sample data files and the commands used to load their contents into Actian Ingres using the Spark Loader.
CSV File with Header
employee_header_test.csv
"name"|"salary"
"abc"|100
"xyz"|150
Command
docker exec spark /bin/bash -c '/opt/spark/bin/spark-submit --class com.actian.spark_vector.loader.Main /opt/spark/loader/spark_vector_loader.jar load csv -sf /opt/user_mount/employee_header_test.csv -vh localhost -vi VW -vd testdb -vu actian -vp actian -ct true -tt svc_csv_with_header -sc "|" -sh true'
Note:  This command creates the table because -ct true is specified.
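To verify the load, you can query the new table with the Ingres terminal monitor on the database host, for example:
sql testdb
SELECT * FROM svc_csv_with_header;\g
\q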
CSV File without Header
employee_test.csv
"abc"|100
"xyz"|150
Command
docker exec spark /bin/bash -c '/opt/spark/bin/spark-submit --class com.actian.spark_vector.loader.Main /opt/spark/loader/spark_vector_loader.jar load csv -sf /opt/user_mount/employee_test.csv -h "name string,salary int" -vh localhost -vi VW -vd testdb -vu actian -vp actian -ct true -tt svc_csv_without_header -sc "|"'
Note:  This command creates the table because -ct true is specified.
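As with the ORC example later on this page, you can instead create the target table yourself and run the same command without the -ct flag. The column types below are only an assumption based on the -h schema ("name string,salary int"):
CREATE TABLE svc_csv_without_header(
    name      VARCHAR(50),
    salary    INTEGER
) WITH STRUCTURE=X100;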
JSON File
employee_test.json
{"name":"Michael", "salary":3000}
{"name":"Andy", "salary":4000}
{"name":"Justin", "salary":5900}
{"name":"Berta", "salary":4800}
{"name":"Raju", "salary":3900}
{"name":"Chandy", "salary":5500}
{"name":"Joey", "salary":3500}
{"name":"Mon", "salary":4000}
{"name":"Rachel", "salary":4000}
Command
docker exec spark /bin/bash -c '/opt/spark/bin/spark-submit --class com.actian.spark_vector.loader.Main /opt/spark/loader/spark_vector_loader.jar load json -sf /opt/user_mount/employee_test.json -vh localhost -vi VW -vd testdb -vu actian -vp actian -ct true -tt svc_json'
Parquet File
userdata.parquet
column#   column_name           hive_datatype
==============================================
1         registration_dttm     timestamp
2         id                    int
3         first_name            string
4         last_name             string
5         email                 string
6         gender                string
7         ip_address            string
8         cc                    string
9         country               string
10        birthdate             string
11        salary                double
12        title                 string
13        comments              string
Command
docker exec spark /bin/bash -c '/opt/spark/bin/spark-submit --class com.actian.spark_vector.loader.Main /opt/spark/loader/spark_vector_loader.jar load parquet -sf /opt/user_mount/userdata.parquet -vh localhost -vi VW -vd testdb -vu actian -vp actian -ct true -tt svc_parquet -cols "registration_dttm,id,first_name,last_name,email,gender,ip_address,cc,country,birthdate,salary,title"'
This command creates the table because -ct is set to true. The -cols option specifies the columns to load into the Actian Ingres table. Because every column except comments is listed, the table svc_parquet is created without the comments column.
For sample data reference, see https://github.com/Teradata/kylo/tree/master/samples/sample-data/parquet.
ORC File
userdata.orc
This ORC file does not have a header.
column#   column_name           hive_datatype
==============================================
1         registration_dttm     timestamp
2         id                    int
3         first_name            string
4         last_name             string
5         email                 string
6         gender                string
7         ip_address            string
8         cc                    string
9         country               string
10        birthdate             string
11        salary                double
12        title                 string
13        comments              string
Command
docker exec spark /bin/bash -c '/opt/spark/bin/spark-submit --class com.actian.spark_vector.loader.Main /opt/spark/loader/spark_vector_loader.jar load orc -sf /opt/user_mount/userdata.orc -vh localhost -vi VW -vd testdb -vu actian -vp actian -ct true -tt svc_orc -cols "_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11"'
Note:  Since this file does not have a header, the table is created with column names _col0, _col1, and so on. Because _col0 through _col11 are specified, the command creates the table svc_orc with all columns except the comments column.
Another way to load this file is to create the svc_orc table using the CREATE TABLE command, and then run the spark-submit command without the -ct flag:
CREATE TABLE svc_orc(
    registration_dttm    TIMESTAMP,
    id                   INTEGER,
    first_name           VARCHAR(50),
    last_name            VARCHAR(50),
    email                VARCHAR(50),
    gender               VARCHAR(50),
    ip_address           VARCHAR(50),
    cc                   VARCHAR(50),
    country              VARCHAR(50),
    birthdate            VARCHAR(50),
    salary               FLOAT8,
    title                VARCHAR(50)
) WITH STRUCTURE=X100;
docker exec spark /bin/bash -c '/opt/spark/bin/spark-submit --class com.actian.spark_vector.loader.Main /opt/spark/loader/spark_vector_loader.jar load orc -sf /opt/user_mount/userdata.orc -vh localhost -vi VW -vd testdb -vu actian -vp actian -tt svc_orc -cols "_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11"'
The column names are matched automatically. However, the column order must match the table definition, and columns can be skipped only at the end of the list, not in the middle.
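For example, reusing the column list from the command above (shown here only to illustrate the ordering rule):
-cols "_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11"
works because it follows the order of the svc_orc columns and drops only the trailing comments column, while a list such as
-cols "_col0,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11"
skips _col1 in the middle and is therefore not supported.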
For sample data reference, see https://github.com/Teradata/kylo/tree/master/samples/sample-data/orc.