Technical Documentation : Connectors : Adding an AWS Glue (Data Catalog) Connection
Was this helpful?
Adding an AWS Glue (Data Catalog) Connection
Prerequisites
  • A user with sufficient permissions is required to establish a connection with AWS Glue.
  • Zeenea traffic flows towards the database must be open.
The Agent's host server must have sufficient credentials to connect to AWS Glue; in this case, available authentication methods are:
  • Instance Role
  • Environment Variable
  • Configuration File
Note: You can find a link to the configuration template in Zeenea Connector Downloads.
Target
Protocol
Usual Ports
AWS Glue
HTTP
443
Supported Versions
The AWS Glue connector was successfully tested with the online application.
Installing the Plugin
The AWS Glue plugin can be downloaded here: Zeenea Connector Downloads.
For more information on how to install a plugin, please refer to the following article: Installing and Configuring Connectors as a Plugin.
Declaring the Connection
Creating and configuring connectors is done through a dedicated configuration file located in the /connections folder of the relevant scanner.
In order to establish a connection with an AWS Glue instance, specifying the following parameters in the dedicated file is required:
Parameter
Expected value
name
The name that will be displayed to catalog users for this connection.
code
The unique identifier of the connection on the Zeenea platform. Once registered on the platform, this code must not be modified or the connection will be considered as new and the old one removed from the scanner.
connector_id
The type of connector to be used for the connection. Here, the value must be aws.glue and this value must not be modified.
connection.aws.access_key_id
AWS Glue Access Key Identifier
connection.aws.secret_access_key
AWS Glue Secret Access Key
connection.aws.region
AWS region
connection.aws.profile
AWS Profile for authentication
connection.aws.multi_account.enabled
Allow a single connection to retrieve data from other AWS account data catalog.

In order to connect to multiple accounts, you need to configure AWS cross access between accounts.

AWS documentation : https://docs.aws.amazon.com/glue/latest/dg/cross-account-access.html.

Default value is false.

Since version 3.3.1
connection.aws.multi_account.list
Define which account/region to connected to. It must be a list of account:region entries, separated by a space.


Example : 123456789012:eu-west-2 987654321098:eu-west-2

Since version 3.3.1
connection.fetch_page_size
(Advanced) define the size of batch of items loaded by each request in inventory.

Since version 1.0.3
filter
Lets you filter based on specific characteristics. See Rich Filters below for a comprehensive explanation.

Since version 3.4.1
User Permissions
In order to collect metadata, the running user's permissions must allow them to access and read databases that need cataloging.
Roles
The user must be able to run the following actions on the target bucket and the objects it contains:
  • glue:GetTable
  • glue:GetTables
  • glue:GetDatabases
Example for cataloging a bucket (in JSON):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "SearchTables",
"Effect": "Allow",
"Action": [
"glue:GetTable",
"glue:GetTables",
"glue:GetDatabases"
],
"Resource": "*"
}
]
}
Rich Filters
Databases and Tables
Starting with version 3.4.1, the connector embeds a rich filter mechanism that enables you to extract only specific tables or databases matching the criteria.
Criteria
Description
database
Database name
table
Table name
Example:
filter = "database in ('production', 'qa') and table ~ /(?:hr|it|market)_figures/"
Note: The filter attribute can contain either the raw value or a file URL to the content. (e.g., file:///path/to/zeenea/connections/aws-glue-inventory-filter.json)
When using an side-file, filter changes are taken into account without restarting the scanner.
Data Extraction
To extract information, the connector requests AWS Glue to get tables and metadata.
Collected Metadata
Inventory
Will collect the list of tables and views accessible by the user.
Dataset
A dataset can be a table or a view.
  • Name
  • Source Description
  • Technical Data:
    • AWS Region
    • Database Name
    • Location
    • Owner
    • CreateTime
    • UpdateTime
    • LastAccessTime
    • LastAnalyzeTime
    • TableType
Field
Dataset field.
  • Name
  • Source Description
  • Type
  • Can be null: Depending on field settings
  • Multivalued: Depending on field settings
  • Primary Key: Not supported. Default value false.
Object Identification Keys
An identification key is associated with each object in the catalog. In the case of the object being created by a connector, the connector builds it.
Object
Identification Key
Description
Dataset
code/aws region/dataset identifier
- code: Unique identifier of the connection noted in the configuration file
- aws region: AWS region code
- dataset identifier: Table name
Database schema name
S3 bucket key
Field
code/aws region/dataset identifier/field name
- code: Unique identifier of the connection noted in the configuration file
- aws region: AWS region code
- dataset identifier:
Database schema name
S3 bucket key
- field name
Last modified date: 12/16/2025