Adding a Google Dataplex (V2) Connection

Prerequisites

A user with sufficient permissions is required to establish a connection with Dataplex.
Network flows from Zeenea to the data source must be open.

Note

You can find a link to the configuration template in Zeenea Connector Downloads.

Supported Versions

The Dataplex connector was developed and tested with the web version of the product.

Installing the Plugin

You can download the Google plugin from Zeenea Connector Downloads.

For more information about how to install a plugin, see Installing and Configuring Connectors as a Plugin.

Declaring the Connection

Connectors are created and configured through a dedicated configuration file located in the /connections folder of the relevant scanner.

For more information about managing connections, see Managing Connections.

To establish a connection with a Dataplex instance, fill in the following parameters in the dedicated configuration file:

Parameter	Expected Value
`name`	The name that will be displayed to catalog users for this connection.
`code`	Unique identifier of the connection on the Zeenea platform. Once registered on the platform, this code must not be modified; otherwise, the connection will be considered new and the old one will be removed from the scanner.
`connector_id`	The type of connector to be used for the connection. The value must be `google-dataplex-v2` and must not be modified.
`enabled`	A boolean value to enable or disable the connection (`true` or `false`). The default value is `true`.
`catalog_code`	The catalog code associated with the connection (`default` when empty).
`alias`	The list of aliases used by other connectors to generate lineage links.
`secret_manager.enabled`	Enables the use of a secret manager (`true` or `false`). This configuration works only with Scanner 73 or later and requires a functional secret manager configured in the scanner configuration file.
`secret_manager.key`	The name of the secret.
`connection.json_key`	JSON access key. You can either specify the key directly or store it in a separate file. If the key is stored in a separate file, set this parameter to the path of the file as a URI with the `file:` scheme, for example: `file:///opt/zeenea-scanner/connections/gdc_json_key.json`. Warning: If you specify the key directly, you must enclose it in triple quotes (`"""`), for example: `"""{my:"json"}"""`.
`scope.project_id`	Comma-separated list of project IDs.
`scope.location_id`	Unique location ID, corresponding to a GCP region (for example: `"europe-west3,us-west1"`). See https://cloud.google.com/compute/docs/regions-zones#available.
`filters`	Universal filters. See Universal Filters.
`proxy.scheme`	Depending on the proxy, `http` or `https`.
`proxy.hostname`	Proxy address
`proxy.port`	Proxy port
`proxy.username`	Proxy username
`proxy.password`	Proxy account password
`quota.read_per_minute`	Quota for read requests per minute. The default value is `6000` (the Google Data Catalog default).
`quota.search_per_user_per_minute`	Quota for search requests per user per minute. The default value is `180` (the Google Data Catalog default).
`quota.timeout_minute`	Maximum time, in minutes, to wait for a quota to become available. The default value is `10`.
`quota.max_retry`	Maximum number of retries when a request fails because a quota has been exceeded.
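As an illustration, a connection file combining the parameters above might look like the following minimal sketch. It assumes the file uses HOCON syntax, like the filter example in Universal Filters. Every value (connection name, code, project IDs, region, key file path, proxy host and credentials) is a hypothetical placeholder; the quota values restate the documented defaults, and because `quota.max_retry` has no documented default, the value shown for it is only illustrative.

```hocon
# Hypothetical connection file placed in the scanner's /connections folder
name = "Google Dataplex"
code = "dataplex-prod"              # must not change once registered
connector_id = "google-dataplex-v2" # fixed value for this connector
enabled = true
catalog_code = "default"

connection {
  # JSON access key stored in a separate file, given as a file: URI
  json_key = "file:///opt/zeenea-scanner/connections/gdc_json_key.json"
  # Alternatively, inline the key between triple quotes:
  # json_key = """{ contents of the JSON key }"""
}

scope {
  project_id = "my-project-a,my-project-b"  # hypothetical, comma-separated project IDs
  location_id = "europe-west3"              # GCP region
}

# Optional proxy settings (hypothetical host and credentials)
# proxy {
#   scheme = "https"
#   hostname = "proxy.example.com"
#   port = 3128
#   username = "zeenea"
#   password = "change-me"
# }

quota {
  read_per_minute = 6000            # documented default
  search_per_user_per_minute = 180  # documented default
  timeout_minute = 10               # documented default
  max_retry = 3                     # illustrative only, no default documented
}
```

In standard HOCON, a nested block such as `connection { json_key = ... }` and the dotted form `connection.json_key = ...` define the same key, so either notation should map to the parameter names listed in the table, assuming the scanner parses the file as standard HOCON.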

Universal Filters

Use the universal filter language to filter and route items based on the following criteria:

Criteria	Description
entry_group	Dataplex Entry Group (@bigquery)
project	Dataplex Entry project
dataset	Dataplex Entry dataset
table	Dataplex Entry table

Example:

```hocon
filters = [
  {
    id = "accept_zeenea_dataset"
    action = ACCEPT
    rules {
      dataset = "ZEENEA*"
    }
  },
  {
    id = "default_reject"
    action = REJECT
  }
]
```

For more information about universal filters, see Universal Filters.
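The example above filters only on `dataset`; the other criteria can be used in the same way. The following hypothetical sketch rejects one project, accepts entries from the `@bigquery` entry group, and rejects everything else. It assumes, as the example above implies, that filters are evaluated in order and that the first matching filter decides the action; the project ID `my-sandbox-project` is a placeholder.

```hocon
filters = [
  {
    # Drop everything from a hypothetical sandbox project first
    id = "reject_sandbox_project"
    action = REJECT
    rules {
      project = "my-sandbox-project"
    }
  },
  {
    # Keep entries from the BigQuery entry group
    id = "accept_bigquery_entries"
    action = ACCEPT
    rules {
      entry_group = "@bigquery"
    }
  },
  {
    # Reject anything not matched above
    id = "default_reject"
    action = REJECT
  }
]
```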

User Permissions

To collect metadata, the running user's permissions must allow them to access and read the databases to be cataloged.

The user must have the following permissions:

dataplex.entryGroups.list
dataplex.entryGroups.get
dataplex.entries.list
dataplex.entries.get

Data Extraction

To extract information, the connector runs the following requests on the Google Dataplex API:

projects.locations.entryGroups.list
projects.locations.entryGroups.get
projects.locations.entries.list
projects.locations.entries.get

Collected Metadata

Inventory

The connector collects the list of objects accessible to the user.

Dataset

A dataset is a Google Dataplex object.

Name
Source Description
Technical Data:
- Project Id
- Location Id
- Entry Id
- Created At
- Updated At
- Type

Field

Dataset field.

Name
Source Description
Type
Can be null: Depending on the field settings.
Multivalued: Not supported. Default value false.
Primary Key: Depending on the "Primary Key" field attribute.
Technical Data:
- Technical Name
- Native type

Unique Identifier Keys

Each object in the catalog is associated with a unique identifier key. When the object is imported from an external system, the key is generated and provided by the connector.

For more information about identifier keys, see Identification Keys.

Object	Identifier Key	Description
Dataset	`code/entry_group/project/dataset/table name`	code: unique identifier of the connection noted in the configuration file; entry_group: Dataplex entry group; project: entry project; dataset: entry dataset; table name: entry table name
Field	`code/entry_group/project/dataset/table name/field name`	code: unique identifier of the connection noted in the configuration file; entry_group: Dataplex entry group; project: entry project; dataset: entry dataset; table name: entry table name; field name: field name