Parameter | Expected Value |
|---|---|
name | The name that will be displayed to catalog users for this connection. |
code | Unique identifier of the connection on the Zeenea platform. Once registered on the platform, this code must not be modified; otherwise the connection will be treated as new and the old one removed from the scanner. |
connector_id | The type of connector to be used for the connection. Here, the value must be ADLSGen1 and this value must not be modified. |
connection.account_fqdn | Fully Qualified Domain Name (FQDN) of the Azure Data Lake Store account. |
connection.oauth.client_id | Application ID |
connection.oauth.endpoint | Endpoint obtained from the configuration menu of your Azure account, under Azure Active Directory > App Registration > Endpoints > OAuth 2.0 token endpoint (v1). Example: `https://login.microsoftonline.com/c802e70e-9ed0-11ec-9163-00155d15055c/oauth2/token` |
connection.oauth.client_secret | Client secret |
filter | To filter datasets during the inventory. See Rich Filters. |
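
The example endpoint above follows the Azure AD v1 token endpoint shape (`/<tenant>/oauth2/token`), whereas the v2 endpoint carries an extra `v2.0` path segment. The following is a minimal sketch of a sanity check for a `connection.oauth.endpoint` value before it is written to the configuration. The function name `is_v1_token_endpoint` is hypothetical and not part of Zeenea, and the check assumes the public Azure cloud host shown in the example.

```python
from urllib.parse import urlparse


def is_v1_token_endpoint(url: str) -> bool:
    """Return True if url looks like https://login.microsoftonline.com/<tenant>/oauth2/token.

    Assumption: public Azure cloud host, as in the documented example.
    """
    parsed = urlparse(url)
    # Drop empty segments produced by leading or trailing slashes.
    parts = [p for p in parsed.path.split("/") if p]
    return (
        parsed.scheme == "https"
        and parsed.netloc == "login.microsoftonline.com"
        and len(parts) == 3                    # <tenant>, "oauth2", "token"
        and parts[1:] == ["oauth2", "token"]   # a v2 URL has an extra "v2.0" segment
    )
```

A value copied from the "OAuth 2.0 token endpoint (v2)" entry would be rejected by this check, which is the mistake it is meant to catch.
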

Parameter | Expected Value |
|---|---|
name | The name that will be displayed to catalog users for this connection. |
code | Unique identifier of the connection on the Zeenea platform. Once registered on the platform, this code must not be modified; otherwise the connection will be treated as new and the old one removed from the scanner. |
connector_id | The type of connector to be used for the connection. Here, the value must be ADLSGen2 and this value must not be modified. |
connection.account_name | Account name |
connection.account_key | Account Key; can be retrieved in the Access Key section of the Azure menu |
connection.container_name | List of containers to browse, separated by spaces |
connection.oauth.tenant_id | Tenant ID as defined in Azure |
connection.oauth.client_id | Application ID (client) as defined in Azure |
connection.oauth.client_secret | Client secret |
filter | To filter datasets during the inventory |
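
Since `connection.container_name` holds several container names separated by spaces, it can help to check how a value will be split before saving it. This is a minimal sketch; `parse_container_names` is a hypothetical helper, and the assumption that repeated spaces are tolerated is mine, not something the table states.

```python
from typing import List


def parse_container_names(value: str) -> List[str]:
    """Split a space-separated container list into individual names.

    Assumption: runs of whitespace count as a single separator.
    """
    return value.split()
```

For instance, `parse_container_names("raw curated archive")` yields three names, which makes it easy to spot a container name that accidentally contains a space.
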

Parameter | Expected Value |
|---|---|
inventory.partition | Regex to identify partition folders |
inventory.skippedDirectory | Regex on the names of folders to skip while still scanning their content. The content is scanned as if it were at the root of the parent folder. No folder is skipped by default. |
inventory.ignoredDirectory | Regex on the names of folders to ignore, together with all of their content. No folder is ignored by default. |
inventory.ignoredFile | Regex on the names of the files to ignore. Default value: "\..* \| _.* \| .*\\.crc" |
inventory.extension.csv | For CSV files detection. Default value: "csv, tsv, csv.gz, tsv.gz, csv.zip, tsv.zip" |
inventory.extension.parquet | For Parquet files detection. Default value: parquet. |
inventory.extension.avro | For Avro files detection. Default value: avro. |
inventory.extension.orc | For ORC files detection. Default value: orc. |
inventory.extension.xml | For XML files detection. Default value: xml, xml.gz, xml.zip. |
inventory.extension.json | For JSON files detection. Default value: json, json.gz, json.zip. |
inventory.csv.header | Configures header detection for CSV files. Use always to force the schema to be read from the first line of CSV files. Possible values are: never, always, and only string. |
xml.namespace_identification | Configures how XML fields are identified. Use uri, unless compatibility with a scanner earlier than version 43 is required, in which case use legacy (the default value). |
xml.fields_ordering | Starting from version 67. Allows ordering the list of retrieved fields. Possible values are: |
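
The difference between skipped folders, ignored folders, and ignored files is easier to see on concrete paths. The sketch below is an illustration under stated assumptions, not the scanner's actual implementation: the helper names are hypothetical, each regex is assumed to be matched against a whole folder or file name, the default `inventory.ignoredFile` pattern is read without the spaces around `|` and with a single backslash before `.crc`, and file formats are assumed to be detected by file name suffix.

```python
import re
from typing import Optional, Pattern

# Assumed reading of the default inventory.ignoredFile value.
DEFAULT_IGNORED_FILE = re.compile(r"\..*|_.*|.*\.crc")

# Default inventory.extension.* values from the table above.
EXTENSIONS = {
    "csv": ["csv", "tsv", "csv.gz", "tsv.gz", "csv.zip", "tsv.zip"],
    "parquet": ["parquet"],
    "avro": ["avro"],
    "orc": ["orc"],
    "xml": ["xml", "xml.gz", "xml.zip"],
    "json": ["json", "json.gz", "json.zip"],
}


def effective_path(
    path: str,
    skipped: Optional[Pattern[str]] = None,
    ignored: Optional[Pattern[str]] = None,
    ignored_file: Optional[Pattern[str]] = DEFAULT_IGNORED_FILE,
) -> Optional[str]:
    """Return the path as the inventory would see it, or None if it is dropped."""
    *folders, filename = path.strip("/").split("/")
    if ignored_file is not None and ignored_file.fullmatch(filename):
        return None  # the file itself is ignored
    kept = []
    for folder in folders:
        if ignored is not None and ignored.fullmatch(folder):
            return None  # ignored folder: everything below it is dropped
        if skipped is not None and skipped.fullmatch(folder):
            continue  # skipped folder: content is lifted into the parent
        kept.append(folder)
    return "/".join(kept + [filename])


def detect_format(filename: str) -> Optional[str]:
    """Return the format whose extension list matches the file name suffix."""
    for fmt, extensions in EXTENSIONS.items():
        if any(filename.endswith("." + ext) for ext in extensions):
            return fmt
    return None
```

With `skipped=re.compile("staging")`, the path `container/staging/sales/2024.csv` is treated as `container/sales/2024.csv`, whereas with `ignored=re.compile("tmp")` any file below a `tmp` folder disappears from the inventory.
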

Object | Identification Key | Description |
|---|---|---|
Dataset | code/path/dataset name | - code: Unique identifier of the connection noted in the configuration file<br>- path: Full path including the container name<br>- dataset name |
Field | code/path/dataset name/field name | - code: Unique identifier of the connection noted in the configuration file<br>- path: Full path including the container name<br>- dataset name<br>- field name |
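
The identification keys above are plain `/`-joined strings, which the following sketch makes explicit. The function names are hypothetical, and trimming leading and trailing slashes from the path is an assumption made so that the example produces a clean key.

```python
def dataset_key(code: str, path: str, dataset_name: str) -> str:
    """Build a dataset identification key of the form code/path/dataset name."""
    # Assumption: surrounding slashes on the path are not part of the key.
    return "/".join([code, path.strip("/"), dataset_name])


def field_key(code: str, path: str, dataset_name: str, field_name: str) -> str:
    """Build a field identification key by appending the field name to the dataset key."""
    return dataset_key(code, path, dataset_name) + "/" + field_name
```

Because the connection `code` is the first component of every key, changing it changes the key of every dataset and field of the connection, which is consistent with the warning that the code must not be modified once registered.
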