| Target | Protocol | Usual Ports |
|---|---|---|
| AWS S3 | HTTPS | 443 |
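
The flow listed above can be checked from the scanner host before the connection is configured. The following is a minimal sketch, not part of the connector: the helper names `s3_endpoint` and `can_reach` are hypothetical, and the host name assumes the standard regional endpoint form `s3.<region>.amazonaws.com` (other partitions or a non-Amazon S3 instance would need a different host).

```python
import socket


def s3_endpoint(region: str) -> str:
    """Build the regional S3 host name (assumed standard endpoint form)."""
    return f"s3.{region}.amazonaws.com"


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_reach(s3_endpoint("eu-west-1"))` returns `True` only if the host running it can open an outbound TCP connection on port 443; the region used here is purely illustrative. It does not validate credentials or proxy settings.
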

| Parameter | Expected Value |
|---|---|
| name | The name that will be displayed to catalog users for this connection. |
| code | Unique identifier of the connection on the Zeenea platform. Once registered on the platform, this code must not be modified; otherwise, the connection will be considered new and the old one will be removed from the scanner. |
| connector_id | The type of connector to be used for the connection. Here, the value must be `AmazonS3` and must not be modified. |
| enabled | A boolean value to enable or disable the connection. |
| catalog_code | The catalog code associated with the connection (default when empty). |
| alias | The list of aliases used by other connectors to generate lineage links. |
| secret_manager.enabled | Configuration for a secret manager. This configuration works only with Scanner 73 or later and requires a functional secret manager configured in the scanner configuration file. |
| secret_manager.key | The name of the secret. |
| connection.aws.access_key_id | AWS S3 access key identifier. |
| connection.aws.secret_access_key | AWS S3 secret access key. |
| connection.aws.region | AWS region. |
| s3.bucket_list | The list of buckets to be explored. The separator is a space (" "). If this setting is left empty, the connector explores all buckets accessible by the user. |
| connection.url | Include this setting if you wish to use an S3 instance other than Amazon's. |
| proxy.endpoint | Proxy port |
| proxy.username | Proxy username |
| proxy.password | Proxy account password |
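
The semantics of `s3.bucket_list` described above (space-separated, empty meaning all accessible buckets) can be illustrated with a short sketch. This is not the connector's code: the function name `parse_bucket_list` is hypothetical, and tolerating repeated or surrounding spaces is an assumption.

```python
from typing import List, Optional


def parse_bucket_list(value: Optional[str]) -> Optional[List[str]]:
    """Split a space-separated bucket list.

    Returns None when the setting is empty, which stands for
    "explore all buckets accessible by the user".
    """
    if value is None or not value.strip():
        return None
    # str.split() without arguments splits on runs of whitespace (assumed tolerance).
    return value.split()
```

With this reading, `parse_bucket_list("raw-data curated-data")` yields two bucket names, while an empty string yields `None`.
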

| Parameter | Expected Value |
|---|---|
| inventory.strategy | Determines the algorithm used to discover datasets. Two values are possible: `standard` or `legacy` (default). The legacy strategy is the algorithm that was used before version 59. The standard strategy can manage cases where multiple datasets are in the same folder. |
| inventory.file_partition_pattern | Regex used within the standard inventory strategy to define the variable part of a file name. |
| inventory.partition | Regex to identify partition folders. |
| inventory.skippedDirectory | Regex on the name of the folders to skip while still taking their content into account. The content is scanned as if it were at the root of the parent folder. No folder is skipped by default. |
| inventory.ignoredDirectory | Regex on the name of the folders to ignore; their content is ignored as well. No folder is ignored by default. |
| inventory.ignoredFile | Regex on the name of the files to ignore. Default value: `\..* \| _.* \| .*\.crc` |
| inventory.extension.csv | For CSV file detection. Default value: csv, tsv, csv.gz, tsv.gz, csv.zip, tsv.zip |
| inventory.extension.parquet | For Parquet file detection. Default value: parquet |
| inventory.extension.avro | For Avro file detection. Default value: avro |
| inventory.extension.orc | For ORC file detection. Default value: orc |
| inventory.extension.xml | For XML file detection. Default value: xml, xml.gz, xml.zip |
| inventory.extension.json | For JSON file detection. Default value: json, json.gz, json.zip |
| inventory.csv.header | Configures the header detection pattern for CSV files. Select `always` to force the schema to be read from the first line of CSV files. Possible values are: `never`, `default`, `always`, and `only string`. The default value is `default`. |
| cache.path | When inventoried buckets are very large and contain many objects, the connector can consume a large amount of memory. The object list can be cached on disk to reduce memory consumption. To enable the disk cache, set the path to a file. Both absolute and relative paths are accepted, but relative paths depend on the current directory at scanner launch time. Examples: `"/opt/zeenea-scanner/cache/s3.cache"`, `"/var/lib/zeenea-scanner/s3.cache"`, `"""C:\zeenea-scanner\cache\s3.cache"""`. NOTE: When dealing with multiple S3 connections, configure a different cache file for each connection to avoid conflicts. |
| xml.namespace_identification | Configures XML field identification. Use the value `uri`, except to remain compatible with a scanner earlier than version 43, in which case the value `legacy` (default value) is required. |
| xml.fields_ordering | Starting from version 67. Allows ordering the list of retrieved fields. Possible values are: |
| filter | To filter datasets during the inventory. See Rich Filters. |
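
To make the default values of `inventory.ignoredFile` and `inventory.extension.*` more concrete, here is a minimal sketch of how a file name could be classified. It is an illustration under stated assumptions, not the connector's algorithm: the spaces around `|` in the documented default are treated as formatting and removed, the regex is assumed to match the whole file name, extension matching is assumed to be case-sensitive, and the names `is_ignored` and `detect_format` are hypothetical.

```python
import re
from typing import Optional

# Documented default, with the spaces around "|" removed (assumption).
IGNORED_FILE = re.compile(r"\..*|_.*|.*\.crc")

# Default extension lists copied from the table above.
EXTENSIONS = {
    "csv": ["csv", "tsv", "csv.gz", "tsv.gz", "csv.zip", "tsv.zip"],
    "parquet": ["parquet"],
    "avro": ["avro"],
    "orc": ["orc"],
    "xml": ["xml", "xml.gz", "xml.zip"],
    "json": ["json", "json.gz", "json.zip"],
}


def is_ignored(file_name: str) -> bool:
    """True if the whole file name matches the ignore pattern."""
    return IGNORED_FILE.fullmatch(file_name) is not None


def detect_format(file_name: str) -> Optional[str]:
    """Return the format whose extension list matches, or None."""
    if is_ignored(file_name):
        return None
    for fmt, extensions in EXTENSIONS.items():
        if any(file_name.endswith("." + ext) for ext in extensions):
            return fmt
    return None
```

Under these assumptions, hidden files, files starting with an underscore (such as `_SUCCESS`), and `.crc` checksum files are skipped, while `sales.csv.gz` is classified as CSV.
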

| Object | Identification Key | Description |
|---|---|---|
| Dataset | code/path/dataset name | code: unique identifier of the connection noted in the configuration file; path: full path including the bucket; dataset name |
| Field | code/path/dataset name/field name | code: unique identifier of the connection noted in the configuration file; path: full path including the bucket; dataset name; field name |
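
The key structure above can be sketched as simple string composition. The function names `dataset_key` and `field_key` are hypothetical, the sample values are invented for illustration, and trimming leading or trailing slashes from the path is an assumption rather than documented behavior.

```python
def dataset_key(code: str, path: str, dataset_name: str) -> str:
    """Compose code/path/dataset name, where path includes the bucket."""
    # Trimming slashes avoids doubled separators (assumed normalization).
    return "/".join([code, path.strip("/"), dataset_name])


def field_key(code: str, path: str, dataset_name: str, field_name: str) -> str:
    """Compose code/path/dataset name/field name."""
    return dataset_key(code, path, dataset_name) + "/" + field_name
```

For a connection with code `s3-prod`, a dataset `orders` under `my-bucket/sales` would get the key `s3-prod/my-bucket/sales/orders`, and its field `amount` the key `s3-prod/my-bucket/sales/orders/amount`.
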