Data Products
Introduction to Data Products and Data Contracts
A data product is a reusable, active, and standardized data asset designed to deliver measurable value to its users, whether internal or external, by applying the rigorous principles of product thinking and management. It comprises one or more data artifacts (e.g., datasets, models, pipelines) and is enriched with metadata, including governance policies, data quality rules, data contracts, and, where applicable, a software bill of materials (SBOM) to document its dependencies and components. Ownership of a data product is aligned to a specific domain or use case, ensuring accountability, stewardship, and continuous evolution throughout its lifecycle. Adhering to the FAIR principles (findable, accessible, interoperable, and reusable), a data product is designed to be discoverable, scalable, reusable, and aligned with both business and regulatory standards, driving innovation and efficiency in modern data ecosystems.
A data product embeds different types of components, in particular:
Input ports: An input port is a standardized interface through which a data product receives data from upstream sources. It defines how external data enters the data product. Input ports enable the controlled, traceable ingestion of data, facilitating lineage tracking and quality checks before the data is transformed and served to consumers.
Output ports: An output port is a standardized interface through which a data product exposes its data to consumers. It defines how the data can be accessed (e.g., via APIs, SQL tables, event streams), along with its format, schema, and protocols. Output ports ensure that data products are interoperable, discoverable, and easy to consume by other teams or systems while enforcing access controls and contracts. A data contract is associated with each output port.
Internal components: Internal components include datasets and processes that are not supposed to be consumed by end-users and are necessary to produce the output ports. These components are not represented in the current version of the platform.
A data contract is a formal agreement between a data product owner (also known as a producer) and its consumers that defines the structure, meaning, quality expectations, and access terms of the data exposed. It always includes a schema definition, and it can also include data quality rules, Service Level Agreements (SLAs), ownership, rights, and more. In product-oriented data engineering and management, data contracts ensure reliable data consumption, prevent breaking changes, and promote accountability between domains.
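The relationship between these components can be pictured as a small data model. The following Python sketch is purely illustrative: the class and field names (`DataProduct`, `owner_domain`, `quality_rules`, and so on) are assumptions chosen for readability and do not reflect the platform's internal model or the YAML descriptor keys.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataContract:
    """Agreement attached to a port: the schema is mandatory, the rest optional."""
    id: str
    version: str
    schema: List[dict]                                   # schema definition
    quality_rules: List[str] = field(default_factory=list)
    sla: Optional[str] = None


@dataclass
class InputPort:
    """Interface through which the product receives upstream data."""
    name: str
    contract_id: Optional[str] = None                    # upstream contract, if any


@dataclass
class OutputPort:
    """Interface exposed to consumers; each one carries a data contract."""
    name: str
    contract: DataContract


@dataclass
class DataProduct:
    """A product owned by one domain, with its input and output ports."""
    name: str
    owner_domain: str                                    # hypothetical field name
    input_ports: List[InputPort] = field(default_factory=list)
    output_ports: List[OutputPort] = field(default_factory=list)


contract = DataContract(id="c-1", version="1.0.0",
                        schema=[{"name": "covid_cases"}])
product = DataProduct(
    name="Yet Another Product",
    owner_domain="public-health",
    input_ports=[InputPort(name="kafka_stock_topic")],
    output_ports=[OutputPort(name="COVID-19", contract=contract)],
)
```

Internal components are deliberately left out of the sketch, since they are not exposed to consumers.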
Data Products in the Actian Data Intelligence Platform
The Actian Data Intelligence Platform supports data products and data contracts natively. It enables organizations to manage, govern, and maximize the value of their data assets as products:
Define your data products and data contracts with YAML descriptors containing all relevant information for discovery and consumption (name, description, terms and conditions, custom properties, and so on).
Synchronize data products and their data contracts from your CI/CD pipelines by using our dedicated Data Product API.
Manage data products and all their components in the Studio to enrich their documentation and publish them into the enterprise marketplace.
Search, find, and understand data products thanks to the graph-powered search engine and an optimized layout dedicated to the discovery of these new item types.
Request access to data products directly in Zeenea Explorer and manage these requests in Zeenea Studio to allow efficient and governed consumption of the data products.
The following are the key benefits of implementing data products in the Actian Data Intelligence Platform:
Data products are supported as native items, allowing for modeling simple as well as more complex data products. Define one or several output ports for each data product to create more use-case-oriented data to better meet business user expectations.
A dedicated and optimized search experience, powered by the knowledge graph, enables users to efficiently search for, discover, understand, and consume data products.
By supporting data contracts, our platform helps organizations shift metadata management left. Design data contracts upfront and integrate them into your CI/CD pipelines to ensure that the business expectations expressed in the data contract are met when you deploy new data. Moreover, synchronize your data contracts to keep metadata up to date.
Coupled with the federated catalog, each domain can design and manage its own data products.
Our platform breaks data silos and supports a data mesh approach by allowing domains to publish their data products into the enterprise marketplace.
Create Data Products with the API
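The Actian Data Intelligence Platform leverages the following open standards managed by Bitol (a Linux Foundation project): the Open Data Product Standard (ODPS) for data products and the Open Data Contract Standard (ODCS) for data contracts.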
These YAML files can be uploaded to the platform through a dedicated REST API.
This API can be called from external tools or from CI/CD pipelines (for instance, from a GitHub Action), as shown in the following diagram:
In the current version, you cannot create data products from the Studio.
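As an illustration of the upload step, the following Python sketch prepares an HTTP request carrying a YAML descriptor. It is a minimal sketch under stated assumptions: the base URL, the path `/api/data-products`, the `X-API-KEY` header name, and the `application/yaml` content type are all hypothetical placeholders. Refer to the Data Product API reference for the actual endpoint and authentication scheme.

```python
import urllib.request

# Hypothetical values: replace them with those from the Data Product API reference.
BASE_URL = "https://example.invalid"      # placeholder tenant URL
UPLOAD_PATH = "/api/data-products"        # assumed path, not taken from the docs
API_KEY_HEADER = "X-API-KEY"              # assumed header name


def build_upload_request(yaml_text: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying a YAML descriptor."""
    return urllib.request.Request(
        url=BASE_URL + UPLOAD_PATH,
        data=yaml_text.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/yaml",
            API_KEY_HEADER: api_key,
        },
    )


request = build_upload_request("kind: DataProduct\nname: Demo\n", api_key="secret")
# In a CI/CD job, the request would then be sent with:
#   with urllib.request.urlopen(request) as response:
#       print(response.status)
```

In a pipeline, the API key would typically come from a secret store rather than from the source code.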
The following is a sample YAML file for a data product:
apiVersion: v1.9.0
kind: DataProduct
name: Yet Another Product
id: fbe8d147-28db-4f1d-bedf-a3fe9f458427
description:
  purpose: Yet Another Product, with datasets from data contracts.
tags: ['experimental']
inputPorts:
  - name: kafka_stock_topic
    version: 1.0.0
    contractId: dbb7b1eb-7628-436e-8914-2a00638ba6db
outputPorts:
  - name: COVID-19
    description: "COVID-19"
    version: 1.0.0
    contractId: f07a9a38-4020-415f-abd1-2802d6e77f19
    customProperties:
      - property: zeeneaGlossaryRefs
        value: "KPI/Number of Delivered Doses of Vaccine"
    inputContracts:
      - id: dbb7b1eb-7628-436e-8914-2a00638ba6db
        version: 2.0.0
The following is a sample YAML file for a data contract:
kind: DataContract
apiVersion: v3.0.2
version: 1.0.0
id: f07a9a38-4020-415f-abd1-2802d6e77f19
description:
  purpose: Johns Hopkins University data on COVID-19 cases, Enigma
customProperties:
  - property: zeeneaGlossaryRefs
    value: "KPI/Number of Delivered Doses of Vaccine"
tags: ["kafka", "confluent", "aws", "managed"]
schema:
  - name: covid_cases
    physicalName: covid_cases
    description: the number of confirmed covid cases reported for a specified region, with location and county/province/country information.
    properties:
      - name: fips
        logicalType: string
        description: state and county two digits code
      - name: admin2
        logicalType: string
        description: county name
      - name: province_state
        logicalType: string
        description: province name or state name
      - name: country_region
        logicalType: string
        description: country name or region name
      - name: last_update
        logicalType: date
        description: last update timestamp
      - name: latitude
        logicalType: number
        description: location (latitude)
      - name: longitude
        logicalType: number
        description: location (longitude)
      - name: confirmed
        logicalType: int
        description: number of confirmed cases
      - name: combined_key
        logicalType: string
        description: county name+state name+country name
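In the two samples above, the output port of the data product points to the data contract through its contractId. A CI/CD job could check this link before uploading the descriptors. The following Python sketch shows one possible check. It works on trimmed-down copies of the samples written as dictionaries (as a YAML parser would return them), and the helper name `missing_output_contracts` is hypothetical, not part of the platform.

```python
from typing import Dict, List

# Trimmed-down copies of the two sample descriptors, as parsed dictionaries.
data_product = {
    "kind": "DataProduct",
    "inputPorts": [
        {"name": "kafka_stock_topic",
         "contractId": "dbb7b1eb-7628-436e-8914-2a00638ba6db"},
    ],
    "outputPorts": [
        {"name": "COVID-19",
         "contractId": "f07a9a38-4020-415f-abd1-2802d6e77f19"},
    ],
}
data_contracts = [
    {"kind": "DataContract",
     "id": "f07a9a38-4020-415f-abd1-2802d6e77f19",
     "schema": [{"name": "covid_cases"}]},
]


def missing_output_contracts(product: Dict, contracts: List[Dict]) -> List[str]:
    """Return the names of output ports whose contractId matches no contract."""
    known_ids = {c["id"] for c in contracts if c.get("kind") == "DataContract"}
    return [
        port["name"]
        for port in product.get("outputPorts", [])
        if port.get("contractId") not in known_ids
    ]


print(missing_output_contracts(data_product, data_contracts))  # prints []
```

An empty list means every output port references a known data contract. A pipeline could fail the build when the list is not empty.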
Custom Properties
Actian Data Intelligence Platform supports several custom properties in the data product and data contract:
zeeneaGlossaryRefs: Links existing glossary items to the item. Specify the unique key of each glossary item. Values from the descriptor are added to existing ones.
Example:
zeeneaGlossaryRefs:
  - business-object/search
zeeneaCustomItemRefs: Links existing custom items to the item. Specify the unique key of each custom item. Values from the descriptor are added to existing ones.
Example:
zeeneaCustomItemRefs:
  - domain/search
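Because values from the descriptor are added to existing ones, a synchronization never removes links that were created earlier. The following Python sketch illustrates this additive behavior. The function name `merge_refs` is hypothetical, and skipping duplicate keys is an assumption of this sketch: the documentation only states that values are added.

```python
from typing import List


def merge_refs(existing: List[str], from_descriptor: List[str]) -> List[str]:
    """Append descriptor keys to the existing links, skipping duplicates."""
    merged = list(existing)              # copy so the caller's list is untouched
    for key in from_descriptor:
        if key not in merged:            # assumed: a key is linked only once
            merged.append(key)
    return merged


links = merge_refs(
    existing=["business-object/search"],
    from_descriptor=["KPI/Number of Delivered Doses of Vaccine",
                     "business-object/search"],
)
print(links)
# prints ['business-object/search', 'KPI/Number of Delivered Doses of Vaccine']
```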
Manage Data Product Documentation in Studio
Data Product and Output Port Templates
Data products and output ports are represented as two built-in item types. You can manage their templates and responsibilities like those of any other item type in the Catalog Design section. This allows curators to add metadata on top of what is harvested from the source through the API.
You can also configure data products to implement glossary items in the Glossary metamodel section. Output ports cannot implement glossary items.
Data Product and Output Port Attributes
Data Product Attributes
General Information
Data products have the following common attributes:
Name / Source name
Description / Source description
Properties / Source properties
Contacts / Source contacts
Glossary items
Links with custom items
Catalog
In the Studio, data product attributes can be updated from their side panel and details page, as well as using bulk actions and file import.
Input Ports Tab
In the Input ports tab of a data product, all the sources consumed by the data product's input ports are listed. Input ports themselves are displayed only in the lineage.
Output Ports Tab
In the Output ports tab of a data product, all the output ports to be consumed by the end-users are listed.
Data Quality Status
The data quality status of a data product is calculated from the data quality status of its output ports.
Attachments
From the Explorer, you can download the YAML descriptor of the data product.
Output Port Attributes
General Information
Output ports have the following common attributes:
Name / Source name
Description / Source description
Properties / Source properties
Contacts / Source contacts
Links with custom items
Catalog (inherited from the parent data product)
Datasets Tab
The Datasets tab lists all the datasets and their fields that compose the output port.
Data Model Tab
The Data model tab shows the relations between the datasets of the output ports.
Note: These links can be specified through the YAML descriptors or through the Catalog API.
Data Quality Tab and Status
The Data quality tab lists all the checks that have been performed on the datasets that compose the output port. For each check, a link allows the user to open the side panel of the dataset that the check refers to.
The data quality status of the output port is calculated from the quality status of its datasets.
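The data quality status therefore rolls up in two steps: from datasets to output port, then from output ports to data product. The exact aggregation rule is not described here. The following Python sketch assumes a "worst status wins" rule and three hypothetical status labels (`passed`, `unknown`, `failed`), so treat it as an illustration of the roll-up, not as the platform's actual calculation.

```python
from typing import Dict, List

# Hypothetical status labels, ranked from best (0) to worst (2).
SEVERITY = {"passed": 0, "unknown": 1, "failed": 2}


def roll_up(statuses: List[str]) -> str:
    """Return the worst status in the list, or 'unknown' when it is empty."""
    if not statuses:
        return "unknown"
    return max(statuses, key=lambda status: SEVERITY[status])


def product_status(output_ports: Dict[str, List[str]]) -> str:
    """Roll dataset statuses up to each output port, then up to the product."""
    port_statuses = [roll_up(datasets) for datasets in output_ports.values()]
    return roll_up(port_statuses)


status = product_status({
    "COVID-19": ["passed", "passed"],    # output port with two datasets
    "vaccines": ["passed", "failed"],    # one failing dataset
})
print(status)  # prints failed
```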
Attachments
From the Explorer, you can download the YAML descriptor of the data contract attached to this output port.
Delete a Data Product
You can delete a data product from the Studio. When you delete a data product, its output ports and the datasets that compose them are deleted automatically.
Share a Data Product in Marketplace
Data products can be shared in the marketplace when the federated catalog option is activated. When you share a data product, its output ports and the datasets that compose them are shared automatically.
Move a Data Product to Another Catalog
From the Studio, you can move a data product to another catalog when the federated catalog option is activated. When you move a data product, its output ports and the datasets that compose them are moved automatically.
Search for Data Products
In the Explorer and the Studio, you can search data products by their own attributes or those of their output ports. Output ports and their embedded datasets are not displayed in search results.
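The following Python sketch illustrates this behavior: a data product matches when the query appears in its own attributes or in those of its output ports, but only data products are returned. The in-memory `catalog`, the attribute names, and the substring matching are assumptions made for the example. The platform's search engine is graph-powered and works differently.

```python
from typing import Dict, List

# Hypothetical in-memory catalog used only for this illustration.
catalog = [
    {"name": "Yet Another Product",
     "description": "Datasets from data contracts",
     "outputPorts": [{"name": "COVID-19", "description": "COVID-19"}]},
    {"name": "Sales Insights",
     "description": "Weekly sales KPIs",
     "outputPorts": [{"name": "weekly_sales", "description": "Sales by week"}]},
]


def search_products(query: str, products: List[Dict]) -> List[str]:
    """Return names of products matching on their own or their ports' text."""
    needle = query.lower()
    hits = []
    for product in products:
        own_text = [product["name"], product["description"]]
        port_text = [port[key]
                     for port in product["outputPorts"]
                     for key in ("name", "description")]
        if any(needle in text.lower() for text in own_text + port_text):
            hits.append(product["name"])     # the port itself is never returned
    return hits


print(search_products("covid", catalog))  # prints ['Yet Another Product']
```

Here the query matches only the output port, yet the result lists the parent data product.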
Request Access to a Data Product
An Explorer user can request access to data products at the output port level.
You can enable access requests for data product output ports in Administration. Curators must then activate the feature for each output port, just as they do for datasets and visualizations.