Skip to content

Data Quality Score

Overview

The Data Quality (DQ) Score is a normalized, business-relevant measure of data health that provides a single indicator of a dataset's fitness for use. Expressed as a percentage from 0% to 100%, the DQ Score helps organizations quickly assess and monitor the overall quality of their data assets.

Purpose

The DQ Score methodology ensures:

  • Consistency - Standardized measurement across all data assets
  • Normalization - Comparable scores regardless of data volume or complexity
  • Business Relevance - Weighted dimensions that reflect organizational priorities
  • Actionability - Clear identification of data quality issues requiring attention

Core Dimensions

The DQ Score is calculated as a weighted average of four fundamental data quality dimensions:

Completeness

Measures the extent to which required data fields are populated.

Metric: Percentage of required fields containing values\ Score Range: 0-100%

Validity

Measures compliance with defined business rules and data constraints.

Metric: Percentage of records passing validation rules\ Score Range: 0-100%

Freshness (Timeliness)

Measures whether data is up-to-date and meets timeliness requirements.

Metric: Binary indicator of freshness incidents\ Score Range: 0 or 100

Integrity (Incident Health)

Measures operational stability through the volume of open data quality incidents.

Metric: Count of open incidents relative to threshold\ Score Range: 0-100

Score Calculation

Normalization Process

All source metrics are normalized to scores between 0 and 100 before being integrated into the final DQ Score calculation.

Completeness (S_C)

S_C = Completeness percentage

Directly uses the percentage of populated required fields.

Validity (S_V)

S_V = Validity percentage

Directly uses the percentage of records passing business rules.

Freshness (S_F)

If No Freshness Incidents: S_F = 100
If Freshness Incidents Exist: S_F = 0 (or configured penalty score, e.g., 80)

Integrity/Incidents (S_Inc)

S_Inc = MAX(0, 100 × (1 - (I_current / I_max)))

Where:
- I_current = Number of open incidents
- I_max = Maximum tolerable incident threshold (default: 20)

Constraints:

  • If I_current ≥ I_max, then S_Inc = 0
  • If I_current = 0, then S_Inc = 100

Final DQ Score Formula

DQ Score = ((S_C × W_C) + (S_V × W_V) + (S_F × W_F) + (S_Inc × W_Inc)) / W_Total

Where:
- S = Normalized dimension score (0-100)
- W = Dimension weight
- W_Total = W_C + W_V + W_F + W_Inc

Output: Value between 0 and 100

Default Weights

Data Observability provides industry-standard default weights that prioritize data accuracy and fundamental usability:

Dimension Default Weight Rationale
Validity 40% Highest priority - measures compliance with critical business rules
Completeness 30% Second priority - measures availability of required information
Integrity (Incidents) 20% High priority penalty - reflects operational stability and issue volume
Freshness 10% Contextual priority - importance varies by use case
TOTAL 100% Simplifies calculation denominator

Weight Customization

Weights can be adjusted per dataset to reflect specific business requirements:

  • Real-time systems: Increase Freshness weight (e.g., 25-30%)
  • Analytical systems: Prioritize Completeness and Validity
  • Mission-critical systems: Increase Integrity/Incidents weight

Configuration

Dimension Weights

The system allows dynamic, per-dataset configuration of dimension weights:

  1. Navigate to your dataset settings
  2. Select Data Quality Score Configuration
  3. Adjust weights to match business priorities
  4. Document justification for non-default weights

Note: All four weights must sum to 100.

Incident Threshold (I_max)

Configure the maximum tolerable incident threshold per dataset:

  • Default: 20 open incidents
  • Low-tolerance assets: 5-10 incidents
  • High-volume assets: 30-50 incidents

The threshold should reflect:

  • Dataset criticality
  • Typical incident volumes
  • Business impact tolerance

Best Practices

Interpreting DQ Scores

Score Range Quality Level Recommended Action
90-100 Excellent Maintain current practices
75-89 Good Monitor trends, address minor issues
60-74 Fair Investigate dimension contributors, plan improvements
0-59 Poor Immediate attention required, escalate issues

To start using this feature, please refer to DQ Score APIs