DC 12.4 | Example: Data Quality Dimensions

To illustrate how dimensions provide greater visibility into data quality issues, let’s compare profile execution results for a single dataset with the same rules applied: one where data quality dimensions are not configured, and another where they are configured.

Our dataset contains customer account information, including name, company, address, email and account number. We want to know which data is missing, duplicated, or inconsistently formatted. To answer these initial questions, we apply the following:

• IsNotBlank and IsNotNull rule types against all fields in the dataset

• MatchesRegex rule type against the Zip and Email fields

• IsNotDuplicate rule type against the Email, Company and Account Number fields
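To make the four rule types concrete, here is a minimal Python sketch of the kind of check each one performs. It is not the product's implementation. The function names, the sample records, and the Zip and Email patterns are hypothetical, chosen only for illustration, and in the product the rules are configured in the profile rather than written as code. Each result is keyed by field name plus rule type, mirroring result names such as EmailIsNotBlank.

```python
import re
from collections import Counter

# Hypothetical sample records: None models a null value, "" a blank one.
RECORDS = [
    {"Name": "Ann Lee", "Company": "Acme", "Street": "1 Main St",
     "Zip": "02139", "Email": "ann@acme.com", "Account Number": "A-100"},
    {"Name": "Bo Chan", "Company": "Acme", "Street": "",
     "Zip": "2139", "Email": "ann@acme.com", "Account Number": "A-101"},
    {"Name": None, "Company": "Globex", "Street": "9 Elm Rd",
     "Zip": "10001", "Email": "bo-at-globex", "Account Number": "A-102"},
]

# Assumed formats; the documentation does not say which patterns are used.
PATTERNS = {
    "Zip": r"\d{5}(-\d{4})?",
    "Email": r"[^@\s]+@[^@\s]+\.[^@\s]+",
}

DUPLICATE_CHECKED = ("Email", "Company", "Account Number")


def is_not_null(value):
    return value is not None


def is_not_blank(value):
    # Assumption: a null also counts as blank; whitespace-only text is blank.
    return value is not None and str(value).strip() != ""


def matches_regex(value, pattern):
    # The whole value must match the pattern, not just a substring of it.
    return value is not None and re.fullmatch(pattern, str(value)) is not None


def run_rules(records):
    """Return {result_name: (passed, evaluated)}, one entry per rule per field."""
    results = {}

    def add(field, rule, outcomes):
        results[field.replace(" ", "") + rule] = (sum(outcomes), len(outcomes))

    for field in records[0]:
        values = [r[field] for r in records]
        add(field, "IsNotBlank", [is_not_blank(v) for v in values])
        add(field, "IsNotNull", [is_not_null(v) for v in values])
    for field, pattern in PATTERNS.items():
        values = [r[field] for r in records]
        add(field, "MatchesRegex", [matches_regex(v, pattern) for v in values])
    for field in DUPLICATE_CHECKED:
        values = [r[field] for r in records]
        counts = Counter(values)
        # Assumption: every record sharing a value fails, not only later copies.
        add(field, "IsNotDuplicate", [counts[v] == 1 for v in values])
    return results


for name, (passed, evaluated) in run_rules(RECORDS).items():
    print(f"{name}: {passed}/{evaluated} passed")
```

On these three records the sketch prints 17 separate results, for example StreetIsNotBlank: 2/3 passed and EmailIsNotDuplicate: 1/3 passed. That is the flat, per-field view described under Without Dimensions.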

Note: When you compare the two profile execution results, you’ll notice that the Data Quality Index (DQI) scores differ slightly. This is because, when no dimensions are configured, the DQI score equals the percentage of records that passed their rule criteria, whereas when dimensions are configured, it equals the average of all the dimension scores.
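The two DQI definitions in the Note can be compared with a short Python sketch. The counts are hypothetical. The sketch assumes that "percentage of records that passed" means passed rule evaluations divided by total rule evaluations, pooled across all rules, and that each rule belongs to exactly one dimension. The product may count differently.

```python
from statistics import mean

# Hypothetical (passed, evaluated) rule-evaluation counts for each dimension.
COUNTS = {
    "Completeness": (988, 1000),
    "Uniqueness": (262, 300),
    "Validity": (188, 200),
}


def dqi_without_dimensions(counts):
    """Pooled pass percentage over every rule evaluation (grouping is irrelevant)."""
    passed = sum(p for p, _ in counts.values())
    evaluated = sum(n for _, n in counts.values())
    return 100 * passed / evaluated


def dqi_with_dimensions(counts):
    """Unweighted average of the per-dimension pass percentages."""
    return mean(100 * p / n for p, n in counts.values())


print(f"DQI without dimensions: {dqi_without_dimensions(COUNTS):.2f}")  # 95.87
print(f"DQI with dimensions:    {dqi_with_dimensions(COUNTS):.2f}")     # 93.38
```

The gap comes from weighting. Pooling weights each group by how many evaluations it contains. Averaging gives every dimension equal weight, so a small, low-scoring dimension such as Uniqueness here pulls the with-dimensions DQI down further.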

Without Dimensions

When data quality dimensions are not configured, the profile execution results display a long list of all executed rules and their pass/fail results (see figure below).

To see which records are missing, duplicated, or inconsistently formatted, we must examine each rule individually. For example, we see a separate result for each field a rule is configured against (EmailIsNotBlank, StreetIsNotBlank, ZipIsNotBlank, and so forth), but we cannot see a combined result across all the fields the rule is applied to.

With Dimensions

As with configuring rules, configuring data quality dimensions is an iterative process. To begin configuring dimensions, we organize the rules into three dimensions:

• Completeness: IsNotBlank and IsNotNull rules – Both of these rules find fields with missing values

• Uniqueness: IsNotDuplicate rules – This rule finds non-unique values

• Validity: MatchesRegex rules – This rule finds values that don’t conform to the specified format
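Organizing rules into dimensions amounts to a mapping from rule type to dimension, followed by an aggregation of every per-field result that falls under each dimension. The mapping in the following Python sketch comes from this example. The per-field counts are hypothetical. The sketch assumes a pooled score (total passed evaluations divided by total evaluations within the dimension). The documentation only says the score covers the rules in a dimension "as a whole", so the product's exact formula may differ.

```python
from collections import defaultdict

# Rule type -> dimension, as organized in this example.
DIMENSION_OF = {
    "IsNotBlank": "Completeness",
    "IsNotNull": "Completeness",
    "IsNotDuplicate": "Uniqueness",
    "MatchesRegex": "Validity",
}

# Hypothetical per-field results: (field, rule type, passed, evaluated).
RESULTS = [
    ("Email", "IsNotBlank", 97, 100),
    ("Zip", "IsNotBlank", 99, 100),
    ("Email", "IsNotNull", 100, 100),
    ("Email", "IsNotDuplicate", 90, 100),
    ("Company", "IsNotDuplicate", 80, 100),
    ("Zip", "MatchesRegex", 95, 100),
    ("Email", "MatchesRegex", 93, 100),
]


def dimension_scores(results):
    """Pooled pass percentage per dimension across all its rules and fields."""
    totals = defaultdict(lambda: [0, 0])  # dimension -> [passed, evaluated]
    for _field, rule_type, passed, evaluated in results:
        bucket = totals[DIMENSION_OF[rule_type]]
        bucket[0] += passed
        bucket[1] += evaluated
    return {dim: 100 * p / n for dim, (p, n) in totals.items()}


for dim, score in dimension_scores(RESULTS).items():
    print(f"{dim}: {score:.2f}%")
```

Seven per-field results collapse into three numbers, one per dimension. That is the whole-dataset view that the flat rule list cannot provide.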

When dimensions are configured, the profile execution results display the same information as when dimensions are not configured, but additional information is immediately visible. First, we learn which data quality issues exist. In this case, there are issues with completeness, uniqueness and validity (see figure below).

We also learn how prevalent each issue is. The Dimension Score indicates the success rate for all the rules in each dimension as a whole:

• Uniqueness is the most prevalent issue (duplicate records) – 87.4% pass rate

• Validity is the next largest issue (wrong formats) – 94.26% pass rate

• Completeness also needs to be addressed (missing information) – 98.8% pass rate
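The ordering above comes from sorting dimensions by ascending Dimension Score, because a lower pass rate means a larger share of failing checks. This Python sketch uses the three scores reported in this example. The helper name is hypothetical, and it treats the fail rate as simply 100 minus the pass rate.

```python
# Dimension Scores reported in this example (pass rate, in percent).
SCORES = {"Completeness": 98.8, "Uniqueness": 87.4, "Validity": 94.26}


def rank_by_prevalence(scores):
    """Return (dimension, fail %) pairs, most prevalent issue first."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    # Fail rate is the complement of the pass rate, rounded to hide float noise.
    return [(dim, round(100 - score, 2)) for dim, score in ranked]


for dim, fail_pct in rank_by_prevalence(SCORES):
    print(f"{dim}: {fail_pct}% fail rate")
# Uniqueness: 12.6% fail rate
# Validity: 5.74% fail rate
# Completeness: 1.2% fail rate
```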

We immediately see the results as a whole, across all the fields each rule is applied to. To see which records are missing, duplicated, or inconsistently formatted, we look at the Completeness, Uniqueness, and Validity dimensions, respectively. We then click a dimension to see the pass/fail results for each of its rules, and click a rule's bar graph (in the right panel) to get a list of records.

The results also indicate what actions we need to take to improve the data. In this case, we need to create rules to:

• deduplicate data in the Email, Company and Account Number fields

• fix formats in the Zip and Email fields

• populate missing data for various fields

Last modified date: 01/08/2026