Dimensions, Measures & Filters¶

Understanding the difference between dimensions and measures is the single most important thing to know about SemQL. Get this right and most queries just work.

Dimensions¶

A dimension is an attribute you slice, filter, or group by. Think: names, dates, categories, IDs, statuses.

-- Examples of dimensions
name
region
created_at
product_category
customer_id

Dimensions map to plain columns in your database, or sometimes to expressions (e.g., full_name might be CONCAT(first_name, ' ', last_name) under the hood — you don't need to know).

You use dimensions in:

SELECT (to show the attribute)
WHERE (to filter before aggregation)
GROUP BY (to group rows)
ORDER BY (to sort)

Measures¶

A measure is a pre-defined, pre-aggregated metric. Think: revenue, user counts, order totals, conversion rates.

-- Examples of measures
total_revenue         -- might be SUM(amount - refunds)
user_count            -- might be COUNT(DISTINCT user_id)
average_order_value   -- might be AVG(amount) WHERE amount > 0
conversion_rate       -- might be SUM(conversions) / COUNT(sessions)

Measures are defined once by your data team in the semantic layer. They encode all the business logic: which columns to use, how to handle nulls, what to exclude, how to aggregate.

You use measures in:

SELECT (to show the metric)
HAVING (to filter after aggregation)
ORDER BY (to sort by the metric)

The Golden Rule: Never Aggregate a Measure¶

This is the most common SemQL mistake, and it produces silent, wrong results.

Measures are already aggregated. When you write total_revenue, it already contains a SUM() inside its definition. If you wrap it in another SUM(), you get double aggregation — which either errors out or returns garbage.

-- WRONG: total_revenue is already SUM(amount - refunds)
-- This becomes SUM(SUM(amount - refunds)) — double aggregation
SELECT region, SUM(total_revenue)
FROM sales
GROUP BY region

-- CORRECT: just reference the measure directly
SELECT region, total_revenue
FROM sales
GROUP BY region

Same rule applies to COUNT, AVG, MIN, MAX, and any other aggregate:

-- WRONG
SELECT product, AVG(average_rating) FROM products GROUP BY product

-- CORRECT
SELECT product, average_rating FROM products GROUP BY product

How to tell if something is a measure: Check your semantic layer. Measures are tagged as such. When in doubt, if a field sounds like a business metric (revenue, count, rate, total, average), it's probably a measure.

WHERE vs. HAVING¶

Since dimensions and measures behave differently, they use different filter clauses:

	WHERE	HAVING
What it filters	Dimensions (raw column values)	Measures (aggregated results)
When it runs	Before aggregation	After aggregation
Use for	`status = 'active'`, `date >= ...`	`total_revenue > 10000`, `order_count >= 5`

-- WHERE: filter by a dimension (region is a dimension)
SELECT region, total_revenue
FROM sales
WHERE region = 'Europe'
GROUP BY region

-- HAVING: filter by a measure (total_revenue is a measure)
SELECT region, total_revenue
FROM sales
GROUP BY region
HAVING total_revenue > 500000

-- Both together
SELECT region, total_revenue
FROM sales
WHERE created_at >= '2024-01-01'   -- dimension filter: rows to include
GROUP BY region
HAVING total_revenue > 100000       -- measure filter: groups to keep
ORDER BY total_revenue DESC

A simple test: if you could put the condition in WHERE without a GROUP BY, it filters a dimension. If it only makes sense after grouping (because it involves an aggregated value), it belongs in HAVING.

Predefined Filters¶

Predefined filters are reusable WHERE conditions defined in the semantic layer. Instead of writing WHERE member_count > 0 everywhere, your data team defines it once as active_only. Anyone who uses organizations[active_only] gets exactly that condition — consistently.

Syntax¶

Filters go in the FROM or JOIN clause, in square brackets after the model name:

-- Single filter
FROM organizations[active_only]

-- Multiple filters (all conditions applied together)
FROM sales[last_quarter, high_value]

-- Filter on a joined model
FROM sales
JOIN customers[enterprise] ON sales.customer_id = customers.customer_id

Filters are not just convenience — they're consistency¶

Imagine "enterprise customer" means annual_contract_value >= 50000 AND plan = 'enterprise'. Without predefined filters, every analyst writes that condition slightly differently. Some forget the plan check. Some use different thresholds.

With customers[enterprise], there's one definition. Every query, every agent, every analyst uses the same criteria.

Auto-applied model filters¶

Some models have filters that apply automatically to every query, without you having to specify them. These are guardrails set by your data team.

For example, if the transactions model has an auto-filter for status = 'completed', then FROM transactions always excludes incomplete transactions — even if you don't ask for it. You can't bypass them, and you shouldn't need to. They exist to prevent mistakes.

Filters don't belong in SELECT¶

This is a syntax error:

-- WRONG: filters in SELECT
SELECT organizations[active_only].name FROM organizations

-- CORRECT: filters in FROM
SELECT name FROM organizations[active_only]

Quick Reference¶

Concept	Where to use	Example
Dimension	SELECT, WHERE, GROUP BY, ORDER BY	`region`, `created_at`, `name`
Measure	SELECT, HAVING, ORDER BY	`total_revenue`, `user_count`
Predefined filter	FROM, JOIN (in square brackets)	`[active_only]`, `[last_quarter]`
Aggregate function on dimension	SELECT with GROUP BY	`COUNT(order_id)`
Aggregate function on measure	Never — already aggregated	—
Dimension filter	WHERE	`WHERE region = 'EU'`
Measure filter	HAVING	`HAVING total_revenue > 0`