--- description: A complete reference for SemQL syntax and clauses --- # Core Syntax SemQL is syntactically compatible with PostgreSQL. If you know PostgreSQL, you already know most of SemQL. The differences are in what you reference — semantic model names instead of physical table names — not in how you write queries. This page covers every clause you'll use. *** ## SELECT Select the fields you want in the output. These can be dimensions, measures, aliases, literals, or expressions. ```sql -- Dimensions (attributes you slice by) SELECT name, region, created_at FROM customers -- Measures (pre-aggregated metrics) SELECT total_revenue, order_count FROM sales -- With model qualifier (use when field names are ambiguous) SELECT organizations.name, users.name FROM organizations JOIN users ... -- Aliases SELECT name AS organization_name, total_revenue AS revenue FROM organizations -- Literals SELECT name, 'active' AS status FROM organizations[active_only] ``` **When to qualify field names:** You only need to qualify (e.g. `organizations.name`) when the same field name exists in multiple tables you're querying. When it's unambiguous, just use the bare field name. *** ## FROM Reference a semantic model by name — no schema prefix, no physical table path. ```sql FROM organizations FROM sales FROM customers ``` ### Applying predefined filters Add reusable filter conditions directly in the `FROM` clause using square brackets: ```sql FROM organizations[active_only] FROM sales[last_quarter] FROM sales[last_quarter, high_value] -- multiple filters, all applied ``` Filters can also appear in `JOIN` clauses: ```sql FROM sales JOIN customers[enterprise] ON sales.customer_id = customers.customer_id ``` !!! important Filters go in `FROM` or `JOIN` — never in `SELECT`. See [Dimensions vs. Measures](dimensions-vs-measures.md) for more on filters. *** ## WHERE Filter rows based on dimension values, before any aggregation happens. ```sql SELECT name, total_revenue FROM sales WHERE region = 'Europe' SELECT name, order_count FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '30 days' AND status = 'completed' ``` **WHERE vs. HAVING:** Use `WHERE` to filter dimensions (raw values). Use `HAVING` to filter measures (aggregated results). See [Dimensions vs. Measures](dimensions-vs-measures.md) for the full explanation. *** ## GROUP BY Group results by one or more dimensions. Required whenever your query mixes dimensions with measures. ```sql SELECT region, total_revenue FROM sales GROUP BY region SELECT year, quarter, product_category, total_revenue, order_count FROM sales GROUP BY year, quarter, product_category ORDER BY year, quarter ``` *** ## HAVING Filter groups after aggregation. Use this to filter by measure values. ```sql SELECT region, total_revenue FROM sales GROUP BY region HAVING total_revenue > 100000 SELECT customer_id, order_count FROM orders GROUP BY customer_id HAVING order_count >= 5 ORDER BY order_count DESC ``` *** ## ORDER BY / LIMIT Standard SQL ordering and row limiting. ```sql SELECT name, total_revenue FROM organizations GROUP BY name ORDER BY total_revenue DESC LIMIT 10 -- Ascending order (default) SELECT name, created_at FROM customers ORDER BY created_at ASC ``` *** ## JOIN Join multiple models together. Use the relationship fields defined in your semantic layer — check the model definitions to find the correct join keys. ```sql -- Inner join SELECT organizations.name, users.email FROM organizations JOIN users ON organizations.organization_id = users.organization_id -- Left join (keep all rows from the left model) SELECT organizations.name, COUNT(users.user_id) AS user_count FROM organizations LEFT JOIN users ON organizations.organization_id = users.organization_id GROUP BY organizations.name -- With filters on joined models SELECT organizations.name, users.email FROM organizations[active_only] JOIN users[verified] ON organizations.organization_id = users.organization_id ``` !!! tip Use the relationships documented in your semantic layer to find the correct join columns. Don't assume — look up the exact field names. *** ## WITH (CTEs) Common Table Expressions let you break complex queries into named, reusable steps. Standard SQL `WITH` syntax works as-is. ```sql WITH revenue_by_region AS ( SELECT region, total_revenue FROM sales WHERE year = 2024 GROUP BY region ), top_regions AS ( SELECT region, total_revenue FROM revenue_by_region WHERE total_revenue > 500000 ) SELECT region, total_revenue FROM top_regions ORDER BY total_revenue DESC ``` !!! note Inside a CTE body, you're writing against semantic models normally. But once a CTE is defined, references to it are plain SQL — `region` in `top_regions` is just a column alias, not a semantic reference. *** ## UNION / UNION ALL Combine results from multiple queries. Both queries must return the same number of columns with compatible types. ```sql -- UNION removes duplicates; UNION ALL keeps them (faster) SELECT 'Q1' AS quarter, total_revenue FROM sales WHERE quarter = 1 UNION ALL SELECT 'Q2' AS quarter, total_revenue FROM sales WHERE quarter = 2 -- Comparing two segments side-by-side SELECT 'Enterprise' AS segment, customer_count FROM customers[enterprise] UNION ALL SELECT 'SMB' AS segment, customer_count FROM customers[smb] ``` *** ## Window Functions Window functions let you compute running totals, rankings, and period-over-period comparisons without collapsing rows into groups. ### Ranking ```sql SELECT region, total_revenue, ROW_NUMBER() OVER (ORDER BY total_revenue DESC) AS rank, RANK() OVER (ORDER BY total_revenue DESC) AS rank_with_ties, DENSE_RANK() OVER (ORDER BY total_revenue DESC) AS dense_rank FROM sales GROUP BY region ``` ### Running totals ```sql SELECT order_date, total_revenue, SUM(total_revenue) OVER ( ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_total FROM orders GROUP BY order_date ORDER BY order_date ``` ### Period-over-period comparisons ```sql SELECT order_date, total_revenue, LAG(total_revenue) OVER (ORDER BY order_date) AS prev_period_revenue, total_revenue - LAG(total_revenue) OVER (ORDER BY order_date) AS change FROM orders GROUP BY order_date ORDER BY order_date ``` ### Partitioned windows ```sql SELECT region, product_category, total_revenue, ROW_NUMBER() OVER ( PARTITION BY region ORDER BY total_revenue DESC ) AS rank_within_region FROM sales GROUP BY region, product_category ``` **Supported window functions:** `ROW_NUMBER`, `RANK`, `DENSE_RANK`, `NTILE`, `PERCENT_RANK`, `CUME_DIST`, `LEAD`, `LAG`, `FIRST_VALUE`, `LAST_VALUE`, `NTH_VALUE` *** ## Subqueries Standard SQL subqueries work in SemQL. ```sql SELECT name, total_revenue FROM organizations WHERE organization_id IN ( SELECT organization_id FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '90 days' GROUP BY organization_id HAVING COUNT(*) >= 3 ) GROUP BY name, total_revenue ``` *** ## What SemQL does NOT support * `DELETE`, `UPDATE`, `INSERT`, `DROP`, `TRUNCATE` — SemQL is read-only by design * Schema-qualified table names like `my_schema.my_table` — use model names instead * Direct `INFORMATION_SCHEMA` queries * Bypassing auto-applied model filters — they are always active *** ## Supported Functions ### Aggregate | Function | Description | | --------------------------------- | ---------------------------------- | | `COUNT(x)`, `COUNT(DISTINCT x)` | Count rows or unique values | | `SUM(x)` | Sum of values | | `AVG(x)` | Average | | `MIN(x)`, `MAX(x)` | Minimum / maximum | | `STDDEV(x)`, `VARIANCE(x)` | Standard deviation / variance | | `STRING_AGG(x, sep)` | Concatenate strings with separator | | `ARRAY_AGG(x)` | Aggregate into an array | | `PERCENTILE_CONT(p)`, `MEDIAN(x)` | Percentile calculations | | `BOOL_AND(x)`, `BOOL_OR(x)` | Boolean aggregations | | `CORR(x, y)` | Correlation coefficient | !!! warning Never wrap a predefined measure in an aggregate function. `SUM(total_revenue)` will double-aggregate. See [Dimensions vs. Measures](dimensions-vs-measures.md). ### Date / Time | Function | Description | | --------------------------- | ------------------------------- | | `DATE_TRUNC('month', date)` | Truncate to time period | | `DATE_PART('year', date)` | Extract part of a date | | `EXTRACT(YEAR FROM date)` | Extract part of a date | | `NOW()`, `CURRENT_DATE` | Current timestamp / date | | `CURRENT_TIMESTAMP` | Current timestamp with timezone | ### String | Function | Description | | -------------------------- | ------------------- | | `UPPER(x)`, `LOWER(x)` | Change case | | `LENGTH(x)` | String length | | `TRIM(x)` | Remove whitespace | | `SUBSTRING(x, start, len)` | Extract substring | | `CONCAT(x, y, ...)` | Concatenate strings | | `REPLACE(x, from, to)` | Replace substring | ### Math | Function | Description | | ------------------------------- | ------------------------------ | | `ABS(x)` | Absolute value | | `ROUND(x, n)` | Round to n decimal places | | `CEIL(x)`, `FLOOR(x)` | Round up / down | | `MOD(x, y)` | Modulo | | `POWER(x, y)`, `SQRT(x)` | Exponent / square root | | `LEAST(x, y)`, `GREATEST(x, y)` | Minimum / maximum of arguments | ### Geo | Function | Description | | ------------------------------------------------------ | -------------------------------------- | | `GEO_DISTANCE(lat1, lng1, lat2, lng2)` | Distance in km between two coordinates | | `GEO_WITHIN_RADIUS(lat1, lng1, lat2, lng2, radius_km)` | True/false: is point within radius? | See [Geo Functions](geo-functions.md) for full documentation.