Aggregate Functions
Aggregate functions return a single value, calculated from values in a column. They take a set of values as their argument. Aggregate functions are also called set functions.
Basic Aggregate Functions
A basic aggregate function returns a single value by aggregating mulitple rows of data.
The syntax of a basic aggregate function is as follows:
function_name ([DISTINCT | ALL] expr)
where:
function_name denotes an aggregate function.
expr denotes any expression that does not include an aggregate function reference (at any level of nesting).
DISTINCT eliminates duplicate values of expr within a set of input rows to the aggregation. ALL (the default) retains duplicate values of expr.
Null values of expr are ignored by the aggregate functions, with the exception of COUNT(*).
The following example uses the SUM aggregate function to calculate the total of salaries for employees in department 23:
SELECT SUM (employee.salary)
FROM employee
WHERE employee.dept = 23;
ANY
ANY
Result type: INTEGER
Returns 1 if any row in the table fulfills the where clause, or 0 if no rows fulfill the WHERE clause.
AVG
AVG
Result type: FLOAT, MONEY, DATE (INTERVAL only)
Returns the average (sum / count). The sum of the values must be within the range of the result data type.
COUNT
COUNT
Result type: INTEGER8
Returns count of non-null occurrences.
For example, count the number of employees allocated to a department:
SELECT COUNT(dept)
FROM employee;
Note: Returns INTEGER4 when used against Ingres tables.
COUNT_BIG is a synonym for COUNT.
COUNT(*)
COUNT(*)
Result type: INTEGER8
Returns count of the number of rows in a result table, including rows that contain nulls.
For example, the following statement counts the number of employees in department 'PERS':
SELECT COUNT(*)
FROM employee
WHERE dept = 'PERS';
The asterisk (*) argument cannot be qualified with ALL or DISTINCT.
Because COUNT(*) counts rows rather than columns, it does not ignore nulls. Consider the following table:
NAME EXEMPTIONS
Smith 0
Jones 2
Tanghetti 4
Fong Null
Stevens Null
The expression:
COUNT(exemptions)
returns the value of 3, whereas:
COUNT(*)
returns 5.
If the argument to the COUNT function evaluates to an empty set, COUNT returns zero, which is in contrast to other functions, which return a null.
GROUPING
GROUPING
Result type: INTEGER
Returns an integer value based on its parameters.
It can be used to simplify a query that needs many GROUP BY levels by letting you more easily express row filtering conditions.
The GROUPING function can have multiple parameters, each of which must be a column or expression in the GROUP BY clause of an aggregate query. The parameters from right to left are represented in the function result by bits from right to left.
When CUBE, ROLLUP, or GROUPING SETS syntax is used, result rows from the query can be formed based on the grouping of rows according to different combinations of the grouping columns or expressions. If a result row is based on a grouping that includes a parameter of the grouping function, its bit position in the GROUPING() function result is 0, otherwise 1. This helps you determine which groupings each of the result rows comes from.
For example:
SELECT sno, pno, SUM(qty), GROUPING(sno, pno) FROM sp
GROUP BY CUBE(sno, pno)
produces results like:
s1 p1 25 0
s1 p2 30 0
...
s1 100 1
s2 50 1
...
p1 95 2
p2 10 2
...
1000 3
The numbers 0, 1, 2, and 3 are the results of the GROUPING() function. Rows with both sno and pno have a GROUPING() value of 0 because they result from the grouping of both sno and pno. Rows with only sno (superaggregate rows grouped only on sno) have a value of 1 because the rightmost parameter (pno) does not contribute to the grouping. Rows with only pno (superaggregates grouped only on pno) have a value of 2 because the next parameter (sno—in bit position 2) does not contribute. And the last row has a value of 3 because neither sno nor pno are involved in the grouping.
MAX
MAX
Result type: Same as argument
Returns maximum value.
MEDIAN
MEDIAN(expr)
Operand type: All numeric types and ANSIDATE
Result type: Same as argument
Returns an expression that is the middle of a sorted list of the input expressions. Nulls are ignored. If the number of input expressions is odd, the result is from the expression list. If it is even, the result is the interpolation between the two middle expressions, rounding up for integer expressions.
Note: The DISTINCT and ALL keywords are not supported.
For example:
SELECT * FROM temperatures
+--------------------+-------------+-----+
|city |station |temp |
+--------------------+-------------+-----+
|Los Angeles | 1| 71.5|
|Los Angeles | 2| 72.5|
|Los Angeles | 3| 73.5|
|Los Angeles | 4| 74.5|
|Seattle | 1| 62.5|
|Seattle | 2| 64.5|
|Seattle | 3| 64.5|
|Seattle | 4| 65.5|
|Seattle | 5| 66.5|
+--------------------+-------------+-----+
(9 rows)
SELECT city, MEDIAN (temp) AS median_temp FROM temperatures GROUP BY city
+--------------------+-----------------------------------------+
|city |median_temp |
+--------------------+-----------------------------------------+
|Los Angeles | 73.00|
|Seattle | 64.50|
+--------------------+-----------------------------------------+
(2 rows)
MIN
MIN
Result type: Same as argument
Returns minimum value.
MODE
MODE(expr)
Operand type: All numeric types, char, varchar, date and time types, Boolean, UUID, IPV4, IPV6
Result type: Same as argument
Returns the most common value of a column of a table, or if used in an aggregation, the most common value of a non-grouping column per group. When there are multiple common values with the same frequency, only one is returned.
Null values are ignored. When there are no rows or all the values are NULL, NULL is returned. However, null values in GROUP BY columns are not ignored and are treated as a single value for grouping.
Note: The DISTINCT and ALL keywords are not supported.
For example:
SELECT MODE(color) FROM cars;
SELECT brand,MODE(color) FROM cars GROUP BY brand;
PERCENTILE_CONT
PERCENTILE_CONT(n) WITHIN GROUP(ORDER BY expr)
Operand type: All numeric types and ANSIDATE
Result type: Same as argument
Returns a value that corresponds to the given fraction (n) in the sort order, where 0 <= n <= 1. Sorts the input expressions and returns a single expression that is greater than n of the input expressions. Nulls are ignored. For example, if n is .8, the aggregate result will be >= 80% of the input values. The result may be an interpolated value if no input exactly delineates the percentile, rounding up for integer expressions.
PERCENTILE_CONT (.5)... is the same as MEDIAN.
Note: The DISTINCT and ALL keywords are not supported.
For example:
SELECT city, PERCENTILE_CONT(.75) WITHIN GROUP (ORDER BY temp) AS seventy_fifth_percentile FROM temperatures GROUP BY city
+--------------------+-----------------------------------------+
|city |seventy_fifth_percentile |
+--------------------+-----------------------------------------+
|Los Angeles | 73.80|
|Seattle | 65.50|
+--------------------+-----------------------------------------+
(2 rows)
STDDEV_POP
STDDEV_POP
Result type: FLOAT
Computes the population form of the standard deviation (square root of the population variance of the group).
STDDEV_SAMP
STDDEV_SAMP
Result type: FLOAT
Computes the sample form of the standard deviation (square root of the sample variance of the group).
SUM
SUM
Result type: INTEGER, FLOAT, MONEY, DATE (INTERVAL only)
Returns column total.
VAR_POP
VAR_POP
Result type: FLOAT
Computes the population form of the variance (sum of the squares of the difference of each argument value in the group from the mean of the values, divided by the count of the values).
VAR_SAMP
VAR_SAMP
Result type: FLOAT
Computes the sample form of the variance (sum of the squares of the difference of each argument value in the group from the mean of the values, divided by the count of the values minus 1).
Regression and Correlation Analysis Aggregate Functions
The following aggregate functions perform a variety of regression and correlation analysis.
Syntax is as follows:
function_name(indep_parm, dep_parm)
where function_name denotes the function name and the first argument is the independent variable and the second argument is the dependent variable.
Functions for regression and correlation analysis are as follows:
CORR
CORR(indep_parm, dep_parm)
Result type: FLOAT
Returns correlation coefficient (ratio of the population covariance divided by the product of the population standard deviation of the independent variable and the population standard deviation of the dependent variable).
COVAR_POP
COVAR_POP(indep_parm, dep_parm)
Result type: FLOAT
Returns population covariance (sum of the products of the difference of the independent variable from its mean, times the difference of the dependent variable from its mean, divided by the number of rows).
COVAR_SAMP
COVAR_SAMP(indep_parm, dep_parm)
Result type: FLOAT
Returns sample covariance (sum of the products of the difference of the independent variable from its mean, times the difference of the dependent variable from its mean, divided by the number of rows minus 1).
REGR_AVGX
REGR_AVGX(indep_parm, dep_parm)
Result type: FLOAT
Returns average of the independent variables.
REGR_AVGY
REGR_AVGY(indep_parm, dep_parm)
Result type: FLOAT
Returns average of the dependent variables.
REGR_COUNT
REGR_COUNT(indep_parm, dep_parm)
Result type: INTEGER
Returns count of rows with non-null values for both dependent and independent variables.
REGR_INTERCEPT
REGR_INTERCEPT(indep_parm, dep_parm)
Result type: FLOAT
Returns Y-intercept of the least-squares-fit linear equation determined by the (independent variable, dependent variable) pairs.
REGR_R2
REGR_R2(indep_parm, dep_parm)
Result type: FLOAT
Returns square of the correlation coefficient.
REGR_SLOPE
REGR_SLOPE(indep_parm, dep_parm)
Result type: FLOAT
Returns slope of the least-squares-fit linear equation determined by the (independent variable, dependent variable) pairs.
REGR_SXX
REGR_SXX(indep_parm, dep_parm)
Result type: FLOAT
Returns the sample corrected sum of the squares of the independent variable.
REGR_SXY
REGR_SXY(indep_parm, dep_parm)
Result type: FLOAT
Returns sum of the product of the independent variable and the dependent variable.
REGR_SYY
REGR_SYY(indep_parm, dep_parm)
Result type: FLOAT
Returns the sample corrected sum of the squares of the dependent variable.
String Aggregate Functions
String aggregate functions concatenate a set of string values.
LISTAGG
LISTAGG ([DISTINCT] value_expr [, 'delimiter'])
[WITHIN GROUP (order-by-clause)]
Concatenates expressions within a group.
Result type: VARCHAR(4000). Longer strings are truncated. The length of the result can be configured using the [engine] listagg_group_maxlen parameter.
where:
DISTINCT
Eliminates duplicate values.
value_expr
Specifies an expression that can appear in the select list of a query. Can contain constants, row values, operators, scalar functions, and scalar subqueries. Null values in the value_expr column are ignored. The value is cast to char data type before being concatenated.
'delimiter'
Defines the separator between concatenated items. The default separator is the empty string.
WITHIN GROUP (order-by-clause)
Specifies how the items in the result should be sorted.
The following LISTAGG examples are based on this table:
DEPT EMPNO FIRSTNAME LASTNAME
----------- ---------- ---------- ----------
Marketing 101 Douglas Cray
Marketing 103 Dong Luang
Marketing 105 Dennis Indolay
Admin 107 Sherry Keller
Admin 109 Carl Nader
Admin 111 Ruth Turret
Sales 113 Andrew Bonnet
Sales 115 Oscar Hender
Sales 117 Justin Braushere
Sales 119 Smith
Concatenate the names of all employees, ordered by last name (LISTAGG used as a simple aggregate):
SELECT LISTAGG(lastname)
WITHIN GROUP (ORDER BY lastname) AS PERSONNEL
FROM employee;
PERSONNEL
-----------------------------------------------------------
BonnetBraushereCrayHenderIndolayKellerLuangNaderSmithTurret
Same as previous query, but use a delimiter to separate names with a comma:
SELECT LISTAGG(lastname, ',')
WITHIN GROUP (ORDER BY lastname) AS PERSONNEL
FROM employee;
PERSONNEL
--------------------------------------------------------------------
Bonnet,Braushere,Cray,Hender,Indolay,Keller,Luang,Nader,Smith,Turret
Concatenate employees in each department, ordered by last name (LISTAGG used as a regular aggregate with GROUP BY):
SELECT dept, LISTAGG(lastname, ',')
WITHIN GROUP (ORDER BY lastname) AS PERSONNEL
FROM employee GROUP BY dept;
DEPT PERSONNEL
----------- ------------------------------
Admin Keller,Nader,Turret
Marketing Cray,Indolay,Luang
Sales Bonnet,Braushere,Hender,Smith
Concatenate employee names in the same department, ordered by last name, partitioned by department (LISTAGG used as a windowing aggregate):
SELECT dept, firstname, lastname, LISTAGG(lastname, ',')
WITHIN GROUP(ORDER BY lastname)
OVER (PARTITION BY dept) AS PERSONNEL
FROM employee ORDER BY dept, lastname, firstname;
DEPT FIRSTNAME LASTNAME PERSONNEL IN SAME DEPT
----------- ---------- ---------- ------------------------------
Admin Sherry Keller Keller,Nader,Turret
Admin Carl Nader Keller,Nader,Turret
Admin Ruth Turret Keller,Nader,Turret
Marketing Douglas Cray Cray,Indolay,Luang
Marketing Dennis Indolay Cray,Indolay,Luang
Marketing Dong Luang Cray,Indolay,Luang
Sales Andrew Bonnet Bonnet,Braushere,Hender,Smith
Sales Justin Braushere Bonnet,Braushere,Hender,Smith
Sales Oscar Hender Bonnet,Braushere,Hender,Smith
Sales Smith Bonnet,Braushere,Hender,Smith
Ordering Aggregate Functions
Ordering aggregate functions order the aggregated rows.
FIRST_VALUE WITHIN GROUP and LAST_VALUE WITHIN GROUP
FIRST_VALUE (scalar_value_expression) [null_specification] WITHIN GROUP (order_by_clause)...[group_by_clause]
LAST_VALUE (scalar_value_expression) [null_specification] WITHIN GROUP (order_by_clause)...[group_by_clause]
Produces one row per group, or if there is no group_by_clause, a single row
where:
scalar_value_expr
Specifies any scalar expression that can appear in the select list of a query. Can contain constants, row values, operators, scalar functions, and scalar subqueries.
null specification
Indicates whether to include null values in the selection of a first_value or last_value. Valid values are:
RESPECT NULLS
(Default) Uses the expression value in the first or last row.
IGNORE NULLS
Uses the first or last non-null expression value.
The following query shows the employee number (empno) of the highest paid employee in each department:
SELECT deptno, FIRST_VALUE(empno)
WITHIN GROUP (ORDER BY sal DESC) AS "FIRST VALUE"
FROM emp GROUP BY deptno;
DEPTNO FIRST VALUE
---------- -----------
100 839
200 788
300 900
Aggregate Functions and Decimal Data
Given decimal arguments, aggregate functions (except for COUNT) return decimal results.
The following table explains how to determine the scale and precision of results returned for aggregates with decimal arguments:
GROUP BY Clause with Aggregate Functions
The GROUP BY clause allows aggregate functions to be performed on subsets of the rows in the table. The subsets are defined by the GROUP BY clause.
For example, the following statement selects rows from a table of political candidates, groups the rows by party, and returns the name of each party and the average funding for the candidates in that party.
SELECT party, AVG(funding)
FROM candidates
GROUP BY party;
Restrictions on the Use of Aggregate Functions
The following restrictions apply to the use of aggregate functions:
• Aggregate functions cannot be nested.
• Aggregate functions can be used only in SELECT or HAVING clauses.
• If a SELECT or HAVING clause contains an aggregate function, columns not specified in the aggregate must be specified in the GROUP BY clause. For example:
SELECT dept, AVG(emp_age)
FROM employee
GROUP BY dept;
The above SELECT statement specifies two columns, dept and emp_age, but only emp_age is referenced by the aggregate function, AVG. The dept column must be specified in the GROUP BY clause.