Aggregate Functions

3. Elements of SQL Statements : SQL Functions : Aggregate Functions

Share this page

Aggregate Functions

Aggregate functions return a single value, calculated from values in a column. They take a set of values as their argument. Aggregate functions are also called set functions.

Basic Aggregate Functions

A basic aggregate function returns a single value by aggregating multiple rows of data.

The syntax of a basic aggregate function is as follows:

function_name ([DISTINCT | ALL] expr)

where:

function_name denotes an aggregate function.

expr denotes any expression that does not include an aggregate function reference (at any level of nesting).

DISTINCT eliminates duplicate values of expr within a set of input rows to the aggregation. ALL (the default) retains duplicate values of expr.

Null values of expr are ignored by the aggregate functions, with the exception of COUNT(*).

The following example uses the SUM aggregate function to calculate the total of salaries for employees in department 23:

SELECT SUM (employee.salary)
FROM employee
WHERE employee.dept = 23;

ANY

Result type: INTEGER

Returns 1 if any row in the table fulfills the where clause, or 0 if no rows fulfill the WHERE clause.

AVG

Result type: FLOAT, MONEY, DATE (INTERVAL only)

Returns the average (sum / count). The sum of the values must be within the range of the result data type.

COUNT

Result type: INTEGER8

Returns count of non-null occurrences.

For example, count the number of employees allocated to a department:

SELECT COUNT(dept)

FROM employee;

Note: Returns INTEGER4 when used against Ingres tables.

COUNT_BIG is a synonym for COUNT.

COUNT(*)

Result type: INTEGER8

Returns count of the number of rows in a result table, including rows that contain nulls.

For example, the following statement counts the number of employees in department 'PERS':

SELECT COUNT(*)
FROM employee
WHERE dept = 'PERS';

The asterisk (*) argument cannot be qualified with ALL or DISTINCT.

Because COUNT(*) counts rows rather than columns, it does not ignore nulls. Consider the following table:

NAME EXEMPTIONS

Smith 0

Jones 2

Tanghetti 4

Fong Null

Stevens Null

The expression:

COUNT(exemptions)

returns the value of 3, whereas:

COUNT(*)

returns 5.

If the argument to the COUNT function evaluates to an empty set, COUNT returns zero, which is in contrast to other functions, which return a null.

GROUPING

Result type: INTEGER

Returns an integer value based on its parameters.

It can be used to simplify a query that needs many GROUP BY levels by letting you more easily express row filtering conditions.

The GROUPING function can have multiple parameters, each of which must be a column or expression in the GROUP BY clause of an aggregate query. The parameters from right to left are represented in the function result by bits from right to left.

When CUBE, ROLLUP, or GROUPING SETS syntax is used, result rows from the query can be formed based on the grouping of rows according to different combinations of the grouping columns or expressions. If a result row is based on a grouping that includes a parameter of the grouping function, its bit position in the GROUPING() function result is 0, otherwise 1. This helps you determine which groupings each of the result rows comes from.

For example:

SELECT sno, pno, SUM(qty), GROUPING(sno, pno) FROM sp

GROUP BY CUBE(sno, pno)

produces results like:

s1 p1 25 0

s1 p2 30 0

...

s1 100 1

s2 50 1

...

p1 95 2

p2 10 2

...

1000 3

The numbers 0, 1, 2, and 3 are the results of the GROUPING() function. Rows with both sno and pno have a GROUPING() value of 0 because they result from the grouping of both sno and pno. Rows with only sno (superaggregate rows grouped only on sno) have a value of 1 because the rightmost parameter (pno) does not contribute to the grouping. Rows with only pno (superaggregates grouped only on pno) have a value of 2 because the next parameter (sno—in bit position 2) does not contribute. And the last row has a value of 3 because neither sno nor pno are involved in the grouping.

MAX

Result type: Same as argument

Returns maximum value.

MIN

Result type: Same as argument

Returns minimum value.

STDDEV_POP

Result type: FLOAT

Computes the population form of the standard deviation (square root of the population variance of the group).

STDDEV_SAMP

Result type: FLOAT

Computes the sample form of the standard deviation (square root of the sample variance of the group).

SUM

Result type: INTEGER, FLOAT, MONEY, DATE (INTERVAL only)

Returns column total.

VAR_POP

Result type: FLOAT

Computes the population form of the variance (sum of the squares of the difference of each argument value in the group from the mean of the values, divided by the count of the values).

VAR_SAMP

Result type: FLOAT

Computes the sample form of the variance (sum of the squares of the difference of each argument value in the group from the mean of the values, divided by the count of the values minus 1).

Regression and Correlation Analysis Aggregate Functions

The following aggregate functions perform a variety of regression and correlation analysis.

Syntax is as follows:

function_name(indep_parm, dep_parm)

where function_name denotes the function name and the first argument is the independent variable and the second argument is the dependent variable.

Functions for regression and correlation analysis are as follows:

CORR

CORR(indep_parm, dep_parm)

Result type: FLOAT

Returns correlation coefficient (ratio of the population covariance divided by the product of the population standard deviation of the independent variable and the population standard deviation of the dependent variable).

COVAR_POP

COVAR_POP(indep_parm, dep_parm)

Result type: FLOAT

Returns population covariance (sum of the products of the difference of the independent variable from its mean, times the difference of the dependent variable from its mean, divided by the number of rows).

COVAR_SAMP

COVAR_SAMP(indep_parm, dep_parm)

Result type: FLOAT

Returns sample covariance (sum of the products of the difference of the independent variable from its mean, times the difference of the dependent variable from its mean, divided by the number of rows minus 1).

REGR_AVGX

REGR_AVGX(indep_parm, dep_parm)

Result type: FLOAT

Returns average of the independent variables.

REGR_AVGY

REGR_AVGY(indep_parm, dep_parm)

Result type: FLOAT

Returns average of the dependent variables.

REGR_COUNT

REGR_COUNT(indep_parm, dep_parm)

Result type: INTEGER

Returns count of rows with non-null values for both dependent and independent variables.

REGR_INTERCEPT

REGR_INTERCEPT(indep_parm, dep_parm)

Result type: FLOAT

Returns Y-intercept of the least-squares-fit linear equation determined by the (independent variable, dependent variable) pairs.

REGR_R2

REGR_R2(indep_parm, dep_parm)

Result type: FLOAT

Returns square of the correlation coefficient.

REGR_SLOPE

REGR_SLOPE(indep_parm, dep_parm)

Result type: FLOAT

Returns slope of the least-squares-fit linear equation determined by the (independent variable, dependent variable) pairs.

REGR_SXX

REGR_SXX(indep_parm, dep_parm)

Result type: FLOAT

Returns the sample corrected sum of the squares of the independent variable.

REGR_SXY

REGR_SXY(indep_parm, dep_parm)

Result type: FLOAT

Returns sum of the product of the independent variable and the dependent variable.

REGR_SYY

REGR_SYY(indep_parm, dep_parm)

Result type: FLOAT

Returns the sample corrected sum of the squares of the dependent variable.

String Aggregate Functions

String aggregate functions concatenate a set of string values.

LISTAGG

LISTAGG ([DISTINCT] value_expr [, 'delimiter'])
[WITHIN GROUP (order-by-clause)]

Concatenates expressions within a group.

Result type: VARCHAR(4000). Longer strings are truncated. The length of the result can be configured using the [engine] listagg_group_maxlen parameter.

where:

DISTINCT

Eliminates duplicate values.

value_expr

Specifies an expression that can appear in the select list of a query. Can contain constants, row values, operators, scalar functions, and scalar subqueries. Null values in the value_expr column are ignored. The value is cast to char data type before being concatenated.

'delimiter'

Defines the separator between concatenated items. The default separator is the empty string.

WITHIN GROUP (order-by-clause)

Specifies how the items in the result should be sorted.

The following LISTAGG examples are based on this table:

DEPT EMPNO FIRSTNAME LASTNAME

----------- ---------- ---------- ----------

Marketing 101 Douglas Cray

Marketing 103 Dong Luang

Marketing 105 Dennis Indolay

Admin 107 Sherry Keller

Admin 109 Carl Nader

Admin 111 Ruth Turret

Sales 113 Andrew Bonnet

Sales 115 Oscar Hender

Sales 117 Justin Braushere

Sales 119 Smith

Concatenate the names of all employees, ordered by last name (LISTAGG used as a simple aggregate):

SELECT LISTAGG(lastname)
WITHIN GROUP (ORDER BY lastname) AS PERSONNEL
FROM employee;

PERSONNEL

-----------------------------------------------------------

BonnetBraushereCrayHenderIndolayKellerLuangNaderSmithTurret

Same as previous query, but use a delimiter to separate names with a comma:

SELECT LISTAGG(lastname, ',')
WITHIN GROUP (ORDER BY lastname) AS PERSONNEL
FROM employee;

PERSONNEL

--------------------------------------------------------------------

Bonnet,Braushere,Cray,Hender,Indolay,Keller,Luang,Nader,Smith,Turret

Concatenate employees in each department, ordered by last name (LISTAGG used as a regular aggregate with GROUP BY):

SELECT dept, LISTAGG(lastname, ',')
WITHIN GROUP (ORDER BY lastname) AS PERSONNEL
FROM employee GROUP BY dept;

DEPT PERSONNEL

----------- ------------------------------

Admin Keller,Nader,Turret

Marketing Cray,Indolay,Luang

Sales Bonnet,Braushere,Hender,Smith

Concatenate employee names in the same department, ordered by last name, partitioned by department (LISTAGG used as a windowing aggregate):

SELECT dept, firstname, lastname, LISTAGG(lastname, ',')
    WITHIN GROUP(ORDER BY lastname)
    OVER (PARTITION BY dept) AS PERSONNEL
    FROM employee ORDER BY dept, lastname, firstname;

DEPT FIRSTNAME LASTNAME PERSONNEL IN SAME DEPT

----------- ---------- ---------- ------------------------------

Admin Sherry Keller Keller,Nader,Turret

Admin Carl Nader Keller,Nader,Turret

Admin Ruth Turret Keller,Nader,Turret

Marketing Douglas Cray Cray,Indolay,Luang

Marketing Dennis Indolay Cray,Indolay,Luang

Marketing Dong Luang Cray,Indolay,Luang

Sales Andrew Bonnet Bonnet,Braushere,Hender,Smith

Sales Justin Braushere Bonnet,Braushere,Hender,Smith

Sales Oscar Hender Bonnet,Braushere,Hender,Smith

Sales Smith Bonnet,Braushere,Hender,Smith

Ordering Aggregate Functions

Ordering aggregate functions order the aggregated rows.

FIRST_VALUE WITHIN GROUP and LAST_VALUE WITHIN GROUP

FIRST_VALUE (scalar_value_expression) [null_specification] WITHIN GROUP (order_by_clause)...[group_by_clause]

LAST_VALUE (scalar_value_expression) [null_specification] WITHIN GROUP (order_by_clause)...[group_by_clause]

Produces one row per group, or if there is no group_by_clause, a single row

where:

scalar_value_expr

Specifies any scalar expression that can appear in the select list of a query. Can contain constants, row values, operators, scalar functions, and scalar subqueries.

null specification

Indicates whether to include null values in the selection of a first_value or last_value. Valid values are:

RESPECT NULLS

(Default) Uses the expression value in the first or last row.

IGNORE NULLS

Uses the first or last non-null expression value.

The following query shows the employee number (empno) of the highest paid employee in each department:

SELECT deptno, FIRST_VALUE(empno)

WITHIN GROUP (ORDER BY sal DESC) AS "FIRST VALUE"

FROM emp GROUP BY deptno;

DEPTNO FIRST VALUE

---------- -----------

100 839

200 788

300 900

Aggregate Functions and Decimal Data

Given decimal arguments, aggregate functions (with the exception of COUNT) return decimal results.

The following table explains how to determine the scale and precision of results returned for aggregates with decimal arguments:

Function Name	Precision of Result	Scale of Result
COUNT	Not applicable	Not applicable
SUM	38	Same as argument
AVG	38	Scale of argument + 1 (to a maximum of 38)
MAX	Same as argument	Same as argument
MIN	Same as argument	Same as argument

GROUP BY Clause with Aggregate Functions

The GROUP BY clause allows aggregate functions to be performed on subsets of the rows in the table. The subsets are defined by the GROUP BY clause.

For example, the following statement selects rows from a table of political candidates, groups the rows by party, and returns the name of each party and the average funding for the candidates in that party.

SELECT party, AVG(funding)
FROM candidates
GROUP BY party;

Restrictions on the Use of Aggregate Functions

The following restrictions apply to the use of aggregate functions:

• Aggregate functions cannot be nested.

• Aggregate functions can be used only in SELECT or HAVING clauses.

• If a SELECT or HAVING clause contains an aggregate function, columns not specified in the aggregate must be specified in the GROUP BY clause. For example:

SELECT dept, AVG(emp_age)
FROM employee
GROUP BY dept;

The above SELECT statement specifies two columns, dept and emp_age, but only emp_age is referenced by the aggregate function, AVG. The dept column must be specified in the GROUP BY clause.