Analytical Functions

3. Elements of SQL Statements : SQL Functions : Analytical Functions

Share this page

Analytical functions compute an aggregate value based on a group of rows, and can return multiple rows for each group. Analytical functions can be used to calculate percentages or top-N results in a group.

Analytical functions can appear only in the select list of a query or in the ORDER BY clause. They cannot appear in WHERE, ON, HAVING, or GROUP BY clauses. Analytical functions can appear in the select lists of views and derived tables.

Analytical functions are window functions, and thus require an OVER clause.

Note: The ORDER BY clause within the OVER clause supports the NULLS FIRST / NULLS LAST syntax. The default is NULLS LAST.

DENSE_RANK

DENSE_RANK()

Returns the ordinal position of each result row within a partition, based on the sequence defined by the ordering definition for the window. Rows with the same values of their sort specification have the same RANK() value and result in no gaps in the list of ranks. For example: 1, 2, 2, 3, 4, 4, 4, 5.

The following query ranks each employee in a department based on his salary. When two employees have the same salary they are assigned the same rank. When multiple rows have the same rank, the next rank in the sequence is consecutive.

SELECT empno, deptno, sal,

DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS rank

FROM emp;

EMPNO DEPTNO SAL RANK

---------- ---------- ---------- ----------

839 100 4900 1

782 100 2350 2

934 100 1200 3

788 200 2900 1

902 200 2900 1

566 200 2875 2

876 200 1000 3

369 200 700 4

900 300 2750 1

654 300 1500 2

521 300 1400 3

844 300 1150 4

499 300 1150 4

698 300 850 5

The following query uses the DENSE_RANK function and then gets only the rows ranked 1 from each group. You must provide a correlation name (in this case, x) for the subquery:

SELECT fname, address FROM (SELECT fname, address,

DENSE_RANK() OVER (PARTITION BY fname ORDER BY timestamp DESC) AS rank

FROM tab1) x WHERE rank = 1;

fname address

---------- -------------‑---------

Mary 1600 Pennsylvania Av

FIRST_VALUE() OVER() and LAST_VALUE() OVER()

FIRST_VALUE (scalar_value_expression) [null_spec] OVER([partition_by_clause] order_by_clause [frame_spec])

LAST_VALUE (scalar_value_expression) [null_spec] OVER([partition_by_clause] order_by_clause frame_spec)

Produces one row per input row. The same value is returned for every row in the partition, or if there is no partition_by_clause, every row in the result.

where:

scalar_value_expr

Specifies any scalar expression that can appear in the select list of a query. Can contain constants, row values, operators, scalar functions, and scalar subqueries.

null_spec

Indicates whether to include null values in the selection of a first_value or last_value. Valid values are:

RESPECT NULLS

(Default) Uses the expression value in the first or last row.

IGNORE NULLS

Uses the first or last non-null expression value.

OVER ([partition_by_clause] order_by_clause frame_spec)

Defines the window as described in Window Function Syntax.

For FIRST_VALUE a frame specification is required if using IGNORE NULLS and must be ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

For LAST_VALUE a frame specification is required and must be ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

The following query ranks each employee in a department based on his salary:

SELECT empno, deptno, sal,

FIRST_VALUE(empno) OVER (PARTITION BY deptno ORDER BY sal DESC)

FROM emp;

EMPNO DEPTNO SAL FIRST_VALUE

---------- ---------- ---------- ----------

839 100 4900 839

782 100 2350 839

934 100 1200 839

788 200 2900 788

902 200 2900 788

566 200 2875 788

876 200 1000 788

369 200 700 788

900 300 2750 900

654 300 1500 900

521 300 1400 900

844 300 1150 900

499 300 1150 900

698 300 850 900

LAG and LEAD

LAG | LEAD (scalar_expression [,offset] [,default]) [RESPECT NULLS | IGNORE NULLS]

Returns a scalar column value from a row before (LAG) or after (LEAD) the current row. Use LAG to compare values in the current row with values in a previous row. Use LEAD to compare values in the current row with values in a following row.

LAG and LEAD are useful for comparing values across multiple rows—for example, across time periods.

where:

scalar_expression

Is an expression that returns a single value. The expression can be a constant, row value, operator, scalar function, and scalar subquery.

offset

Specifies the number of rows of lag or lead from the current row. Must be an expression that resolves to a positive integer. Default is 1.

default

Returns the specified value if no lagging or leading row is present (for example, lag of the first value in an ordered set). Default is NULL.

RESPECT NULLS | IGNORE NULLS

Indicates whether to include null values in the selection of a lag or lead value. Default is RESPECT NULLS.

IGNORE NULLS and offset N returns the N-th previous/following not-NULL value.

RESPECT NULLS and offset N returns the N-th previous/following value, counting NULL values.

The following query shows the date, the weather, and the previous day's weather:

SELECT wdate, forecast, LAG(forecast) OVER(ORDER BY wdate) AS lag

FROM weather ORDER BY wdate;

wdate forecast lag

---------------------------------

2012-06-12 overcast NULL

2012-06-13 sunny overcast

2012-06-14 rain sunny

The following query finds the salary of each employee in department 300 and lists the next higher salary:

SELECT deptno, empno, sal,

LEAD(sal) OVER(ORDER BY sal) AS next_sal

FROM salary WHERE deptno=300;

deptno empno sal next_sal

----------------------------------

300 698 1150 1150

300 844 1150 1450

300 499 1450 1500

300 521 1500 2750

300 654 2750

300 900

The following example demonstrates the use of IGNORE NULLS:

CREATE TABLE lead_table(c1 INT , c2 INT , c3 INT);

INSERT INTO lead_table VALUES (1,10,1), (2, 20,2), (NULL, 30,1), (NULL,40,2), (5,50,1);

SELECT

c1, c2, c3,

LEAD(c1 , 1,100) OVER (ORDER BY c2) AS lead1,

LEAD(c1 IGNORE NULLS, 1,200) OVER (ORDER BY c2) AS lead1ign,

LAG (c1 , 2,300) OVER (ORDER BY c2) AS lag2,

LAG (c1 IGNORE NULLS, 2,400) OVER (ORDER BY c2) AS lag2ign

FROM lead_table;

c1 c2 c3 lead1 lead1ign lag2 lag2ign

------ ---- -- ----- -------- ----- --------

1 10 1 2 2 300 400

2 20 2 NULL 5 300 400

NULL 30 1 NULL 5 1 1

NULL 40 2 5 5 2 1

5 50 1 100 200 NULL 1

Note the difference in results between the lag2 and lag2ign columns.

For the fourth row, for RESPECT NULLS the value two rows above is 2.

However, for IGNORE NULLS the second non-null value above is 1.

SELECT

c1, c2, c3,

LEAD(c1 , 1,100) OVER (PARTITION BY c3 ORDER BY c2) AS lead1,

LEAD(c1 IGNORE NULLS, 1,200) OVER (PARTITION BY c3 ORDER BY c2) AS lead1ign,

LAG (c1 , 1,300) OVER (PARTITION BY c3 ORDER BY c2) AS lag1,

LAG (c1 IGNORE NULLS, 1,400) OVER (PARTITION BY c3 ORDER BY c2) AS lag1ign

FROM lead_table;

c1 c2 c3 lead1 lead1ign lag1 lag1ign

---- --- --- ----- --------- ---- -------

1 10 1 NULL 5 300 400

NULL 30 1 5 5 1 1

5 50 1 100 200 NULL 1

2 20 2 NULL 200 300 400

NULL 40 2 100 200 2 2

NTILE

NTILE(n)

Divides the rows in an ordered partition into n groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.

NTILE can be used, for example, to see what quartile, decile, or percentile a row is in.

The following query divides the employees in Department 400 into 4 groups by salary:

SELECT Department_ID, Employee_ID, Salary, NTILE(4)

OVER (PARTITION BY Department_ID ORDER BY Salary DESC) AS Quartile

FROM employees WHERE Department_ID = 400;

Department_ID Employee_ID Salary Quartile

------------- -------------- --------- -----------

400 50 9000 1

400 48 8000 1

400 51 7500 1

400 54 7000 1

400 53 6500 2

400 47 6000 2

400 44 5000 2

400 46 4500 2

400 52 4300 3

400 45 4000 3

400 43 3500 3

400 42 3000 4

400 41 3000 4

400 49 2800 4

When the number of rows is not divisible by n, the later rows will have the smaller number of rows, so the first and second quartiles have 4 rows whereas the third and fourth have only 3.

PERCENT_RANK

PERCENT_RANK()

Calculates the relative rank of a row within a group of rows. PERCENT_RANK returns a number between 0 and 1, which represents the percentage of rows in the group that are less than the current row. If a partition has exactly one row, its percent_rank() is 0. Percent_rank() for the highest value in the group will always be 1.

Use this function to determine the relative standing of a value within a result set.

The following query calculates, for each employee, the percent rank of the employee's salary in the department:

SELECT deptno, empno, sal,

PERCENT_RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS pr

FROM emp

ORDER BY deptno, pr;

DEPTNO EMPNO SAL PR

------------- ------------- ------------ ----------

100 840 4400 0.000

300 900 2750 0.000

300 654 1500 0.200

300 521 1400 0.400

300 844 1150 0.600

300 499 1150 0.600

300 698 850 1.000

400 789 6500 0.000

500 299 3900 0.000

500 473 2200 0.333

500 371 2200 0.333

500 900 2100 1.000

800 5 10500 0.000

800 854 6200 1.000

RANK

RANK()

Returns the ordinal position of each result row within a partition, based on the sequence defined by the ordering definition for the window. Rows with the same values of their sort specification have the same RANK() value and result in gaps in the list of ranks. For example: 1, 2, 2, 4, 5, 5, 5, 8.

SELECT empno, deptno, sal,

RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS rank

FROM emp;

EMPNO DEPTNO SAL RANK

---------- ---------- ---------- ----------

839 100 4900 1

782 100 2350 2

934 100 1200 3

788 200 2900 1

902 200 2900 1

566 200 2875 3

876 200 1000 4

369 200 700 5

698 300 2750 1

499 300 1500 2

844 300 1400 3

521 300 1150 4

654 300 1150 4

900 300 850 6

ROW_NUMBER

ROW_NUMBER()

Returns the ordinal position of each result row within a partition, based on the sequence defined by the ordering definition for the window. Rows with the same values in their sort specification are ordered arbitrarily. For example: 1, 2, 3, 4, 5.

The following query assigns a consecutive number to each row. Rows with matching numbers are ordered arbitrarily.

SELECT empno, deptno, sal,

ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal) AS rownum

FROM emp;

EMPNO DEPTNO SAL ROWNUM

---------- ---------- ---------- ----------

934 100 1200 1

782 100 2350 2

839 100 4900 3

369 200 700 1

876 200 1000 2

566 200 2875 3

788 200 2900 4

902 200 2900 5

900 300 850 1

654 300 1150 2

521 300 1150 3

844 300 1400 4

499 300 1500 5

698 300 2750 6