4. SQL Statements : SELECT : SELECT Statement Clauses : GROUP BY Clause
 
Share this page                  
GROUP BY Clause
The GROUP BY clause groups the selected rows based on identical values in a column or expression. This clause is typically used with aggregate functions to generate a single result row for each set of unique values in a set of columns or expressions.
A simple GROUP BY clause consists of a list of one or more columns or expressions that define the sets of rows that aggregations (like SUM, COUNT, MIN, MAX, and AVG) are to be performed on. A change in the value of any of the GROUP BY columns or expressions triggers a new set of rows to be aggregated.
If the GROUP BY clause contains CUBE or ROLLUP options, it creates superaggregate (subtotal) groupings in addition to the ordinary grouping.
GROUP BY has the following format:
GROUP BY [ALL | DISTINCT] grouping element {,grouping element}
where grouping element is:
column list
| ROLLUP (column list)
| CUBE (column list)
| GROUPING SETS (grouping element {,grouping element}
| ( )
and where:
ALL | DISTINCT
Retains (ALL) or eliminates (DISTINCT) duplicate values in the result set. Default: ALL
column list
Specifies one or more columns or expressions, each separated by a comma.
ROLLUP (column list)
Calculates group subtotals from right to left. Generates the simple GROUP BY aggregate rows, superaggregate rows, and a grand total row.
CUBE (column list)
Produces one row for each unique combination of expressions in the column list. Generates simple GROUP BY aggregate rows, superaggregate rows, cross-tabular rows, and a grand total row.
GROUPING SETS
Totals only the specified groups instead of the full set of aggregations generated by using CUBE or ROLLUP. GROUPING SETS syntax can be defined over simple column sets or CUBEs or ROLLUPs.
( )
Generates a total (that is, an aggregation computed over the entire set of input rows).
Simple GROUP BY Queries
The following query, which uses a simple GROUP BY clause, obtains the number of orders for each part number in the orders table:
SELECT partno, count(*) FROM orders
GROUP BY partno;
The preceding query returns one row for each part number in the orders table, even though there can be many orders for the same part number.
Nulls are used to represent unknown data, and two nulls are typically not considered to be equal in SQL comparisons. However, the GROUP BY clause treats nulls as equal and returns a single row for nulls in a grouped column or expression.
Grouping can be performed on multiple columns or expressions. For example, to display the number of orders for each part placed each day:
SELECT odate, partno, count(*) FROM orders
GROUP BY odate, partno;
If you specify the GROUP BY clause, columns referenced must be all the columns in the SELECT clause that do not contain an aggregate function. These columns can either be the column, an expression, or the ordinal number in the column list.
For example:
SELECT cust_no,
       CURRENT_DATE - odate AS days_since_order_placed,
       COUNT(*) AS number_of_orders
FROM orders
GROUP BY cust_no,
CURRENT_DATE - odate
ORDER BY 1, 2;
 
SELECT cust_no,
       CURRENT_DATE - odate AS days_since_order_placed,
       COUNT(*) AS number_of_orders
FROM orders
GROUP BY cust_no, 2
ORDER BY 1, 2;
ROLLUP, CUBE, and GROUPING SETS Queries
The GROUPING SETS extension to the GROUP BY clause includes:
ROLLUP and CUBE
GROUPING SETS expression
GROUPING function
These extensions reduce the complexity of your SQL while allowing efficient analysis across multiple dimensions.
ROLLUP performs aggregations at increasing levels up to a grand total. When multiple columns are specified, say GROUP BY ROLLUP(c1, c2, c3), ROLLUP generates the GROUP BY aggregate rows for each unique combination of values of (c1, c2, c3), plus superaggregate rows for each unique combination of values of (c1, c2), and (c1).
ROLLUP also generates a superaggregate row for the entire set of input rows.
The order of the columns specified in ROLLUP() can change the result and the number of rows in the result set.
List each employee's salary, the subtotal of all salaries in each department, and the total salary amount:
SELECT deptno, empno, SUM(sal) AS salary,
CASE GROUPING(deptno, empno) WHEN 0 THEN ' ' WHEN 1 THEN 'department total' WHEN 3 THEN 'grand total' END
FROM salary GROUP BY ROLLUP(deptno, empno);
 
deptno     empno     salary     col4
--------------------------------------------
(null)     (null)     46800     grand total
100        (null)      4400     department total
300        (null)      8800     department total
400        (null)      6500     department total
500        (null)     10400     department total
800        (null)     16700     department total
100          840       4400
300          499       1150
300          521       1400
300          654       1500
300          698        850
300          844       1150
300          900       2750
400          789       6500
500          299       3900
500          371       2200
500          473       2200
500          902       2100
800            5      10500
800          854       6200
CUBE generates the GROUP BY aggregate rows, plus superaggregate rows for each unique combination of expressions in the column list. So CUBE(c1, c2, c3) produces aggregate rows for each unique combination of (c1, c2, c3), as well as superaggregate rows for each unique combination of values of (c1, c2), (c1, c3), (c2, c3), (c1), (c2), and (c3), and a grand total row for the entire set of input rows. The order of the columns specified in CUBE() has no effect.
CUBE is useful for situations that require cross-tabular reports.
Show each job and salary by department, totals for each department, and totals for the entire company:
SELECT deptno,
       job,
       count(*),
       sum(sal)
  FROM emp
 GROUP BY CUBE(deptno,job);
 
   DEPTNO JOB         COUNT(*)   SUM(SAL)
--------- --------- --------- ---------
       10 CLERK              1       1300
       10 DIRECTOR           1       2450
       10 CEO                1       9000
       10                    3      12750
 
       20 CLERK              2       1300
       20 PROGRAMMER         2       5000
       20 DIRECTOR           1       2975
       20                    5       9275
 
       30 CLERK              1        950
       30 DIRECTOR           1       2850
       30 SALESMAN           4       5600
       30                    6       9400
 
          PROGRAMMER         2       5000
          CLERK              4       3550
          DIRECTOR           3       8275
          CEO                1       9000
          SALESMAN           4       5600
                            14      31425
The GROUPING SETS syntax lets you define multiple independent sets of grouping columns on which the aggregates are to be computed. You can specify GROUPING SETS when you do not need all the groupings that are generated by using ROLLUP or CUBE and when aggregations are required on distinct sets of grouping columns. ROLLUP and CUBE are simply special short forms for specific sets of GROUPING SETS.
Each grouping set defines a set of columns for which an aggregate result is computed. The final result set is the set of distinct rows from the individual grouping column specifications in the grouping sets. GROUPING SETS syntax can be defined over simple column sets or CUBEs or ROLLUPs. In effect, CUBE and ROLLUP are simply short forms for specific varieties of GROUPING SETS.
A GROUPING SETS query can be considered to be simply a UNION of each of the individual groupings defined in the GROUPING SETs syntax (or CUBE or ROLLUP).
Show sales totals for division and region:
SELECT division, region, SUM(sales) AS sales FROM div_sales
GROUP BY GROUPING SETS (division, region);
 
division     region     sales
-----------------------------
capital      (null)        58
energy       (null)       155
technology   (null)       206
home         (null)       109
(null)       us           174
(null)       europe       124
(null)       pacific       86
(null)       americas      80
(null)       mea           64
The empty grouping set—GROUP BY()—simply defines an aggregation to be computed over the entire source set of rows (a simple aggregate).
The GROUPING() function (see GROUPING) can be used to simplify a query that needs many GROUP BY levels. The function argument is a list of one or more columns or expressions in parentheses. Each parameter must appear in the GROUP BY clause. The result is an integer consisting of "n" binary digits, where "n" is the number of parameters to the function. For each result row of the grouped query, the digit corresponding to the nth parameter of the GROUPING function is 0 if the result row is based on a value of the nth parameter, else 1.
For example, for the following clause:
GROUP BY CUBE (c1, c2, c3)
GROUPING(c1, c2, c3) returns 0 for the <C1, C2, C3> rows, 4 for the <C2, C3> rows, 1 for the <C1, C2> rows, and so forth. Likewise, GROUPING(C3, C1) returns 3 for <C2> rows, 1 for <C3> rows, 0 for <C1, C3> rows and <C1, C2, C3> rows.