oracle高级分组指的是group by紧跟后面的函数如:
rollup与cube
group by的cube扩展
1. 先看一下ROLLUP的数据统计效果
1)创建测试表group_test
SECOOLER@ora11g> create table group_test (group_id int, job varchar2(10), name varchar2(10), salary int); Table created.
2)初始化数据
insert into group_test values (10,'Coding', 'Bruce',1000); insert into group_test values (10,'Programmer','Clair',1000); insert into group_test values (10,'Architect', 'Gideon',1000); insert into group_test values (10,'Director', 'Hill',1000); insert into group_test values (20,'Coding', 'Jason',2000); insert into group_test values (20,'Programmer','Joey',2000); insert into group_test values (20,'Architect', 'Martin',2000); insert into group_test values (20,'Director', 'Michael',2000); insert into group_test values (30,'Coding', 'Rebecca',3000); insert into group_test values (30,'Programmer','Rex',3000); insert into group_test values (30,'Architect', 'Richard',3000); insert into group_test values (30,'Director', 'Sabrina',3000); insert into group_test values (40,'Coding', 'Samuel',4000); insert into group_test values (40,'Programmer','Susy',4000); insert into group_test values (40,'Architect', 'Tina',4000); insert into group_test values (40,'Director', 'Wendy',4000); commit;
3)初始化之后的数据情况如下:
SECOOLER@ora11g> set pages 100 SECOOLER@ora11g> select * from group_test; GROUP_ID JOB NAME SALARY ---------- ---------- ---------- ---------- 10 Coding Bruce 1000 10 Programmer Clair 1000 10 Architect Gideon 1000 10 Director Hill 1000 20 Coding Jason 2000 20 Programmer Joey 2000 20 Architect Martin 2000 20 Director Michael 2000 30 Coding Rebecca 3000 30 Programmer Rex 3000 30 Architect Richard 3000 30 Director Sabrina 3000 40 Coding Samuel 4000 40 Programmer Susy 4000 40 Architect Tina 4000 40 Director Wendy 4000 16 rows selected.
4)ROLLUP的数据统计效果
sec@ora10g> select group_id,job,grouping(GROUP_ID),grouping(JOB),sum(salary) from group_test group by rollup(group_id, job); GROUP_ID JOB GROUPING(GROUP_ID) GROUPING(JOB) SUM(SALARY) ---------- ---------- ------------------ ------------- ----------- 10 Coding 0 0 1000 10 Director 0 0 1000 10 Architect 0 0 1000 10 Programmer 0 0 1000 10 0 1 4000 20 Coding 0 0 2000 20 Director 0 0 2000 20 Architect 0 0 2000 20 Programmer 0 0 2000 20 0 1 8000 30 Coding 0 0 3000 30 Director 0 0 3000 30 Architect 0 0 3000 30 Programmer 0 0 3000 30 0 1 12000 40 Coding 0 0 4000 40 Director 0 0 4000 40 Architect 0 0 4000 40 Programmer 0 0 4000 40 0 1 16000 1 1 40000 21 rows selected.
2. 理解CUBE
sec@ora10g> select group_id,job,grouping(GROUP_ID),grouping(JOB),sum(salary) from group_test group by cube(group_id, job) order by 1; GROUP_ID JOB GROUPING(GROUP_ID) GROUPING(JOB) SUM(SALARY) ---------- ---------- ------------------ ------------- ----------- 10 Architect 0 0 1000 10 Coding 0 0 1000 10 Director 0 0 1000 10 Programmer 0 0 1000 10 0 1 4000 20 Architect 0 0 2000 20 Coding 0 0 2000 20 Director 0 0 2000 20 Programmer 0 0 2000 20 0 1 8000 30 Architect 0 0 3000 30 Coding 0 0 3000 30 Director 0 0 3000 30 Programmer 0 0 3000 30 0 1 12000 40 Architect 0 0 4000 40 Coding 0 0 4000 40 Director 0 0 4000 40 Programmer 0 0 4000 40 0 1 16000 Architect 1 0 10000 Coding 1 0 10000 Director 1 0 10000 Programmer 1 0 10000 1 1 40000 25 rows selected.
解释如上结果中GROUPING函数返回值“0”和“1”的含义。
如果显示“1”表示CUBE函数对应的列(例如JOB字段)是由于CUBE函数所产生的空值对应的信息,即对此列进行汇总计算后的结果。
如果显示“0”表示此行对应的这列参未与ROLLUP函数分组汇总活动。
如果还是没有理解清楚,请参见Oracle官方文档中的描述内容:“Using a single column as its argument,GROUPING returns 1 when it encounters a NULL value created by a ROLLUP or CUBE operation. That is, if the NULL indicates the row is a subtotal,GROUPING returns a 1. Any other type of value, including a stored NULL, returns a 0.”
3. 仔细观察一下,CUBE与ROLLUP之间的细微差别
rollup(a,b) 统计列包含:(a,b)、(a)、()
rollup(a,b,c) 统计列包含:(a,b,c)、(a,b)、(a)、()
……以此类推ing……
cube(a,b) 统计列包含:(a,b)、(a)、(b)、()
cube(a,b,c) 统计列包含:(a,b,c)、(a,b)、(a,c)、(b,c)、(a)、(b)、(c)、()
……以此类推ing……
So,上面例子中CUBE的结果比ROLLUP多了下面关于第一列GROUP_ID的统计信息:
Architect 1 0 10000 Coding 1 0 10000 Director 1 0 10000
CUBE在ROLLUP的基础上进一步从各种维度上给出细化的统计汇总结果。
理解grouping()函数
GROUPING函数用来判断是否是合计列
通过grouping()函数排除空值
如何辨别数据中原有的空值和cube扩展所插入的空值?
在oracle8i引入grouping()函数,可以用来识别这些超聚合行。被用来作为grouping()函数参数的表达式必须与出现在group by子句中的表达式相匹配。
用grouping()来扩展报告
另一种使用grouping()的方法是放在having子句中,用来控制在输出中显示哪个层级的聚合。
如下所示:
group by prod_categary,cube(cust_income_level,age_range) having grouping(cust_income_level)=1;
使用grouping_id()来扩展报告
grouping_id()计算一个表达式,确定其参数中的哪一行(如果有的话)被用来生成超聚合行,然后创建一个为矢量,并将该值作为整型值返回。
SQL> SELECT department_id,job_id,SUM(salary),GROUPING(department_id) d_group,GROUPING(job_id) j_group,grouping_id(department_id,job_id) d_j_group_id 2 FROM employees 3 WHERE department_id IN (10,20,30) 4 GROUP BY CUBE(department_id,job_id) 5 ORDER BY grouping_id(department_id,job_id) ASC; DEPARTMENT_ID JOB_ID SUM(SALARY) D_GROUP J_GROUP D_J_GROUP_ID ------------- ---------- ----------- ---------- ---------- ------------ 30 PU_MAN 11000 0 0 0 20 MK_REP 6000 0 0 0 20 MK_MAN 13000 0 0 0 10 AD_ASST 4400 0 0 0 30 PU_CLERK 13900 0 0 0 10 4400 0 1 1 30 24900 0 1 1 20 19000 0 1 1 MK_REP 6000 1 0 2 MK_MAN 13000 1 0 2 PU_MAN 11000 1 0 2 AD_ASST 4400 1 0 2 PU_CLERK 13900 1 0 2 48300 1 1 3 14 rows selected.
其中D_J_GROUP_ID列的值是D_GROUP和J_GROUP两列的值的组合,如最后一行,D_GROUP和J_GROUP两列的值的组合即为11,转化为十进制即为3。
理解grouping sets 与rollup()
SQL> SELECT department_id,job_id,SUM(salary),GROUPING(department_id) d_group,GROUPING(job_id) j_group,grouping_id(department_id,job_id) d_j_group_id 2 FROM employees 3 WHERE department_id IN (10,20,30) 4 GROUP BY GROUPING SETS(department_id,job_id) 5 ORDER BY grouping_id(department_id,job_id) ASC; DEPARTMENT_ID JOB_ID SUM(SALARY) D_GROUP J_GROUP D_J_GROUP_ID ------------- ---------- ----------- ---------- ---------- ------------ 20 19000 0 1 1 30 24900 0 1 1 10 4400 0 1 1 MK_REP 6000 1 0 2 MK_MAN 13000 1 0 2 PU_MAN 11000 1 0 2 AD_ASST 4400 1 0 2 PU_CLERK 13900 1 0 2 8 rows selected.
另外grouping sets()括号里面可以嵌套rollup()或者cube函数,效果其实和单独使用rollup和cube函数一样。而grouping sets的结果其实是单独应用cube函数的结果的子集。
示例如下:
SQL> SELECT department_id,job_id,SUM(salary),GROUPING(department_id) d_group,GROUPING(job_id) j_group,grouping_id(department_id,job_id) d_j_group_id 2 FROM employees 3 WHERE department_id IN (10,20,30) 4 GROUP BY cube(department_id,job_id) 5 HAVING grouping_id(department_id,job_id) IN (1,2) 6 ORDER BY grouping_id(department_id,job_id) ASC; DEPARTMENT_ID JOB_ID SUM(SALARY) D_GROUP J_GROUP D_J_GROUP_ID ------------- ---------- ----------- ---------- ---------- ------------ 20 19000 0 1 1 30 24900 0 1 1 10 4400 0 1 1 MK_REP 6000 1 0 2 MK_MAN 13000 1 0 2 PU_MAN 11000 1 0 2 AD_ASST 4400 1 0 2 PU_CLERK 13900 1 0 2 8 rows selected.
理解group by的局限性
group by局限性如下:
- LOB列或者嵌套表或数组不能用做group by表达式的一部分。
- 标量子查询表达式是不允许的。
- 如果group by子句引用任何对象类型的列则查询不能并行化。