视频1 视频21 视频41 视频61 视频文章1 视频文章21 视频文章41 视频文章61 推荐1 推荐3 推荐5 推荐7 推荐9 推荐11 推荐13 推荐15 推荐17 推荐19 推荐21 推荐23 推荐25 推荐27 推荐29 推荐31 推荐33 推荐35 推荐37 推荐39 推荐41 推荐43 推荐45 推荐47 推荐49 关键词1 关键词101 关键词201 关键词301 关键词401 关键词501 关键词601 关键词701 关键词801 关键词901 关键词1001 关键词1101 关键词1201 关键词1301 关键词1401 关键词1501 关键词1601 关键词1701 关键词1801 关键词1901 视频扩展1 视频扩展6 视频扩展11 视频扩展16 文章1 文章201 文章401 文章601 文章801 文章1001 资讯1 资讯501 资讯1001 资讯1501 标签1 标签501 标签1001 关键词1 关键词501 关键词1001 关键词1501 专题2001
PostgreSQL的window函数整理
2020-11-09 09:41:33 责编:小采
文档


简言之,聚合函数返回的各个分组的结果,窗口函数则为每一行返回结果,示例如下: 1.建示例表,初始化数据 DROP TABLE IF EXISTS empsalary; CREATE TABLE empsalary( depname varchar, empno bigint, salary int, enroll_date date ); INSERT INTO empsalary V

简言之,聚合函数返回的各个分组的结果,窗口函数则为每一行返回结果,示例如下:
1.建示例表,初始化数据

DROP TABLE IF EXISTS empsalary;
CREATE TABLE empsalary(
depname varchar,
empno bigint,
salary int,
enroll_date date
);
INSERT INTO empsalary VALUES('develop',10, 5200, '2007/08/01');
INSERT INTO empsalary VALUES('sales', 1, 5000, '2006/10/01');
INSERT INTO empsalary VALUES('personnel', 5, 3500, '2007/12/10');
INSERT INTO empsalary VALUES('sales', 4, 4800, '2007/08/08');
INSERT INTO empsalary VALUES('sales', 6, 5500, '2007/01/02');
INSERT INTO empsalary VALUES('personnel', 2, 3900, '2006/12/23');
INSERT INTO empsalary VALUES('develop', 7, 4200, '2008/01/01');
INSERT INTO empsalary VALUES('develop', 9, 4500, '2008/01/01');
INSERT INTO empsalary VALUES('sales', 3, 4800, '2007/08/01');
INSERT INTO empsalary VALUES('develop', 8, 6000, '2006/10/01');
INSERT INTO empsalary VALUES('develop', 11, 5200, '2007/08/15');

postgres=# select * from empsalary ;
depname | empno | salary | enroll_date
-----------+-------+--------+-------------
develop | 10 | 5200 | 2007-08-01
sales | 1 | 5000 | 2006-10-01
personnel | 5 | 3500 | 2007-12-10
sales | 4 | 4800 | 2007-08-08
sales | 6 | 5500 | 2007-01-02
personnel | 2 | 3900 | 2006-12-23
develop | 7 | 4200 | 2008-01-01
develop | 9 | 4500 | 2008-01-01
sales | 3 | 4800 | 2007-08-01
develop | 8 | 6000 | 2006-10-01
develop | 11 | 5200 | 2007-08-15
(11 rows)2.统计示例
a.统计各部门的总薪水,平均薪水和部门的详细情况
postgres=# select sum(salary) OVER (PARTITION BY depname),avg(salary) OVER (PARTITION BY depname),* from empsalary;
sum | avg | depname | empno | salary | enroll_date
-------+-----------------------+-----------+-------+--------+-------------
25100 | 5020.0000000000000000 | develop | 10 | 5200 | 2007-08-01
25100 | 5020.0000000000000000 | develop | 7 | 4200 | 2008-01-01
25100 | 5020.0000000000000000 | develop | 9 | 4500 | 2008-01-01
25100 | 5020.0000000000000000 | develop | 8 | 6000 | 2006-10-01
25100 | 5020.0000000000000000 | develop | 11 | 5200 | 2007-08-15
7400 | 3700.0000000000000000 | personnel | 2 | 3900 | 2006-12-23
7400 | 3700.0000000000000000 | personnel | 5 | 3500 | 2007-12-10
20100 | 5025.0000000000000000 | sales | 3 | 4800 | 2007-08-01
20100 | 5025.0000000000000000 | sales | 1 | 5000 | 2006-10-01
20100 | 5025.0000000000000000 | sales | 4 | 4800 | 2007-08-08
20100 | 5025.0000000000000000 | sales | 6 | 5500 | 2007-01-02
(11 rows)b.统计人员在所在部门的薪水排名情况
postgres=# select rank() OVER (PARTITION BY depname ORDER BY salary),* from empsalary;
rank | depname | empno | salary | enroll_date
------+-----------+-------+--------+-------------
1 | develop | 7 | 4200 | 2008-01-01
2 | develop | 9 | 4500 | 2008-01-01
3 | develop | 10 | 5200 | 2007-08-01
3 | develop | 11 | 5200 | 2007-08-15
5 | develop | 8 | 6000 | 2006-10-01
1 | personnel | 5 | 3500 | 2007-12-10
2 | personnel | 2 | 3900 | 2006-12-23
1 | sales | 4 | 4800 | 2007-08-08
1 | sales | 3 | 4800 | 2007-08-01
3 | sales | 1 | 5000 | 2006-10-01
4 | sales | 6 | 5500 | 2007-01-02
(11 rows)3.一个有趣的例子 注意使用order by,结果会两样
create table foo(a int,b int) ;
insert into foo values (1,1);
insert into foo values (1,1);
insert into foo values (2,1);
insert into foo values (4,1);
insert into foo values (2,1);
insert into foo values (4,1);
insert into foo values (5,1);
insert into foo values (11,3);
insert into foo values (12,3);
insert into foo values (22,3);
insert into foo values (16,3);
insert into foo values (16,3);
insert into foo values (16,3);

postgres=# select sum(a) over (partition by b), a, b from foo;
sum | a | b
-----+----+---
19 | 1 | 1
19 | 1 | 1
19 | 2 | 1
19 | 4 | 1
19 | 2 | 1
19 | 4 | 1
19 | 5 | 1
93 | 11 | 3
93 | 12 | 3
93 | 22 | 3
93 | 16 | 3
93 | 16 | 3
93 | 16 | 3
(13 rows)

postgres=# select sum(a) over (partition by b order by a), a, b from foo;
sum | a | b
-----+----+---
2 | 1 | 1
2 | 1 | 1
6 | 2 | 1
6 | 2 | 1
14 | 4 | 1
14 | 4 | 1
19 | 5 | 1
11 | 11 | 3
23 | 12 | 3
71 | 16 | 3
71 | 16 | 3
71 | 16 | 3
93 | 22 | 3
(13 rows)

postgres=# select a, b, sum(a) over (partition by b order by a ROWS
postgres(# BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) from foo;
a | b | sum
----+---+-----
1 | 1 | 19
1 | 1 | 19
2 | 1 | 19
2 | 1 | 19
4 | 1 | 19
4 | 1 | 19
5 | 1 | 19
11 | 3 | 93
12 | 3 | 93
16 | 3 | 93
16 | 3 | 93
16 | 3 | 93
22 | 3 | 93
(13 rows)官网中的解释是: By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition.
默认情况下,带了order by 参数会从分组的起始值开始一直叠加,直到当前值为止,当忽略order by 参数则会计算分组中所有值的和。

4.其他的窗口函数
row_number(): 从当前开始,不间断,如1,2,3,4,5,6
rank() :从当前开始,会间断,如1,2,2,4,5,6
dense_rank():从当前开始不间断,但会重复,如1,2,2,3,4,5
percent_rank():从当前开始,计算在分组中的比例,如0,0.25,0.25,0.75,1,0,1 从0-1不断地循环
cume_dist():当前行的排序除以分组的数量,如分组有4行,则值为0.25,0.5,0.75,1
ntile(num_buckets integer):从1到当前值,除以分组的的数量,尽可能使分布平均
lag(value any [, offset integer [, default any ]]):偏移量函数,取滞后值,如lag(column_name,2,0)表示字段偏移量为2,没有则用default值代替,这里是0,不写默认是null
lead(value any [, offset integer [, default any ]]):偏移量函数,取提前值,类上 first_value(value any):返回窗口框架中的第一个值
last_value(value any):返回窗口框架中的最后一个值
nth_value(value any, nth integer):返回窗口框架中的指定值,如nth_value(salary,2),则表示返回字段salary的第二个窗口函数值

5.其他窗口函数示例
postgres=# select row_number() over (partition by depname order by salary desc),* from empsalary;
row_number | depname | empno | salary | enroll_date
------------+-----------+-------+--------+-------------
1 | develop | 8 | 6000 | 2006-10-01
2 | develop | 10 | 5200 | 2007-08-01
3 | develop | 11 | 5200 | 2007-08-15
4 | develop | 9 | 4500 | 2008-01-01
5 | develop | 7 | 4200 | 2008-01-01
1 | personnel | 2 | 3900 | 2006-12-23
2 | personnel | 5 | 3500 | 2007-12-10
1 | sales | 6 | 5500 | 2007-01-02
2 | sales | 1 | 5000 | 2006-10-01
3 | sales | 3 | 4800 | 2007-08-01
4 | sales | 4 | 4800 | 2007-08-08
(11 rows)

postgres=# select rank() over(partition by depname order by salary desc),* from empsalary;
rank | depname | empno | salary | enroll_date
------+-----------+-------+--------+-------------
1 | develop | 8 | 6000 | 2006-10-01
2 | develop | 10 | 5200 | 2007-08-01
2 | develop | 11 | 5200 | 2007-08-15
4 | develop | 9 | 4500 | 2008-01-01
5 | develop | 7 | 4200 | 2008-01-01
1 | personnel | 2 | 3900 | 2006-12-23
2 | personnel | 5 | 3500 | 2007-12-10
1 | sales | 6 | 5500 | 2007-01-02
2 | sales | 1 | 5000 | 2006-10-01
3 | sales | 3 | 4800 | 2007-08-01
3 | sales | 4 | 4800 | 2007-08-08
(11 rows)


postgres=# select dense_rank() over(partition by depname order by salary desc),* from empsalary;
dense_rank | depname | empno | salary | enroll_date
------------+-----------+-------+--------+-------------
1 | develop | 8 | 6000 | 2006-10-01
2 | develop | 10 | 5200 | 2007-08-01
2 | develop | 11 | 5200 | 2007-08-15
3 | develop | 9 | 4500 | 2008-01-01
4 | develop | 7 | 4200 | 2008-01-01
1 | personnel | 2 | 3900 | 2006-12-23
2 | personnel | 5 | 3500 | 2007-12-10
1 | sales | 6 | 5500 | 2007-01-02
2 | sales | 1 | 5000 | 2006-10-01
3 | sales | 3 | 4800 | 2007-08-01
3 | sales | 4 | 4800 | 2007-08-08
(11 rows)

postgres=# select percent_rank() over(partition by depname order by salary desc),* from empsalary;
percent_rank | depname | empno | salary | enroll_date
-------------------+-----------+-------+--------+-------------
0 | develop | 8 | 6000 | 2006-10-01
0.25 | develop | 10 | 5200 | 2007-08-01
0.25 | develop | 11 | 5200 | 2007-08-15
0.75 | develop | 9 | 4500 | 2008-01-01
1 | develop | 7 | 4200 | 2008-01-01
0 | personnel | 2 | 3900 | 2006-12-23
1 | personnel | 5 | 3500 | 2007-12-10
0 | sales | 6 | 5500 | 2007-01-02
0.333333333333333 | sales | 1 | 5000 | 2006-10-01
0.666666666666667 | sales | 3 | 4800 | 2007-08-01
0.666666666666667 | sales | 4 | 4800 | 2007-08-08
(11 rows)

postgres=# select cume_dist()over(partition by depname order by salary desc),* from empsalary;
cume_dist | depname | empno | salary | enroll_date
-----------+-----------+-------+--------+-------------
0.2 | develop | 8 | 6000 | 2006-10-01
0.6 | develop | 10 | 5200 | 2007-08-01
0.6 | develop | 11 | 5200 | 2007-08-15
0.8 | develop | 9 | 4500 | 2008-01-01
1 | develop | 7 | 4200 | 2008-01-01
0.5 | personnel | 2 | 3900 | 2006-12-23
1 | personnel | 5 | 3500 | 2007-12-10
0.25 | sales | 6 | 5500 | 2007-01-02
0.5 | sales | 1 | 5000 | 2006-10-01
1 | sales | 3 | 4800 | 2007-08-01
1 | sales | 4 | 4800 | 2007-08-08
(11 rows)

postgres=# select ntile(3)over(partition by depname order by salary desc),* from empsalary;
ntile | depname | empno | salary | enroll_date
-------+-----------+-------+--------+-------------
1 | develop | 8 | 6000 | 2006-10-01
1 | develop | 10 | 5200 | 2007-08-01
2 | develop | 11 | 5200 | 2007-08-15
2 | develop | 9 | 4500 | 2008-01-01
3 | develop | 7 | 4200 | 2008-01-01
1 | personnel | 2 | 3900 | 2006-12-23
2 | personnel | 5 | 3500 | 2007-12-10
1 | sales | 6 | 5500 | 2007-01-02
1 | sales | 1 | 5000 | 2006-10-01
2 | sales | 3 | 4800 | 2007-08-01
3 | sales | 4 | 4800 | 2007-08-08
(11 rows)

postgres=# select lag(salary,2,null)over(partition by depname order by salary desc),* from empsalary;
lag | depname | empno | salary | enroll_date
------+-----------+-------+--------+-------------
| develop | 8 | 6000 | 2006-10-01
| develop | 10 | 5200 | 2007-08-01
6000 | develop | 11 | 5200 | 2007-08-15
5200 | develop | 9 | 4500 | 2008-01-01
5200 | develop | 7 | 4200 | 2008-01-01
| personnel | 2 | 3900 | 2006-12-23
| personnel | 5 | 3500 | 2007-12-10
| sales | 6 | 5500 | 2007-01-02
| sales | 1 | 5000 | 2006-10-01
5500 | sales | 3 | 4800 | 2007-08-01
5000 | sales | 4 | 4800 | 2007-08-08
(11 rows)

postgres=# select first_value(salary)over(partition by depname order by salary desc),* from empsalary;
first_value | depname | empno | salary | enroll_date
-------------+-----------+-------+--------+-------------
6000 | develop | 8 | 6000 | 2006-10-01
6000 | develop | 10 | 5200 | 2007-08-01
6000 | develop | 11 | 5200 | 2007-08-15
6000 | develop | 9 | 4500 | 2008-01-01
6000 | develop | 7 | 4200 | 2008-01-01
3900 | personnel | 2 | 3900 | 2006-12-23
3900 | personnel | 5 | 3500 | 2007-12-10
5500 | sales | 6 | 5500 | 2007-01-02
5500 | sales | 1 | 5000 | 2006-10-01
5500 | sales | 3 | 4800 | 2007-08-01
5500 | sales | 4 | 4800 | 2007-08-08
(11 rows)

postgres=# select last_value(salary)over(partition by depname order by salary desc),* from empsalary;
last_value | depname | empno | salary | enroll_date
------------+-----------+-------+--------+-------------
6000 | develop | 8 | 6000 | 2006-10-01
5200 | develop | 10 | 5200 | 2007-08-01
5200 | develop | 11 | 5200 | 2007-08-15
4500 | develop | 9 | 4500 | 2008-01-01
4200 | develop | 7 | 4200 | 2008-01-01
3900 | personnel | 2 | 3900 | 2006-12-23
3500 | personnel | 5 | 3500 | 2007-12-10
5500 | sales | 6 | 5500 | 2007-01-02
5000 | sales | 1 | 5000 | 2006-10-01
4800 | sales | 3 | 4800 | 2007-08-01
4800 | sales | 4 | 4800 | 2007-08-08
(11 rows)

postgres=# select nth_value(salary,2)over(partition by depname order by salary desc),* from empsalary;
nth_value | depname | empno | salary | enroll_date
-----------+-----------+-------+--------+-------------
| develop | 8 | 6000 | 2006-10-01
5200 | develop | 10 | 5200 | 2007-08-01
5200 | develop | 11 | 5200 | 2007-08-15
5200 | develop | 9 | 4500 | 2008-01-01
5200 | develop | 7 | 4200 | 2008-01-01
| personnel | 2 | 3900 | 2006-12-23
3500 | personnel | 5 | 3500 | 2007-12-10
| sales | 6 | 5500 | 2007-01-02
5000 | sales | 1 | 5000 | 2006-10-01
5000 | sales | 3 | 4800 | 2007-08-01
5000 | sales | 4 | 4800 | 2007-08-08
(11 rows)当一个查询涉及多个窗口函数的时候,可以用别名的办法来使用,更简单:
postgres=# select sum(salary)over w,avg(salary) over w,* from empsalary window w as (partition by depname order by salary desc);
sum | avg | depname | empno | salary | enroll_date
-------+-----------------------+-----------+-------+--------+-------------
6000 | 6000.0000000000000000 | develop | 8 | 6000 | 2006-10-01
100 | 5466.6666666666666667 | develop | 10 | 5200 | 2007-08-01
100 | 5466.6666666666666667 | develop | 11 | 5200 | 2007-08-15
20900 | 5225.0000000000000000 | develop | 9 | 4500 | 2008-01-01
25100 | 5020.0000000000000000 | develop | 7 | 4200 | 2008-01-01
3900 | 3900.0000000000000000 | personnel | 2 | 3900 | 2006-12-23
7400 | 3700.0000000000000000 | personnel | 5 | 3500 | 2007-12-10
5500 | 5500.0000000000000000 | sales | 6 | 5500 | 2007-01-02
10500 | 5250.0000000000000000 | sales | 1 | 5000 | 2006-10-01
20100 | 5025.0000000000000000 | sales | 3 | 4800 | 2007-08-01
20100 | 5025.0000000000000000 | sales | 4 | 4800 | 2007-08-08
(11 rows)这个写法和下面的是一样的,不过更简单
SELECT sum(salary) OVER (PARTITION BY depname ORDER BY salary DESC), avg(salary) OVER (PARTITION BY depname ORDER BY salary DESC),* FROM empsalary;

下载本文
显示全文
专题