HiveSQL: SUM() OVER() 累加求和

计算一个周期内的累加销售额,例如2月销售额等于1~2月之和,3月等于1~3月之和, ...

语句结构:聚合函数 OVER (PARTITION BY 分组字段 ORDER BY 排序字段 ROWS/RANGE子句)。

  • 聚合函数:COUNTSUMMINMAXAVG
  • PARTITION BY 分组字段:类似 GROUP BY,比如按销售员分组
  • ORDER BY 排序字段:比如按销售日期排序
  • ROWS/RANGE子句:滑动窗口,比如累加计算最近3个月的销售额
    • ROWS:行数,比如求前/后几行之和
    • RANGE:数值型的范围,直接在排序字段上做加减法

1. 每天累加计算销售额

SELECT sale_date,
       amount,
       sum(amount) over(
                        ORDER BY sale_date) AS total_cnt
FROM sale_amt
1
2
3
4
5

-w1237

2. 按销售员分组每天累加计算销售额

SELECT seller_id,
       sale_date,
       amount,
       SUM(amount) OVER(PARTITION BY seller_id
                        ORDER BY sale_date) AS amount_total
FROM sale_amt_detail
LIMIT 100;
1
2
3
4
5
6
7

-w1232

3. 按销售员分组累加计算最近3次销售额

SELECT seller_id,
       sale_date, amount,
       SUM(amount) OVER(PARTITION BY seller_id
                        ORDER BY sale_date ROWS  BETWEEN 2 PRECEDING AND CURRENT ROW) AS amount_total
FROM sale_amt_detail
LIMIT 100;
1
2
3
4
5
6

当前行 + 前两行

-w1237

如果排序(ORDER BY)的字段是数值型,可以将 ROWS BETWEEN 替换为 RANGE BETWEEN,直接对齐加减,比如月份是1,2,3,4,5,则滑动统计最近3个月的销售额,改成RANGE BETWEEN 2 PRECEDING AND CURRENT ROW

reference