HiveSQL: SUM() OVER() 累加求和
计算一个周期内的累加销售额,例如2月销售额等于1~2月之和,3月等于1~3月之和, ...
语句结构:聚合函数 OVER (PARTITION BY 分组字段 ORDER BY 排序字段 ROWS/RANGE子句)。
- 聚合函数:
COUNT
、SUM
、MIN
、MAX
、AVG
- PARTITION BY 分组字段:类似 GROUP BY,比如按销售员分组
- ORDER BY 排序字段:比如按销售日期排序
- ROWS/RANGE子句:滑动窗口,比如累加计算最近3个月的销售额
- ROWS:行数,比如求前/后几行之和
- RANGE:数值型的范围,直接在排序字段上做加减法
1. 每天累加计算销售额
SELECT sale_date,
amount,
sum(amount) over(
ORDER BY sale_date) AS total_cnt
FROM sale_amt
1
2
3
4
5
2
3
4
5
2. 按销售员分组每天累加计算销售额
SELECT seller_id,
sale_date,
amount,
SUM(amount) OVER(PARTITION BY seller_id
ORDER BY sale_date) AS amount_total
FROM sale_amt_detail
LIMIT 100;
1
2
3
4
5
6
7
2
3
4
5
6
7
3. 按销售员分组累加计算最近3次销售额
SELECT seller_id,
sale_date, amount,
SUM(amount) OVER(PARTITION BY seller_id
ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS amount_total
FROM sale_amt_detail
LIMIT 100;
1
2
3
4
5
6
2
3
4
5
6
当前行 + 前两行
如果排序(ORDER BY)的字段是数值型,可以将 ROWS BETWEEN
替换为 RANGE BETWEEN
,直接对齐加减,比如月份是1,2,3,4,5,则滑动统计最近3个月的销售额,改成RANGE BETWEEN 2 PRECEDING AND CURRENT ROW
reference
- [1] Apache. LanguageManual WindowingAndAnalyticsopen in new window
- [2] freesion. HIVE中使用OVER()实现累积求和和滑动求和open in new window
- [3] pwz1688. 分析函数用法及窗口子句 range/rows差别open in new window