PostgreSQL HAVING 子句
PostgreSQL HAVING 子句完全指南
HAVING是聚合后的“守门员”,专门过滤 分组结果(GROUP BY后的每组)。
与WHERE不同:
WHERE→ 过滤原始行HAVING→ 过滤聚合结果
一、HAVING 基本语法
SELECT
grouping_column,
AGGREGATE_FUNCTION(...)
FROM table_name
[WHERE ...] -- 过滤原始行
GROUP BY grouping_column
HAVING condition; -- 过滤分组结果
核心规则:
HAVING必须与GROUP BY一起使用- 只能在
HAVING中使用 聚合函数 或 分组列
二、准备测试数据
CREATE TABLE sales (
id BIGSERIAL PRIMARY KEY,
product TEXT,
category TEXT,
region TEXT,
amount NUMERIC(12,2),
sale_date DATE,
customer_id BIGINT
);
-- 插入多样化数据
INSERT INTO sales (product, category, region, amount, sale_date, customer_id) VALUES
('iPhone 15', 'Electronics', 'North', 1200, '2025-01-15', 101),
('T-Shirt', 'Clothing', 'North', 25, '2025-01-16', 102),
('MacBook', 'Electronics', 'South', 2500, '2025-01-17', 103),
('Jeans', 'Clothing', 'South', 80, '2025-01-18', 104),
('AirPods', 'Electronics', 'North', 250, '2025-01-19', 101),
('Jacket', 'Clothing', 'West', 150, '2025-01-20', 105),
('iPad', 'Electronics', 'North', 800, '2025-01-21', 106),
('Sneakers', 'Clothing', 'South', 120, '2025-01-22', 102);
三、WHERE vs HAVING:核心区别
| 过滤对象 | 子句 | 示例 |
|---|---|---|
| 原始行 | WHERE | WHERE amount > 1000 |
| 聚合结果 | HAVING | HAVING COUNT(*) > 5 |
-- 错误!WHERE 不能用聚合函数
SELECT region, SUM(amount)
FROM sales
WHERE COUNT(*) > 2 -- 语法错误!
GROUP BY region;
-- 正确写法
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
HAVING COUNT(*) > 2;
四、基础 HAVING 用法
| 需求 | SQL |
|---|---|
| 订单数 > 3 的地区 | HAVING COUNT(*) > 3 |
| 总销售额 > 2000 的品类 | HAVING SUM(amount) > 2000 |
| 平均订单 > 500 的组合 | HAVING AVG(amount) > 500 |
-- 订单数 > 2 且总销售额 > 1500 的地区
SELECT region,
COUNT(*) AS order_count,
SUM(amount) AS total_sales
FROM sales
GROUP BY region
HAVING COUNT(*) > 2
AND SUM(amount) > 1500;
输出:
region | order_count | total_sales
--------+-------------+-------------
North | 4 | 2275
五、复杂 HAVING 条件
-- 订单数 > 3,且最大单笔 > 1000,且平均订单 ≠ 0
SELECT region,
COUNT(*) AS cnt,
MAX(amount) AS max_amt,
AVG(amount) AS avg_amt
FROM sales
GROUP BY region
HAVING COUNT(*) > 3
AND MAX(amount) > 1000
AND AVG(amount) IS NOT NULL;
六、HAVING + 表达式
-- 销售额标准差 > 500 的地区(需计算方差)
SELECT region,
STDDEV(amount) AS sales_std
FROM sales
GROUP BY region
HAVING STDDEV(amount) > 500;
-- 销售额占总比 > 40% 的品类
WITH total AS (SELECT SUM(amount) AS grand_total FROM sales)
SELECT category, SUM(amount) AS cat_total
FROM sales, total
GROUP BY category
HAVING SUM(amount) > 0.4 * total.grand_total;
七、HAVING + GROUPING SETS / ROLLUP / CUBE
-- 包含小计,但只显示订单数 > 1 的组
SELECT region, category, COUNT(*)
FROM sales
GROUP BY ROLLUP(region, category)
HAVING COUNT(*) > 1;
八、HAVING + 窗口函数?不行!
-- 错误!HAVING 不能直接用窗口函数
SELECT region, SUM(amount),
RANK() OVER (PARTITION BY region ORDER BY SUM(amount) DESC)
FROM sales
GROUP BY region
HAVING RANK() = 1; -- 语法错误
正确做法:用 CTE 或子查询
WITH ranked AS (
SELECT region, SUM(amount) AS total,
RANK() OVER (ORDER BY SUM(amount) DESC) AS rnk
FROM sales
GROUP BY region
)
SELECT region, total
FROM ranked
WHERE rnk = 1;
九、性能优化:HAVING 不影响索引
| 好消息 | HAVING 是在分组后过滤,不影响 WHERE 索引 |
|---|---|
| 建议 | 优先用 WHERE 过滤原始行,减少分组开销 |
-- 推荐:先 WHERE 过滤,再 GROUP + HAVING
SELECT region, SUM(amount)
FROM sales
WHERE sale_date >= '2025-01-01'
GROUP BY region
HAVING SUM(amount) > 1000;
-- 建索引加速 WHERE
CREATE INDEX idx_sales_date ON sales(sale_date);
十、常见陷阱与解决方案
| 陷阱 | 说明 | 解决方案 |
|---|---|---|
WHERE COUNT(*) > 5 | 语法错误 | 改用 HAVING |
HAVING region = 'North' | 没 GROUP BY | 必须加 GROUP BY region |
HAVING amount > 1000 | amount 不是聚合列 | 用 MAX(amount) 或移到 WHERE |
HAVING 过滤太晚 | 全部分组后才过滤 | 提前用 WHERE 减少行数 |
-- 错误
SELECT region, amount
FROM sales
GROUP BY region
HAVING amount > 1000;
-- 正确
SELECT region, MAX(amount)
FROM sales
WHERE amount > 1000
GROUP BY region;
十一、最佳实践 Checklist
| 项目 | 建议 |
|---|---|
HAVING 必须配 GROUP BY | 避免语法错误 |
优先用 WHERE 过滤原始行 | 提升性能 |
HAVING 只用聚合函数或分组列 | 符合规则 |
| 复杂过滤用 CTE | 清晰 + 可复用 |
建 WHERE 索引 | 加速过滤 |
EXPLAIN 验证计划 | 确保无全表扫描 |
十二、一键销售分析(生产级)
-- 筛选:订单数 > 2,平均订单 > 500,最大单笔 > 1000 的地区
WITH summary AS (
SELECT
region,
COUNT(*) AS order_count,
SUM(amount) AS total_sales,
AVG(amount) AS avg_order,
MAX(amount) AS max_order
FROM sales
WHERE sale_date >= '2025-01-01'
GROUP BY region
)
SELECT *
FROM summary
WHERE order_count > 2
AND avg_order > 500
AND max_order > 1000;
十三、快速实战:5 分钟掌握 HAVING
-- 1. 订单数 > 3 的地区
SELECT region, COUNT(*)
FROM sales
GROUP BY region
HAVING COUNT(*) > 3;
-- 2. 总销售额 > 2000 的品类
SELECT category, SUM(amount)
FROM sales
GROUP BY category
HAVING SUM(amount) > 2000;
-- 3. 平均订单 > 100 且有 Electronics 的地区
SELECT region, AVG(amount), STRING_AGG(category, ',')
FROM sales
GROUP BY region
HAVING AVG(amount) > 100
AND MAX(category) = 'Electronics';
-- 4. 优化:先 WHERE 再 HAVING
SELECT region, SUM(amount)
FROM sales
WHERE amount > 50
GROUP BY region
HAVING COUNT(*) > 1;
现在就动手:
在 sales 表中:
- 统计 每个 region 的 订单数 和 总金额,筛选 订单数 > 2 的地区
- 筛选 平均订单 > 1000 的 category
- 用 CTE 实现:订单数 > 1 且最大单笔 > 1000 的 region + category 组合
- 建
WHERE索引,验证EXPLAIN性能提升
十四、性能对比(EXPLAIN)
-- 慢:全表分组
EXPLAIN ANALYZE
SELECT region, COUNT(*) FROM sales GROUP BY region HAVING COUNT(*) > 1;
-- 快:先过滤
EXPLAIN ANALYZE
SELECT region, COUNT(*) FROM sales
WHERE amount > 100
GROUP BY region
HAVING COUNT(*) > 1;
需要我生成:
- 10万条销售数据 + 复杂 HAVING 分析?回复
测试数据 - 动态 HAVING 条件(Python)?回复
Python HAVING - Top-N per Group + HAVING?回复
Top-N - EXPLAIN 可视化分析?回复
EXPLAIN 分析
随时告诉我!