PostgreSQL GROUP BY 语句
PostgreSQL GROUP BY 语句完全指南
GROUP BY是聚合分析的核心,将行按指定列分组,配合COUNT、SUM、AVG等聚合函数,实现 统计、报表、去重、Top-N 等强大功能。
本文涵盖 语法、聚合函数、HAVING、ROLLUP/CUBE/GROUPING SETS、性能优化、索引、常见陷阱、JSON/数组分组 等全部内容。
一、GROUP BY 基本语法
SELECT
grouping_column1,
grouping_column2,
AGGREGATE_FUNCTION(column) AS alias,
...
FROM table_name
[WHERE ...]
GROUP BY grouping_column1, grouping_column2, ...
[HAVING condition]
[ORDER BY ...];
核心规则:
SELECT中非聚合列必须出现在GROUP BY中- 聚合函数(如
COUNT())对每组计算一次
二、核心聚合函数
| 函数 | 说明 | 示例 |
|---|---|---|
COUNT(*) | 计数(含 NULL) | 总行数 |
COUNT(col) | 计数(跳过 NULL) | 非空值数 |
SUM(col) | 求和 | 总薪资 |
AVG(col) | 平均 | 平均薪资 |
MIN(col) / MAX(col) | 最小/最大 | |
STRING_AGG(col, ',') | 字符串连接 | 标签合并 |
ARRAY_AGG(col) | 数组聚合 | |
JSONB_AGG(col) | JSON 数组 |
三、准备测试数据
CREATE TABLE sales (
id BIGSERIAL PRIMARY KEY,
product TEXT,
category TEXT,
region TEXT,
amount NUMERIC(12,2),
sale_date DATE,
customer_type TEXT
);
-- 插入 1000 条测试数据
INSERT INTO sales (product, category, region, amount, sale_date, customer_type)
SELECT
'Prod_' || (random() * 50)::INT,
CASE WHEN random() > 0.5 THEN 'Electronics' ELSE 'Clothing' END,
CASE
WHEN random() < 0.33 THEN 'North'
WHEN random() < 0.66 THEN 'South'
ELSE 'West'
END,
(random() * 5000 + 100)::NUMERIC(12,2),
'2025-01-01'::DATE + (random() * 365)::INT,
CASE WHEN random() > 0.7 THEN 'VIP' ELSE 'Regular' END
FROM generate_series(1, 1000);
四、基础 GROUP BY 用法
1. 单列分组
-- 按地区统计销售额
SELECT region, SUM(amount) AS total_sales
FROM sales
GROUP BY region;
2. 多列分组
-- 按地区 + 品类统计
SELECT region, category,
COUNT(*) AS order_count,
SUM(amount) AS total_sales,
AVG(amount) AS avg_order_value
FROM sales
GROUP BY region, category;
3. 去重统计
-- 统计每个地区有多少独特产品
SELECT region, COUNT(DISTINCT product) AS unique_products
FROM sales
GROUP BY region;
五、HAVING:分组后过滤(WHERE 的兄弟)
WHERE过滤原始行HAVING过滤分组结果
-- 销售额 > 100000 的地区
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
HAVING SUM(amount) > 100000;
-- 订单数 > 50 且平均订单 > 2000 的组合
SELECT region, category, COUNT(*), AVG(amount)
FROM sales
GROUP BY region, category
HAVING COUNT(*) > 50 AND AVG(amount) > 2000;
六、高级分组:ROLLUP、CUBE、GROUPING SETS
1. ROLLUP:层级汇总(小计 + 总计)
-- 地区 → 品类 → 总计
SELECT region, category, SUM(amount) AS sales
FROM sales
GROUP BY ROLLUP(region, category)
ORDER BY region, category;
输出:
region | category | sales
--------+--------------+--------
North | Electronics | ...
North | Clothing | ...
North | NULL | ... (North 小计)
South | Electronics | ...
... | ... | ...
NULL | NULL | ... (总计)
2. CUBE:所有组合汇总
-- 所有维度组合
SELECT region, category, SUM(amount)
FROM sales
GROUP BY CUBE(region, category);
3. GROUPING SETS:自定义组合
-- 分别统计:(region), (category), ()
SELECT region, category, SUM(amount)
FROM sales
GROUP BY GROUPING SETS (region, category, ());
七、GROUPING() 函数:识别汇总行
SELECT
region,
category,
SUM(amount) AS sales,
GROUPING(region, category) AS grouping_flag
FROM sales
GROUP BY ROLLUP(region, category);
0= 正常分组1= 该列是汇总行(值为 NULL)
八、字符串与数组聚合
-- 每个地区的产品列表
SELECT region, STRING_AGG(DISTINCT product, ', ') AS products
FROM sales
GROUP BY region;
-- 每个客户的购买记录数组
SELECT customer_type, ARRAY_AGG(amount ORDER BY sale_date) AS amounts
FROM sales
GROUP BY customer_type;
九、JSON 分组聚合
-- 每个地区销售 JSON
SELECT region,
JSONB_AGG(
JSONB_BUILD_OBJECT('product', product, 'amount', amount)
) AS sales_details
FROM sales
GROUP BY region;
十、性能优化:索引是关键
问题:GROUP BY 列无索引 → 全表扫描 + 排序
解决方案:
-- 单列分组
CREATE INDEX idx_sales_region ON sales(region);
-- 多列分组
CREATE INDEX idx_sales_region_category ON sales(region, category);
-- 包含聚合列(覆盖索引)
CREATE INDEX idx_sales_cover ON sales(region, category, amount);
黄金法则:
GROUP BY顺序 = 索引顺序
十一、GROUP BY + 窗口函数:分组内排名
-- 每个地区销售额 Top 3 产品
WITH ranked AS (
SELECT
region, product, amount,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
FROM sales
)
SELECT region, product, amount
FROM ranked
WHERE rn <= 3;
十二、常见陷阱与解决方案
| 陷阱 | 说明 | 解决方案 |
|---|---|---|
SELECT name, COUNT(*) | name 没在 GROUP BY | 加入 GROUP BY name |
WHERE SUM(amount) > 1000 | 语法错误 | 改用 HAVING |
GROUP BY 顺序影响性能 | 索引不匹配 | 按索引顺序分组 |
NULL 分组 | 所有 NULL 归为一组 | 正常行为 |
COUNT(*) vs COUNT(col) | 含/不含 NULL | 按需选择 |
十三、最佳实践 Checklist
| 项目 | 建议 |
|---|---|
非聚合列必须 GROUP BY | 避免错误 |
HAVING 过滤聚合 | 不要用 WHERE |
| 建分组索引 | GROUP BY 列 |
| 覆盖索引 | 包含 SELECT 和聚合列 |
用 ROLLUP/CUBE 做报表 | 自动小计 |
字符串用 STRING_AGG | 合并值 |
EXPLAIN 验证计划 | 确保走索引 |
十四、一键销售报表(生产级)
-- 多维度销售分析报表
SELECT
COALESCE(region, 'ALL REGIONS') AS region,
COALESCE(category, 'ALL CATEGORIES') AS category,
COUNT(*) AS order_count,
SUM(amount) AS total_sales,
AVG(amount) AS avg_order,
MIN(amount) AS min_order,
MAX(amount) AS max_order,
GROUPING(region, category) AS is_summary
FROM sales
GROUP BY ROLLUP(region, category)
ORDER BY
GROUPING(region, category),
region NULLS LAST,
category NULLS LAST;
十五、快速实战:5 分钟掌握 GROUP BY
-- 1. 按地区统计订单数和总金额
SELECT region, COUNT(*), SUM(amount)
FROM sales
GROUP BY region;
-- 2. 筛选销售额 > 50000 的地区
SELECT region, SUM(amount)
FROM sales
GROUP BY region
HAVING SUM(amount) > 50000;
-- 3. 每个品类 VIP 客户平均订单
SELECT category, customer_type, AVG(amount)
FROM sales
WHERE customer_type = 'VIP'
GROUP BY category, customer_type;
-- 4. 建索引优化
CREATE INDEX idx_sales_region_category ON sales(region, category);
EXPLAIN ANALYZE SELECT region, category, SUM(amount)
FROM sales GROUP BY region, category;
现在就动手:
在 sales 表中:
- 统计 每个 region + category 的 订单数 和 平均金额
- 筛选 订单数 > 100 的组合(用
HAVING) - 用
ROLLUP生成 地区小计 + 总计 - 建复合索引,验证
EXPLAIN是否走 Index Scan
十六、性能对比(EXPLAIN)
-- 慢:无索引
EXPLAIN ANALYZE SELECT region, SUM(amount) FROM sales GROUP BY region;
-- 快:有索引
CREATE INDEX idx_sales_region ON sales(region);
EXPLAIN ANALYZE SELECT region, SUM(amount) FROM sales GROUP BY region;
需要我生成:
- 10万条测试数据 + 复杂报表?回复
测试数据 - 动态 GROUP BY(应用层传入列)?回复
动态分组 - Top-N per Group 完整实现?回复
Top-N - EXPLAIN 可视化分析?回复
EXPLAIN 分析
随时告诉我!