PostgreSQL GROUP BY 语句
PostgreSQL GROUP BY 语句完全指南
GROUP BY是聚合分析的核心,将行按指定列分组,配合COUNT、SUM、AVG等聚合函数,实现 统计、报表、去重、Top-N 等强大功能。
本文涵盖 语法、聚合函数、HAVING、ROLLUP/CUBE/GROUPING SETS、性能优化、索引、常见陷阱、JSON/数组分组 等全部内容。
一、GROUP BY 基本语法
SELECT 
    grouping_column1,
    grouping_column2,
    AGGREGATE_FUNCTION(column) AS alias,
    ...
FROM table_name
[WHERE ...]
GROUP BY grouping_column1, grouping_column2, ...
[HAVING condition]
[ORDER BY ...];
核心规则:
SELECT中非聚合列必须出现在GROUP BY中- 聚合函数(如
 COUNT())对每组计算一次
二、核心聚合函数
| 函数 | 说明 | 示例 | 
|---|---|---|
COUNT(*) | 计数(含 NULL) | 总行数 | 
COUNT(col) | 计数(跳过 NULL) | 非空值数 | 
SUM(col) | 求和 | 总薪资 | 
AVG(col) | 平均 | 平均薪资 | 
MIN(col) / MAX(col) | 最小/最大 | |
STRING_AGG(col, ',') | 字符串连接 | 标签合并 | 
ARRAY_AGG(col) | 数组聚合 | |
JSONB_AGG(col) | JSON 数组 | 
三、准备测试数据
CREATE TABLE sales (
    id BIGSERIAL PRIMARY KEY,
    product TEXT,
    category TEXT,
    region TEXT,
    amount NUMERIC(12,2),
    sale_date DATE,
    customer_type TEXT
);
-- 插入 1000 条测试数据
INSERT INTO sales (product, category, region, amount, sale_date, customer_type)
SELECT 
    'Prod_' || (random() * 50)::INT,
    CASE WHEN random() > 0.5 THEN 'Electronics' ELSE 'Clothing' END,
    CASE 
        WHEN random() < 0.33 THEN 'North'
        WHEN random() < 0.66 THEN 'South'
        ELSE 'West'
    END,
    (random() * 5000 + 100)::NUMERIC(12,2),
    '2025-01-01'::DATE + (random() * 365)::INT,
    CASE WHEN random() > 0.7 THEN 'VIP' ELSE 'Regular' END
FROM generate_series(1, 1000);
四、基础 GROUP BY 用法
1. 单列分组
-- 按地区统计销售额
SELECT region, SUM(amount) AS total_sales
FROM sales
GROUP BY region;
2. 多列分组
-- 按地区 + 品类统计
SELECT region, category, 
       COUNT(*) AS order_count,
       SUM(amount) AS total_sales,
       AVG(amount) AS avg_order_value
FROM sales
GROUP BY region, category;
3. 去重统计
-- 统计每个地区有多少独特产品
SELECT region, COUNT(DISTINCT product) AS unique_products
FROM sales
GROUP BY region;
五、HAVING:分组后过滤(WHERE 的兄弟)
WHERE过滤原始行HAVING过滤分组结果
-- 销售额 > 100000 的地区
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
HAVING SUM(amount) > 100000;
-- 订单数 > 50 且平均订单 > 2000 的组合
SELECT region, category, COUNT(*), AVG(amount)
FROM sales
GROUP BY region, category
HAVING COUNT(*) > 50 AND AVG(amount) > 2000;
六、高级分组:ROLLUP、CUBE、GROUPING SETS
1. ROLLUP:层级汇总(小计 + 总计)
-- 地区 → 品类 → 总计
SELECT region, category, SUM(amount) AS sales
FROM sales
GROUP BY ROLLUP(region, category)
ORDER BY region, category;
输出:
 region | category     | sales
--------+--------------+--------
 North  | Electronics  | ...
 North  | Clothing     | ...
 North  | NULL         | ... (North 小计)
 South  | Electronics  | ...
 ...    | ...          | ...
 NULL   | NULL         | ... (总计)
2. CUBE:所有组合汇总
-- 所有维度组合
SELECT region, category, SUM(amount)
FROM sales
GROUP BY CUBE(region, category);
3. GROUPING SETS:自定义组合
-- 分别统计:(region), (category), ()
SELECT region, category, SUM(amount)
FROM sales
GROUP BY GROUPING SETS (region, category, ());
七、GROUPING() 函数:识别汇总行
SELECT 
    region,
    category,
    SUM(amount) AS sales,
    GROUPING(region, category) AS grouping_flag
FROM sales
GROUP BY ROLLUP(region, category);
0= 正常分组1= 该列是汇总行(值为 NULL)
八、字符串与数组聚合
-- 每个地区的产品列表
SELECT region, STRING_AGG(DISTINCT product, ', ') AS products
FROM sales
GROUP BY region;
-- 每个客户的购买记录数组
SELECT customer_type, ARRAY_AGG(amount ORDER BY sale_date) AS amounts
FROM sales
GROUP BY customer_type;
九、JSON 分组聚合
-- 每个地区销售 JSON
SELECT region,
       JSONB_AGG(
           JSONB_BUILD_OBJECT('product', product, 'amount', amount)
       ) AS sales_details
FROM sales
GROUP BY region;
十、性能优化:索引是关键
问题:GROUP BY 列无索引 → 全表扫描 + 排序
解决方案:
-- 单列分组
CREATE INDEX idx_sales_region ON sales(region);
-- 多列分组
CREATE INDEX idx_sales_region_category ON sales(region, category);
-- 包含聚合列(覆盖索引)
CREATE INDEX idx_sales_cover ON sales(region, category, amount);
黄金法则:
GROUP BY顺序 = 索引顺序
十一、GROUP BY + 窗口函数:分组内排名
-- 每个地区销售额 Top 3 产品
WITH ranked AS (
    SELECT 
        region, product, amount,
        ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
    FROM sales
)
SELECT region, product, amount
FROM ranked
WHERE rn <= 3;
十二、常见陷阱与解决方案
| 陷阱 | 说明 | 解决方案 | 
|---|---|---|
SELECT name, COUNT(*) | name 没在 GROUP BY | 加入 GROUP BY name | 
WHERE SUM(amount) > 1000 | 语法错误 | 改用 HAVING | 
GROUP BY 顺序影响性能 | 索引不匹配 | 按索引顺序分组 | 
NULL 分组 | 所有 NULL 归为一组 | 正常行为 | 
COUNT(*) vs COUNT(col) | 含/不含 NULL | 按需选择 | 
十三、最佳实践 Checklist
| 项目 | 建议 | 
|---|---|
非聚合列必须 GROUP BY | 避免错误 | 
HAVING 过滤聚合 | 不要用 WHERE | 
| 建分组索引 | GROUP BY 列 | 
| 覆盖索引 | 包含 SELECT 和聚合列 | 
用 ROLLUP/CUBE 做报表 | 自动小计 | 
字符串用 STRING_AGG | 合并值 | 
EXPLAIN 验证计划 | 确保走索引 | 
十四、一键销售报表(生产级)
-- 多维度销售分析报表
SELECT 
    COALESCE(region, 'ALL REGIONS') AS region,
    COALESCE(category, 'ALL CATEGORIES') AS category,
    COUNT(*) AS order_count,
    SUM(amount) AS total_sales,
    AVG(amount) AS avg_order,
    MIN(amount) AS min_order,
    MAX(amount) AS max_order,
    GROUPING(region, category) AS is_summary
FROM sales
GROUP BY ROLLUP(region, category)
ORDER BY 
    GROUPING(region, category),
    region NULLS LAST,
    category NULLS LAST;
十五、快速实战:5 分钟掌握 GROUP BY
-- 1. 按地区统计订单数和总金额
SELECT region, COUNT(*), SUM(amount)
FROM sales
GROUP BY region;
-- 2. 筛选销售额 > 50000 的地区
SELECT region, SUM(amount)
FROM sales
GROUP BY region
HAVING SUM(amount) > 50000;
-- 3. 每个品类 VIP 客户平均订单
SELECT category, customer_type, AVG(amount)
FROM sales
WHERE customer_type = 'VIP'
GROUP BY category, customer_type;
-- 4. 建索引优化
CREATE INDEX idx_sales_region_category ON sales(region, category);
EXPLAIN ANALYZE SELECT region, category, SUM(amount)
FROM sales GROUP BY region, category;
现在就动手:
在 sales 表中:
- 统计 每个 region + category 的 订单数 和 平均金额
 - 筛选 订单数 > 100 的组合(用 
HAVING) - 用 
ROLLUP生成 地区小计 + 总计 - 建复合索引,验证 
EXPLAIN是否走 Index Scan 
十六、性能对比(EXPLAIN)
-- 慢:无索引
EXPLAIN ANALYZE SELECT region, SUM(amount) FROM sales GROUP BY region;
-- 快:有索引
CREATE INDEX idx_sales_region ON sales(region);
EXPLAIN ANALYZE SELECT region, SUM(amount) FROM sales GROUP BY region;
需要我生成:
- 10万条测试数据 + 复杂报表?回复 
测试数据 - 动态 GROUP BY(应用层传入列)?回复 
动态分组 - Top-N per Group 完整实现?回复 
Top-N - EXPLAIN 可视化分析?回复 
EXPLAIN 分析 
随时告诉我!