PostgreSQL GROUP BY 语句

PostgreSQL GROUP BY 语句完全指南

GROUP BY 是聚合分析的核心,将行按指定列分组,配合 COUNTSUMAVG 等聚合函数,实现 统计、报表、去重、Top-N 等强大功能。
本文涵盖 语法、聚合函数、HAVING、ROLLUP/CUBE/GROUPING SETS、性能优化、索引、常见陷阱、JSON/数组分组 等全部内容。


一、GROUP BY 基本语法

SELECT 
    grouping_column1,
    grouping_column2,
    AGGREGATE_FUNCTION(column) AS alias,
    ...
FROM table_name
[WHERE ...]
GROUP BY grouping_column1, grouping_column2, ...
[HAVING condition]
[ORDER BY ...];

核心规则

  • SELECT非聚合列必须出现在 GROUP BY
  • 聚合函数(如 COUNT())对每组计算一次

二、核心聚合函数

函数说明示例
COUNT(*)计数(含 NULL)总行数
COUNT(col)计数(跳过 NULL)非空值数
SUM(col)求和总薪资
AVG(col)平均平均薪资
MIN(col) / MAX(col)最小/最大
STRING_AGG(col, ',')字符串连接标签合并
ARRAY_AGG(col)数组聚合
JSONB_AGG(col)JSON 数组

三、准备测试数据

CREATE TABLE sales (
    id BIGSERIAL PRIMARY KEY,
    product TEXT,
    category TEXT,
    region TEXT,
    amount NUMERIC(12,2),
    sale_date DATE,
    customer_type TEXT
);

-- 插入 1000 条测试数据
INSERT INTO sales (product, category, region, amount, sale_date, customer_type)
SELECT 
    'Prod_' || (random() * 50)::INT,
    CASE WHEN random() > 0.5 THEN 'Electronics' ELSE 'Clothing' END,
    CASE 
        WHEN random() < 0.33 THEN 'North'
        WHEN random() < 0.66 THEN 'South'
        ELSE 'West'
    END,
    (random() * 5000 + 100)::NUMERIC(12,2),
    '2025-01-01'::DATE + (random() * 365)::INT,
    CASE WHEN random() > 0.7 THEN 'VIP' ELSE 'Regular' END
FROM generate_series(1, 1000);

四、基础 GROUP BY 用法

1. 单列分组

-- 按地区统计销售额
SELECT region, SUM(amount) AS total_sales
FROM sales
GROUP BY region;

2. 多列分组

-- 按地区 + 品类统计
SELECT region, category, 
       COUNT(*) AS order_count,
       SUM(amount) AS total_sales,
       AVG(amount) AS avg_order_value
FROM sales
GROUP BY region, category;

3. 去重统计

-- 统计每个地区有多少独特产品
SELECT region, COUNT(DISTINCT product) AS unique_products
FROM sales
GROUP BY region;

五、HAVING:分组后过滤(WHERE 的兄弟)

WHERE 过滤原始行
HAVING 过滤分组结果

-- 销售额 > 100000 的地区
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
HAVING SUM(amount) > 100000;
-- 订单数 > 50 且平均订单 > 2000 的组合
SELECT region, category, COUNT(*), AVG(amount)
FROM sales
GROUP BY region, category
HAVING COUNT(*) > 50 AND AVG(amount) > 2000;

六、高级分组:ROLLUPCUBEGROUPING SETS

1. ROLLUP:层级汇总(小计 + 总计)

-- 地区 → 品类 → 总计
SELECT region, category, SUM(amount) AS sales
FROM sales
GROUP BY ROLLUP(region, category)
ORDER BY region, category;

输出

 region | category     | sales
--------+--------------+--------
 North  | Electronics  | ...
 North  | Clothing     | ...
 North  | NULL         | ... (North 小计)
 South  | Electronics  | ...
 ...    | ...          | ...
 NULL   | NULL         | ... (总计)

2. CUBE:所有组合汇总

-- 所有维度组合
SELECT region, category, SUM(amount)
FROM sales
GROUP BY CUBE(region, category);

3. GROUPING SETS:自定义组合

-- 分别统计:(region), (category), ()
SELECT region, category, SUM(amount)
FROM sales
GROUP BY GROUPING SETS (region, category, ());

七、GROUPING() 函数:识别汇总行

SELECT 
    region,
    category,
    SUM(amount) AS sales,
    GROUPING(region, category) AS grouping_flag
FROM sales
GROUP BY ROLLUP(region, category);
  • 0 = 正常分组
  • 1 = 该列是汇总行(值为 NULL)

八、字符串与数组聚合

-- 每个地区的产品列表
SELECT region, STRING_AGG(DISTINCT product, ', ') AS products
FROM sales
GROUP BY region;

-- 每个客户的购买记录数组
SELECT customer_type, ARRAY_AGG(amount ORDER BY sale_date) AS amounts
FROM sales
GROUP BY customer_type;

九、JSON 分组聚合

-- 每个地区销售 JSON
SELECT region,
       JSONB_AGG(
           JSONB_BUILD_OBJECT('product', product, 'amount', amount)
       ) AS sales_details
FROM sales
GROUP BY region;

十、性能优化:索引是关键

问题:GROUP BY 列无索引 → 全表扫描 + 排序

解决方案:

-- 单列分组
CREATE INDEX idx_sales_region ON sales(region);

-- 多列分组
CREATE INDEX idx_sales_region_category ON sales(region, category);

-- 包含聚合列(覆盖索引)
CREATE INDEX idx_sales_cover ON sales(region, category, amount);

黄金法则GROUP BY 顺序 = 索引顺序


十一、GROUP BY + 窗口函数:分组内排名

-- 每个地区销售额 Top 3 产品
WITH ranked AS (
    SELECT 
        region, product, amount,
        ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
    FROM sales
)
SELECT region, product, amount
FROM ranked
WHERE rn <= 3;

十二、常见陷阱与解决方案

陷阱说明解决方案
SELECT name, COUNT(*)name 没在 GROUP BY加入 GROUP BY name
WHERE SUM(amount) > 1000语法错误改用 HAVING
GROUP BY 顺序影响性能索引不匹配按索引顺序分组
NULL 分组所有 NULL 归为一组正常行为
COUNT(*) vs COUNT(col)含/不含 NULL按需选择

十三、最佳实践 Checklist

项目建议
非聚合列必须 GROUP BY避免错误
HAVING 过滤聚合不要用 WHERE
建分组索引GROUP BY
覆盖索引包含 SELECT 和聚合列
ROLLUP/CUBE 做报表自动小计
字符串用 STRING_AGG合并值
EXPLAIN 验证计划确保走索引

十四、一键销售报表(生产级)

-- 多维度销售分析报表
SELECT 
    COALESCE(region, 'ALL REGIONS') AS region,
    COALESCE(category, 'ALL CATEGORIES') AS category,
    COUNT(*) AS order_count,
    SUM(amount) AS total_sales,
    AVG(amount) AS avg_order,
    MIN(amount) AS min_order,
    MAX(amount) AS max_order,
    GROUPING(region, category) AS is_summary
FROM sales
GROUP BY ROLLUP(region, category)
ORDER BY 
    GROUPING(region, category),
    region NULLS LAST,
    category NULLS LAST;

十五、快速实战:5 分钟掌握 GROUP BY

-- 1. 按地区统计订单数和总金额
SELECT region, COUNT(*), SUM(amount)
FROM sales
GROUP BY region;

-- 2. 筛选销售额 > 50000 的地区
SELECT region, SUM(amount)
FROM sales
GROUP BY region
HAVING SUM(amount) > 50000;

-- 3. 每个品类 VIP 客户平均订单
SELECT category, customer_type, AVG(amount)
FROM sales
WHERE customer_type = 'VIP'
GROUP BY category, customer_type;

-- 4. 建索引优化
CREATE INDEX idx_sales_region_category ON sales(region, category);
EXPLAIN ANALYZE SELECT region, category, SUM(amount)
FROM sales GROUP BY region, category;

现在就动手
sales 表中:

  1. 统计 每个 region + category订单数平均金额
  2. 筛选 订单数 > 100 的组合(用 HAVING
  3. ROLLUP 生成 地区小计 + 总计
  4. 建复合索引,验证 EXPLAIN 是否走 Index Scan

十六、性能对比(EXPLAIN)

-- 慢:无索引
EXPLAIN ANALYZE SELECT region, SUM(amount) FROM sales GROUP BY region;

-- 快:有索引
CREATE INDEX idx_sales_region ON sales(region);
EXPLAIN ANALYZE SELECT region, SUM(amount) FROM sales GROUP BY region;

需要我生成:

  • 10万条测试数据 + 复杂报表?回复 测试数据
  • 动态 GROUP BY(应用层传入列)?回复 动态分组
  • Top-N per Group 完整实现?回复 Top-N
  • EXPLAIN 可视化分析?回复 EXPLAIN 分析

随时告诉我!

类似文章

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注