【MySQL】数据分析双剑客:聚合函数 与 GROUP BY 子句的完美搭配
在 MySQL 的数据分析场景中,聚合函数 + GROUP BY 是最常用、最强大的组合,几乎所有报表、统计、看板类需求都离不开它们。
这篇文章会带你从基础用法 → 进阶玩法 → 常见陷阱 → 高阶技巧,一步步把这两个“双剑客”用透。
一、核心概念先搞清楚
1. 聚合函数(Aggregate Functions)
对一组数据进行汇总计算,返回单个结果。
| 函数 | 作用 | 是否忽略 NULL | 常见用法场景 |
|---|---|---|---|
COUNT() | 统计行数 | 是 | 计数、去重计数 |
SUM() | 求和 | 是 | 销售额、总金额 |
AVG() | 平均值 | 是 | 平均评分、平均年龄 |
MAX() | 最大值 | 是 | 最高分、最新时间 |
MIN() | 最小值 | 是 | 最低价、最早时间 |
GROUP_CONCAT() | 字符串拼接(分组内) | 是 | 把分组内的值拼成一行 |
STDDEV() / VARIANCE() | 标准差 / 方差 | 是 | 数据离散程度分析 |
JSON_ARRAYAGG() / JSON_OBJECTAGG() | JSON 聚合 | 是 | 现代业务中常用来返回嵌套结构 |
2. GROUP BY 子句
把数据按照某个字段(或多个字段)分组,聚合函数只在每个组内计算。
一句话理解:
没有 GROUP BY → 聚合函数对整个表计算一次
有了 GROUP BY → 聚合函数对每一组分别计算一次
二、基础用法(必须熟练)
示例表结构(假设 orders 表)
id order_date user_id amount product_category status
1 2025-01-01 101 299 phone paid
2 2025-01-02 102 1299 laptop paid
3 2025-01-02 101 199 accessory paid
4 2025-01-03 103 499 tablet paid
5 2025-01-04 101 0 phone refunded
1. 最基础:按单个字段分组 + 聚合
-- 按用户统计订单金额和订单数
SELECT
user_id,
COUNT(*) AS order_count,
SUM(amount) AS total_amount,
AVG(amount) AS avg_amount,
MAX(amount) AS max_amount
FROM orders
GROUP BY user_id;
结果:
user_id | order_count | total_amount | avg_amount | max_amount
--------|-------------|--------------|------------|-----------
101 | 3 | 498 | 166.00 | 299
102 | 1 | 1299 | 1299.00 | 1299
103 | 1 | 499 | 499.00 | 499
2. 多个字段分组
-- 按用户 + 分类统计
SELECT
user_id,
product_category,
COUNT(*) AS cnt,
SUM(amount) AS total
FROM orders
GROUP BY user_id, product_category;
3. 带 WHERE + HAVING
-- 先筛选已支付订单,再按用户分组,最后筛选总金额 > 1000 的用户
SELECT
user_id,
SUM(amount) AS total_amount
FROM orders
WHERE status = 'paid'
GROUP BY user_id
HAVING total_amount > 1000;
关键记忆:
- WHERE:先过滤原始行,再分组
- HAVING:分组后过滤聚合结果
三、进阶用法(面试 & 真实场景高频)
1. GROUP_CONCAT(分组内拼接)
-- 每个用户购买过的所有品类(去重)
SELECT
user_id,
GROUP_CONCAT(DISTINCT product_category ORDER BY product_category) AS categories
FROM orders
GROUP BY user_id;
结果示例:
user_id | categories
--------|---------------------
101 | accessory,phone
102 | laptop
进阶:控制分隔符、长度
GROUP_CONCAT(DISTINCT product_category SEPARATOR ' | ' LIMIT 3)
2. WITH ROLLUP(小计 & 总计)
SELECT
IFNULL(user_id, '总计') AS user_id,
COUNT(*) AS order_count,
SUM(amount) AS total_amount
FROM orders
GROUP BY user_id WITH ROLLUP;
结果会多一行总计:
user_id | order_count | total_amount
--------|-------------|-------------
101 | 3 | 498
102 | 1 | 1299
103 | 1 | 499
总计 | 5 | 2296
3. 结合窗口函数(MySQL 8.0+)
-- 每个用户订单金额排名
SELECT
user_id,
order_date,
amount,
RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rank_in_user
FROM orders;
4. 按时间分组(常见报表)
-- 按月统计销售额
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS month,
COUNT(*) AS order_count,
SUM(amount) AS total_amount
FROM orders
GROUP BY month
ORDER BY month;
四、最容易踩的坑(面试常考)
- SELECT 里非聚合字段必须出现在 GROUP BY 中 错误:
SELECT user_id, order_date, SUM(amount)
FROM orders
GROUP BY user_id; -- 报错或随机值(MySQL 5.7 宽松模式)
正确:要么加到 GROUP BY,要么用聚合函数
- HAVING 与 WHERE 顺序不能混 HAVING 只能用在 GROUP BY 之后
- COUNT(*)、COUNT(1)、COUNT(字段) 的区别
COUNT(*):统计所有行(最常用)COUNT(1):等价于 COUNT(*)COUNT(字段):只统计字段非 NULL 的行
- GROUP BY 后排序 聚合后的排序要写在最后:
GROUP BY user_id
ORDER BY total_amount DESC;
- NULL 的处理 聚合函数会自动忽略 NULL,GROUP BY 时 NULL 是一类
五、实战模板(直接抄走用)
- 按某字段统计基本指标
SELECT
xxx AS group_key,
COUNT(*) AS cnt,
COUNT(DISTINCT user_id) AS uv,
SUM(amount) AS total,
AVG(amount) AS avg,
MAX(amount) AS max_value,
MIN(amount) AS min_value
FROM table
WHERE ...
GROUP BY xxx
HAVING cnt > 10
ORDER BY total DESC
LIMIT 100;
- 带时间分组(日报/月报)
SELECT
DATE_FORMAT(created_at, '%Y-%m-%d') AS date,
COUNT(*) AS pv,
COUNT(DISTINCT user_id) AS uv,
SUM(pay_amount) AS gmv
FROM orders
WHERE status = 'paid'
GROUP BY date
ORDER BY date DESC;
- Top N 统计(每个分类 Top 3 商品)
SELECT *
FROM (
SELECT
category,
product_name,
SUM(sales) AS total_sales,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY SUM(sales) DESC) AS rn
FROM sales
GROUP BY category, product_name
) t
WHERE rn <= 3;
总结:一句话口诀
聚合函数负责“算”,GROUP BY 负责“分”,HAVING 负责“筛”,ORDER BY 负责“排”
WHERE 先过滤原始数据,GROUP BY 再分组算,HAVING 筛分组结果,最后排序输出。
你现在是想练习某个具体场景的 SQL(比如销售额月环比、用户留存、商品 Top N、漏斗分析等),还是想看某个复杂 GROUP BY 的完整案例?
直接告诉我,我可以继续给你写更贴近你需求的 SQL。