【MySQL】数据分析双剑客:聚合函数 与 group by子句的完美搭配

【MySQL】数据分析双剑客:聚合函数 与 GROUP BY 子句的完美搭配

在 MySQL 的数据分析场景中,聚合函数 + GROUP BY 是最常用、最强大的组合,几乎所有报表、统计、看板类需求都离不开它们。

这篇文章会带你从基础用法 → 进阶玩法 → 常见陷阱 → 高阶技巧,一步步把这两个“双剑客”用透。

一、核心概念先搞清楚

1. 聚合函数(Aggregate Functions)

对一组数据进行汇总计算,返回单个结果。

函数作用是否忽略 NULL常见用法场景
COUNT()统计行数计数、去重计数
SUM()求和销售额、总金额
AVG()平均值平均评分、平均年龄
MAX()最大值最高分、最新时间
MIN()最小值最低价、最早时间
GROUP_CONCAT()字符串拼接(分组内)把分组内的值拼成一行
STDDEV() / VARIANCE()标准差 / 方差数据离散程度分析
JSON_ARRAYAGG() / JSON_OBJECTAGG()JSON 聚合现代业务中常用来返回嵌套结构

2. GROUP BY 子句

把数据按照某个字段(或多个字段)分组,聚合函数只在每个组内计算。

一句话理解
没有 GROUP BY → 聚合函数对整个表计算一次
有了 GROUP BY → 聚合函数对每一组分别计算一次

二、基础用法(必须熟练)

示例表结构(假设 orders 表)

id   order_date    user_id   amount   product_category   status
1    2025-01-01    101       299      phone             paid
2    2025-01-02    102       1299     laptop            paid
3    2025-01-02    101       199      accessory         paid
4    2025-01-03    103       499      tablet            paid
5    2025-01-04    101       0        phone             refunded

1. 最基础:按单个字段分组 + 聚合

-- 按用户统计订单金额和订单数
SELECT 
    user_id,
    COUNT(*) AS order_count,
    SUM(amount) AS total_amount,
    AVG(amount) AS avg_amount,
    MAX(amount) AS max_amount
FROM orders
GROUP BY user_id;

结果:

user_id | order_count | total_amount | avg_amount | max_amount
--------|-------------|--------------|------------|-----------
101     | 3           | 498          | 166.00     | 299
102     | 1           | 1299         | 1299.00    | 1299
103     | 1           | 499          | 499.00     | 499

2. 多个字段分组

-- 按用户 + 分类统计
SELECT 
    user_id,
    product_category,
    COUNT(*) AS cnt,
    SUM(amount) AS total
FROM orders
GROUP BY user_id, product_category;

3. 带 WHERE + HAVING

-- 先筛选已支付订单,再按用户分组,最后筛选总金额 > 1000 的用户
SELECT 
    user_id,
    SUM(amount) AS total_amount
FROM orders
WHERE status = 'paid'
GROUP BY user_id
HAVING total_amount > 1000;

关键记忆

  • WHERE:先过滤原始行,再分组
  • HAVING:分组后过滤聚合结果

三、进阶用法(面试 & 真实场景高频)

1. GROUP_CONCAT(分组内拼接)

-- 每个用户购买过的所有品类(去重)
SELECT 
    user_id,
    GROUP_CONCAT(DISTINCT product_category ORDER BY product_category) AS categories
FROM orders
GROUP BY user_id;

结果示例:

user_id | categories
--------|---------------------
101     | accessory,phone
102     | laptop

进阶:控制分隔符、长度

GROUP_CONCAT(DISTINCT product_category SEPARATOR ' | ' LIMIT 3)

2. WITH ROLLUP(小计 & 总计)

SELECT 
    IFNULL(user_id, '总计') AS user_id,
    COUNT(*) AS order_count,
    SUM(amount) AS total_amount
FROM orders
GROUP BY user_id WITH ROLLUP;

结果会多一行总计:

user_id | order_count | total_amount
--------|-------------|-------------
101     | 3           | 498
102     | 1           | 1299
103     | 1           | 499
总计    | 5           | 2296

3. 结合窗口函数(MySQL 8.0+)

-- 每个用户订单金额排名
SELECT 
    user_id,
    order_date,
    amount,
    RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rank_in_user
FROM orders;

4. 按时间分组(常见报表)

-- 按月统计销售额
SELECT 
    DATE_FORMAT(order_date, '%Y-%m') AS month,
    COUNT(*) AS order_count,
    SUM(amount) AS total_amount
FROM orders
GROUP BY month
ORDER BY month;

四、最容易踩的坑(面试常考)

  1. SELECT 里非聚合字段必须出现在 GROUP BY 中 错误:
   SELECT user_id, order_date, SUM(amount)
   FROM orders
   GROUP BY user_id;   -- 报错或随机值(MySQL 5.7 宽松模式)

正确:要么加到 GROUP BY,要么用聚合函数

  1. HAVING 与 WHERE 顺序不能混 HAVING 只能用在 GROUP BY 之后
  2. COUNT(*)、COUNT(1)、COUNT(字段) 的区别
  • COUNT(*):统计所有行(最常用)
  • COUNT(1):等价于 COUNT(*)
  • COUNT(字段):只统计字段非 NULL 的行
  1. GROUP BY 后排序 聚合后的排序要写在最后:
   GROUP BY user_id
   ORDER BY total_amount DESC;
  1. NULL 的处理 聚合函数会自动忽略 NULL,GROUP BY 时 NULL 是一类

五、实战模板(直接抄走用)

  1. 按某字段统计基本指标
SELECT 
    xxx AS group_key,
    COUNT(*) AS cnt,
    COUNT(DISTINCT user_id) AS uv,
    SUM(amount) AS total,
    AVG(amount) AS avg,
    MAX(amount) AS max_value,
    MIN(amount) AS min_value
FROM table
WHERE ...
GROUP BY xxx
HAVING cnt > 10
ORDER BY total DESC
LIMIT 100;
  1. 带时间分组(日报/月报)
SELECT 
    DATE_FORMAT(created_at, '%Y-%m-%d') AS date,
    COUNT(*) AS pv,
    COUNT(DISTINCT user_id) AS uv,
    SUM(pay_amount) AS gmv
FROM orders
WHERE status = 'paid'
GROUP BY date
ORDER BY date DESC;
  1. Top N 统计(每个分类 Top 3 商品)
SELECT *
FROM (
    SELECT 
        category,
        product_name,
        SUM(sales) AS total_sales,
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY SUM(sales) DESC) AS rn
    FROM sales
    GROUP BY category, product_name
) t
WHERE rn <= 3;

总结:一句话口诀

聚合函数负责“算”,GROUP BY 负责“分”,HAVING 负责“筛”,ORDER BY 负责“排”

WHERE 先过滤原始数据,GROUP BY 再分组算,HAVING 筛分组结果,最后排序输出。

你现在是想练习某个具体场景的 SQL(比如销售额月环比、用户留存、商品 Top N、漏斗分析等),还是想看某个复杂 GROUP BY 的完整案例?
直接告诉我,我可以继续给你写更贴近你需求的 SQL。

文章已创建 4357

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

相关文章

开始在上面输入您的搜索词,然后按回车进行搜索。按ESC取消。

返回顶部