PostgreSQL 子查询
下面是一份 PostgreSQL 子查询(Subquery) 的 完整实战手册,涵盖 所有类型、语法、性能优化、常见陷阱、与 CTE/窗口函数对比、最佳实践与面试题,适合开发、DBA、架构师使用。
一、什么是子查询?
子查询是嵌套在另一个查询中的
SELECT语句,也称为 内查询 或 嵌套查询。
子查询出现位置
| 位置 | 说明 | 
|---|---|
FROM | 作为数据源(派生表) | 
WHERE | 作为过滤条件 | 
SELECT | 标量子查询(返回单值) | 
HAVING | 聚合过滤 | 
INSERT/UPDATE/DELETE | 数据来源或条件 | 
二、子查询分类
| 类型 | 说明 | 示例 | 
|---|---|---|
| 标量子查询 | 返回 1行1列 | SELECT name FROM users WHERE id = (SELECT user_id FROM orders WHERE id = 1); | 
| 行子查询 | 返回 1行多列 | WHERE (col1, col2) = (SELECT c1, c2 FROM ...) | 
| 列子查询 | 返回 多行1列 | WHERE id IN (SELECT user_id FROM orders) | 
| 表子查询 | 返回 多行多列 | FROM (SELECT ...) AS t | 
| 相关子查询 | 引用外层列 | EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id) | 
| 非相关子查询 | 独立执行 | WHERE score > (SELECT AVG(score) FROM scores) | 
三、子查询语法详解
1. 标量子查询(Scalar Subquery)
-- 查询订单金额最高的商品
SELECT name, price
FROM products
WHERE price = (
    SELECT MAX(total_amount)
    FROM orders o
    JOIN order_items oi ON o.id = oi.order_id
    WHERE oi.product_id = products.id
);
必须返回 0或1行,否则报错
2. 列子查询 + IN / ANY / ALL
-- IN:等于任意一个
SELECT * FROM users WHERE id IN (SELECT user_id FROM orders WHERE status = 'paid');
-- ANY:满足任意条件
SELECT * FROM products WHERE price > ANY (SELECT price FROM discounts);
-- ALL:满足全部条件
SELECT * FROM products WHERE price > ALL (SELECT price FROM competitor_products);
3. 行子查询
-- 多列匹配
SELECT * FROM employees
WHERE (salary, department_id) = (
    SELECT MAX(salary), department_id
    FROM employees
    GROUP BY department_id
    HAVING COUNT(*) > 1
);
4. 表子查询(Derived Table)
SELECT u.email, o.total
FROM (
    SELECT user_id, SUM(amount) AS total
    FROM orders
    GROUP BY user_id
    HAVING SUM(amount) > 1000
) o
JOIN users u ON o.user_id = u.id;
四、EXISTS vs IN vs JOIN 性能对比
| 方式 | 性能 | 适用场景 | 
|---|---|---|
EXISTS | 最快(短路) | 判断存在 | 
IN | 中等(物化子查询) | 小结果集 | 
LEFT JOIN + IS NOT NULL | 快(优化器友好) | 大表关联 | 
推荐写法:EXISTS
-- 推荐:有订单的用户
SELECT u.*
FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
);
-- 避免:IN + 大表
-- WHERE u.id IN (SELECT user_id FROM orders)  -- 可能慢
五、相关子查询(Correlated Subquery)
子查询引用外层表的列,每行外层执行一次子查询。
-- 每个用户最近一次登录
SELECT u.name, u.email,
       (SELECT MAX(login_time)
        FROM login_logs l
        WHERE l.user_id = u.id) AS last_login
FROM users u;
性能差?→ 用
LATERAL JOIN
六、LATERAL 子查询(PostgreSQL 特色)
允许子查询引用左侧表的列,按行执行,类似循环。
-- 每个用户最近 3 条订单(高效!)
SELECT u.name, o.id, o.created_at
FROM users u
CROSS JOIN LATERAL (
    SELECT id, created_at
    FROM orders
    WHERE user_id = u.id
    ORDER BY created_at DESC
    LIMIT 3
) o;
等价于:
-- 传统写法(慢)
SELECT u.name,
       (SELECT json_agg(row_to_json(t))
        FROM (
            SELECT id, created_at
            FROM orders
            WHERE user_id = u.id
            ORDER BY created_at DESC
            LIMIT 3
        ) t) AS recent_orders
FROM users u;
七、子查询在 DML 中的使用
1. INSERT ... SELECT
INSERT INTO user_summary (user_id, order_count)
SELECT user_id, COUNT(*)
FROM orders
GROUP BY user_id;
2. UPDATE ... FROM
UPDATE products p
SET last_order_date = o.max_date
FROM (
    SELECT product_id, MAX(created_at) AS max_date
    FROM order_items oi
    JOIN orders o ON oi.order_id = o.id
    GROUP BY product_id
) o
WHERE p.id = o.product_id;
3. DELETE USING
DELETE FROM users u
USING (
    SELECT user_id FROM inactive_users
) i
WHERE u.id = i.user_id;
八、子查询 vs CTE vs 窗口函数
| 场景 | 推荐方式 | 
|---|---|
| 临时中间结果 | CTE(可读性高) | 
| 按行取 Top N | LATERAL 或 窗口函数 | 
| 复杂逻辑分步 | CTE | 
| 性能敏感 | 避免深层嵌套 | 
-- CTE 更清晰
WITH order_stats AS (
    SELECT user_id, COUNT(*) AS cnt, SUM(amount) AS total
    FROM orders
    GROUP BY user_id
)
SELECT u.*, os.cnt, os.total
FROM users u
JOIN order_stats os ON u.id = os.user_id;
九、性能优化技巧
| 技巧 | 说明 | 
|---|---|
避免 SELECT * 在子查询 | 只选必要列 | 
使用 EXISTS 替代 IN | 短路 + 索引友好 | 
相关子查询 → JOIN | 优化器更聪明 | 
标量子查询加 LIMIT 1 | 防多行错误 | 
| 建索引支持子查询 | 如 (user_id) | 
-- 强制 LIMIT 1
WHERE price = (SELECT price FROM discounts WHERE product_id = p.id LIMIT 1)
十、常见错误与避坑
| 错误 | 原因 | 解决 | 
|---|---|---|
more than one row returned by a subquery | 标量子查询返回多行 | 加 LIMIT 1 或用 IN | 
| 性能极差 | 相关子查询每行执行 | 改 JOIN 或 LATERAL | 
IN 返回 NULL | IN (NULL, 1, 2) → 永远 false | 用 COALESCE 或 EXISTS | 
| 子查询未使用索引 | 缺少相关列索引 | 建 (user_id, created_at) | 
十一、最佳实践脚本
1. 用户活跃度排行(Top 3 订单)
SELECT 
    u.name,
    u.email,
    COALESCE(o.order_count, 0) AS order_count
FROM users u
LEFT JOIN LATERAL (
    SELECT COUNT(*) AS order_count
    FROM orders
    WHERE user_id = u.id
      AND created_at >= CURRENT_DATE - INTERVAL '30 days'
) o ON TRUE
ORDER BY o.order_count DESC
LIMIT 10;
2. 清理无订单用户
DELETE FROM users
WHERE NOT EXISTS (
    SELECT 1 FROM orders WHERE user_id = users.id
);
3. 动态分区清理(子查询生成表名)
DO $$
DECLARE
    tbl TEXT;
BEGIN
    FOR tbl IN
        SELECT tablename
        FROM pg_tables
        WHERE tablename LIKE 'logs_2023%'
          AND tablename < 'logs_' || to_char(CURRENT_DATE - INTERVAL '90 days', 'YYYYMM')
    LOOP
        EXECUTE 'DROP TABLE IF EXISTS ' || quote_ident(tbl);
    END LOOP;
END $$;
十二、系统表查看执行计划
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE amount > 1000);
期望看到:
Index Scan using orders_pkeyNested Loop
十三、速查表
| 语法 | 用途 | 
|---|---|
(SELECT ...) | 标量子查询 | 
IN (SELECT ...) | 列子查询 | 
EXISTS (SELECT 1 ...) | 存在性检查 | 
FROM (SELECT ...) t | 表子查询 | 
LATERAL (SELECT ...) | 相关按行查询 | 
ANY / ALL | 比较操作符 | 
十四、子查询决策树
graph TD
    A[需要嵌套查询?] -->|是| B{返回单值?}
    B -->|是| C[用 (SELECT ...)]
    B -->|否| D{判断存在?}
    D -->|是| E[用 EXISTS]
    D -->|否| F{按行取多行?}
    F -->|是| G[用 LATERAL]
    F -->|否| H[用 JOIN 或 CTE]
十五、常见面试题
| 问题 | 答案 | 
|---|---|
IN 和 EXISTS 区别? | EXISTS 短路,性能更好 | 
| 相关子查询执行几次? | 外层每行一次 | 
| 如何避免标量子查询多行错误? | 加 LIMIT 1 | 
LATERAL 是什么? | 允许子查询引用左侧列 | 
| 子查询一定慢吗? | 不一定,优化器会重写 | 
需要我提供一个完整的 “复杂报表 + 子查询 + CTE + 窗口函数对比” 的性能测试项目吗?