PostgreSQL 子查询

下面是一份 PostgreSQL 子查询(Subquery)完整实战手册,涵盖 所有类型、语法、性能优化、常见陷阱、与 CTE/窗口函数对比、最佳实践与面试题,适合开发、DBA、架构师使用。


一、什么是子查询?

子查询是嵌套在另一个查询中的 SELECT 语句,也称为 内查询嵌套查询

子查询出现位置

位置说明
FROM作为数据源(派生表
WHERE作为过滤条件
SELECT标量子查询(返回单值)
HAVING聚合过滤
INSERT/UPDATE/DELETE数据来源或条件

二、子查询分类

类型说明示例
标量子查询返回 1行1列SELECT name FROM users WHERE id = (SELECT user_id FROM orders WHERE id = 1);
行子查询返回 1行多列WHERE (col1, col2) = (SELECT c1, c2 FROM ...)
列子查询返回 多行1列WHERE id IN (SELECT user_id FROM orders)
表子查询返回 多行多列FROM (SELECT ...) AS t
相关子查询引用外层列EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id)
非相关子查询独立执行WHERE score > (SELECT AVG(score) FROM scores)

三、子查询语法详解

1. 标量子查询(Scalar Subquery)

-- 查询订单金额最高的商品
SELECT name, price
FROM products
WHERE price = (
    SELECT MAX(total_amount)
    FROM orders o
    JOIN order_items oi ON o.id = oi.order_id
    WHERE oi.product_id = products.id
);

必须返回 0或1行,否则报错

2. 列子查询 + IN / ANY / ALL

-- IN:等于任意一个
SELECT * FROM users WHERE id IN (SELECT user_id FROM orders WHERE status = 'paid');

-- ANY:满足任意条件
SELECT * FROM products WHERE price > ANY (SELECT price FROM discounts);

-- ALL:满足全部条件
SELECT * FROM products WHERE price > ALL (SELECT price FROM competitor_products);

3. 行子查询

-- 多列匹配
SELECT * FROM employees
WHERE (salary, department_id) = (
    SELECT MAX(salary), department_id
    FROM employees
    GROUP BY department_id
    HAVING COUNT(*) > 1
);

4. 表子查询(Derived Table)

SELECT u.email, o.total
FROM (
    SELECT user_id, SUM(amount) AS total
    FROM orders
    GROUP BY user_id
    HAVING SUM(amount) > 1000
) o
JOIN users u ON o.user_id = u.id;

四、EXISTS vs IN vs JOIN 性能对比

方式性能适用场景
EXISTS最快(短路)判断存在
IN中等(物化子查询)小结果集
LEFT JOIN + IS NOT NULL快(优化器友好)大表关联

推荐写法:EXISTS

-- 推荐:有订单的用户
SELECT u.*
FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
);

-- 避免:IN + 大表
-- WHERE u.id IN (SELECT user_id FROM orders)  -- 可能慢

五、相关子查询(Correlated Subquery)

子查询引用外层表的列,每行外层执行一次子查询。

-- 每个用户最近一次登录
SELECT u.name, u.email,
       (SELECT MAX(login_time)
        FROM login_logs l
        WHERE l.user_id = u.id) AS last_login
FROM users u;

性能差?→ 用 LATERAL JOIN


六、LATERAL 子查询(PostgreSQL 特色)

允许子查询引用左侧表的列,按行执行,类似循环。

-- 每个用户最近 3 条订单(高效!)
SELECT u.name, o.id, o.created_at
FROM users u
CROSS JOIN LATERAL (
    SELECT id, created_at
    FROM orders
    WHERE user_id = u.id
    ORDER BY created_at DESC
    LIMIT 3
) o;

等价于:

-- 传统写法(慢)
SELECT u.name,
       (SELECT json_agg(row_to_json(t))
        FROM (
            SELECT id, created_at
            FROM orders
            WHERE user_id = u.id
            ORDER BY created_at DESC
            LIMIT 3
        ) t) AS recent_orders
FROM users u;

七、子查询在 DML 中的使用

1. INSERT ... SELECT

INSERT INTO user_summary (user_id, order_count)
SELECT user_id, COUNT(*)
FROM orders
GROUP BY user_id;

2. UPDATE ... FROM

UPDATE products p
SET last_order_date = o.max_date
FROM (
    SELECT product_id, MAX(created_at) AS max_date
    FROM order_items oi
    JOIN orders o ON oi.order_id = o.id
    GROUP BY product_id
) o
WHERE p.id = o.product_id;

3. DELETE USING

DELETE FROM users u
USING (
    SELECT user_id FROM inactive_users
) i
WHERE u.id = i.user_id;

八、子查询 vs CTE vs 窗口函数

场景推荐方式
临时中间结果CTE(可读性高)
按行取 Top NLATERAL 或 窗口函数
复杂逻辑分步CTE
性能敏感避免深层嵌套
-- CTE 更清晰
WITH order_stats AS (
    SELECT user_id, COUNT(*) AS cnt, SUM(amount) AS total
    FROM orders
    GROUP BY user_id
)
SELECT u.*, os.cnt, os.total
FROM users u
JOIN order_stats os ON u.id = os.user_id;

九、性能优化技巧

技巧说明
避免 SELECT * 在子查询只选必要列
使用 EXISTS 替代 IN短路 + 索引友好
相关子查询 → JOIN优化器更聪明
标量子查询加 LIMIT 1防多行错误
建索引支持子查询(user_id)
-- 强制 LIMIT 1
WHERE price = (SELECT price FROM discounts WHERE product_id = p.id LIMIT 1)

十、常见错误与避坑

错误原因解决
more than one row returned by a subquery标量子查询返回多行LIMIT 1 或用 IN
性能极差相关子查询每行执行JOINLATERAL
IN 返回 NULLIN (NULL, 1, 2) → 永远 falseCOALESCEEXISTS
子查询未使用索引缺少相关列索引(user_id, created_at)

十一、最佳实践脚本

1. 用户活跃度排行(Top 3 订单)

SELECT 
    u.name,
    u.email,
    COALESCE(o.order_count, 0) AS order_count
FROM users u
LEFT JOIN LATERAL (
    SELECT COUNT(*) AS order_count
    FROM orders
    WHERE user_id = u.id
      AND created_at >= CURRENT_DATE - INTERVAL '30 days'
) o ON TRUE
ORDER BY o.order_count DESC
LIMIT 10;

2. 清理无订单用户

DELETE FROM users
WHERE NOT EXISTS (
    SELECT 1 FROM orders WHERE user_id = users.id
);

3. 动态分区清理(子查询生成表名)

DO $$
DECLARE
    tbl TEXT;
BEGIN
    FOR tbl IN
        SELECT tablename
        FROM pg_tables
        WHERE tablename LIKE 'logs_2023%'
          AND tablename < 'logs_' || to_char(CURRENT_DATE - INTERVAL '90 days', 'YYYYMM')
    LOOP
        EXECUTE 'DROP TABLE IF EXISTS ' || quote_ident(tbl);
    END LOOP;
END $$;

十二、系统表查看执行计划

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE amount > 1000);

期望看到:

  • Index Scan using orders_pkey
  • Nested Loop

十三、速查表

语法用途
(SELECT ...)标量子查询
IN (SELECT ...)列子查询
EXISTS (SELECT 1 ...)存在性检查
FROM (SELECT ...) t表子查询
LATERAL (SELECT ...)相关按行查询
ANY / ALL比较操作符

十四、子查询决策树

graph TD
    A[需要嵌套查询?] -->|是| B{返回单值?}
    B -->|是| C[用 (SELECT ...)]
    B -->|否| D{判断存在?}
    D -->|是| E[用 EXISTS]
    D -->|否| F{按行取多行?}
    F -->|是| G[用 LATERAL]
    F -->|否| H[用 JOIN 或 CTE]

十五、常见面试题

问题答案
INEXISTS 区别?EXISTS 短路,性能更好
相关子查询执行几次?外层每行一次
如何避免标量子查询多行错误?LIMIT 1
LATERAL 是什么?允许子查询引用左侧列
子查询一定慢吗?不一定,优化器会重写

需要我提供一个完整的 “复杂报表 + 子查询 + CTE + 窗口函数对比” 的性能测试项目吗?

类似文章

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注