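Playwright Data Extraction and Verification

likuolei, December 22, 2025

Playwright Data Extraction and Verification (2025 Edition)

Data extraction and verification (assertions) are the core of Playwright in both automated-testing and scraping scenarios. Playwright provides the Locator API and web-first assertions, which wait automatically for elements to be ready and so keep tests stable. The examples below use Node.js/TypeScript (Playwright Test), followed by a Python example.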

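1. Data extraction (common approaches)

What to extract	Example (Locator API recommended)	Notes
Text of a single element	`const text = await page.getByRole('heading', { name: 'Welcome' }).textContent();`	Returns the text as a string; the call waits for the element and throws on timeout if nothing matches
Input value	`const value = await page.getByLabel('Username').inputValue();`	Works for input, textarea, and select elements
Attribute value	`const href = await page.getByRole('link', { name: 'Details' }).getAttribute('href');`	Reads href, src, data-*, and so on; returns null if the attribute is absent
Text of multiple elements	`const items = await page.getByRole('listitem').allTextContents();`	Returns a string[] array
Inner text of multiple elements	`const texts = await page.getByTestId('price').allInnerTexts();`	Uses innerText, i.e. the rendered text, so CSS-hidden text is excluded
Element count	`const count = await page.getByRole('article').count();`	Returns the number of matching elements
Table data	`const` (example truncated in the source)	
JSON/API data	`const response = await page.waitForResponse('**/api/users');` `const json = await response.json();`	Reads data from a network response; start waiting before the action that triggers the request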
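
The table-data row above is cut off in the source, so the author's own example is not recoverable. As a stand-in, here is a minimal sketch of one way to turn an HTML table into records. It assumes a conventional `thead`/`tbody` layout. `LocatorLike` and `extractTable` are hypothetical names introduced here; `LocatorLike` lists only the three Locator methods the sketch calls (`locator`, `all`, `allInnerTexts`), so a real Playwright Locator can be passed in directly.

```typescript
// Structural subset of Playwright's Locator used by this sketch (hypothetical helper type).
interface LocatorLike {
  locator(selector: string): LocatorLike;
  all(): Promise<LocatorLike[]>;
  allInnerTexts(): Promise<string[]>;
}

type Row = Record<string, string>;

// Reads header cells once, then maps every body row to { header: cellText }.
// Assumes the table has <thead><th> headers and <tbody><tr><td> data cells.
async function extractTable(table: LocatorLike): Promise<Row[]> {
  const headers = (await table.locator('thead th').allInnerTexts()).map((h) => h.trim());
  const rows = await table.locator('tbody tr').all();
  const records: Row[] = [];
  for (const row of rows) {
    const cells = await row.locator('td').allInnerTexts();
    const record: Row = {};
    headers.forEach((header, i) => {
      // Missing cells become empty strings instead of undefined.
      record[header] = (cells[i] ?? '').trim();
    });
    records.push(record);
  }
  return records;
}
```

In a test this might be called as `const users = await extractTable(page.locator('table'));`, where the selector is whatever identifies the table on the page under test.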

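2. Verification (assertions): Playwright's strongest feature

Playwright's expect provides web-first assertions: each one retries automatically until it passes or the assertion timeout expires (5 s by default, configurable through expect.timeout; the 30 s default is the per-test timeout), which greatly reduces flaky tests.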

import { test, expect } from '@playwright/test';

// The statements below belong inside a test body, where `page` is available.

// Page title
await expect(page).toHaveTitle('Playwright - Home');
await expect(page).toHaveTitle(/Playwright/);  // regex match

// URL
await expect(page).toHaveURL('https://playwright.dev/');
await expect(page).toHaveURL(/docs/);

// Element visible / hidden
await expect(page.getByText('Loaded successfully')).toBeVisible();
await expect(page.getByText('Loading')).toBeHidden();

// Element text
await expect(page.getByRole('heading')).toHaveText('Welcome');
await expect(page.getByTestId('status')).toContainText('Success');  // substring match

// Element attributes
await expect(page.getByRole('link')).toHaveAttribute('href', '/docs');
await expect(page.getByRole('img')).toHaveAttribute('src', /logo/);

// Input value
await expect(page.getByLabel('Search')).toHaveValue('Playwright');

// Element count
await expect(page.getByRole('listitem')).toHaveCount(5);

// Checkbox / radio state
await expect(page.getByRole('checkbox')).toBeChecked();
await expect(page.getByRole('radio', { name: 'Male' })).toBeChecked();

// Element enabled / disabled
await expect(page.getByRole('button')).toBeEnabled();
await expect(page.getByRole('button')).toBeDisabled();
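
To make the auto-retry idea behind these assertions concrete, here is a conceptual sketch of a retry loop. It is an illustration only, not Playwright's actual implementation; `retryUntilPass`, `RetryOptions`, and the fixed polling interval are assumptions made for the example.

```typescript
interface RetryOptions {
  timeoutMs: number;  // total time budget for all attempts
  intervalMs: number; // pause between attempts
}

// Re-runs `assertion` until it stops throwing or the time budget is used up.
// Resolves with the number of attempts made; rejects with the last failure.
async function retryUntilPass(
  assertion: () => void | Promise<void>,
  { timeoutMs, intervalMs }: RetryOptions,
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  for (;;) {
    attempts += 1;
    try {
      await assertion();
      return attempts;
    } catch (error) {
      // No time left for another attempt: surface the last failure.
      if (Date.now() + intervalMs > deadline) throw error;
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}
```

The point of the sketch is that a failing check is not an immediate test failure: it only becomes one once the timeout is exhausted, which is why a web-first assertion tolerates content that renders late.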

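3. Advanced verification techniques

// Soft assertion (records the failure and lets the test continue)
await expect.soft(page.getByText('Error message')).toBeHidden();

// Custom timeout (passed to the matcher, not to expect())
await expect(page.getByText('Loading complete')).toBeVisible({ timeout: 10000 });

// Polling assertion (for complex conditions)
await expect(async () => {
  const count = await page.getByRole('listitem').count();
  expect(count).toBeGreaterThan(0);
}).toPass({ timeout: 15000 });

// Screenshot assertion (visual regression)
await expect(page).toHaveScreenshot('homepage.png', { maxDiffPixels: 100 });

// Full-page screenshot comparison (pixel level)
await expect(page).toHaveScreenshot({ fullPage: true });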
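
A "complex condition" for toPass often mixes extraction with plain data checks. The sketch below shows two pure helpers that such a block could call: one parses price strings, the other checks ordering. `parsePrice` and `isSortedAscending` are hypothetical names, and the parsing assumes prices formatted like "¥1,299.00" (comma as thousands separator, dot as decimal point).

```typescript
// Converts a displayed price such as "¥1,299.00" into a number.
// Assumes a dot decimal separator; everything except digits and dots is dropped.
function parsePrice(text: string): number {
  const cleaned = text.replace(/[^0-9.]/g, '');
  const value = Number.parseFloat(cleaned);
  if (Number.isNaN(value)) {
    throw new Error(`Not a price: "${text}"`);
  }
  return value;
}

// True when every value is greater than or equal to the one before it.
function isSortedAscending(values: number[]): boolean {
  return values.every((value, i) => i === 0 || values[i - 1] <= value);
}

// Possible use inside a Playwright test (not executed here):
// await expect(async () => {
//   const prices = (await page.getByTestId('price').allInnerTexts()).map(parsePrice);
//   expect(isSortedAscending(prices)).toBe(true);
// }).toPass({ timeout: 15000 });
```

Keeping the parsing outside the browser calls means it can be unit-tested on its own, and the toPass block only has to retry the extraction.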

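4. Worked example: extracting and verifying search results

test('search Baidu for Playwright and verify the results', async ({ page }) => {
  await page.goto('https://www.baidu.com');
  // The label and button names below are assumptions about Baidu's markup
  await page.getByLabel('搜索输入框').fill('Playwright');
  await page.getByRole('button', { name: '百度一下' }).click();

  // Wait for the results with a web-first assertion
  // (the Playwright docs discourage waitForLoadState('networkidle'))
  await expect(page.getByRole('heading').first()).toBeVisible();

  // Extract the title of the first result
  const firstTitle = await page.getByRole('heading').first().textContent();
  console.log('First result title:', firstTitle);

  // Verify that the results contain the keyword
  await expect(page.getByRole('heading').first()).toContainText('Playwright');
  await expect(page.getByRole('heading')).toHaveCount(10);  // usually 10 per page; brittle on a live site

  // Extract all result links
  const links = await page.getByRole('link', { name: /Playwright/ }).all();
  console.log(`Found ${links.length} related links`);
});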
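
Text pulled with textContent() or allTextContents() comes back as the raw DOM text, so it can contain line breaks and runs of spaces, and textContent() is typed as string | null. Below is a small sketch of a clean-up step that could run on extracted titles before they are logged or stored; `cleanTexts` is a hypothetical helper, not part of Playwright.

```typescript
// Normalizes whitespace, drops empty or null entries, and removes duplicates
// while preserving the original order.
function cleanTexts(raw: Array<string | null>): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const item of raw) {
    const text = (item ?? '').replace(/\s+/g, ' ').trim();
    if (text !== '' && !seen.has(text)) {
      seen.add(text);
      result.push(text);
    }
  }
  return result;
}
```

For example, `cleanTexts(await page.getByRole('heading').allTextContents())` would give a de-duplicated list of result titles.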

5. Extraction and verification in Python

import re

from playwright.sync_api import sync_playwright, expect

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://playwright.dev")

    # Extraction (the heading and link text depend on the live site and may change)
    title = page.title()
    heading = page.get_by_role("heading", name="Fast and reliable").text_content()
    print(f"Title: {title}, subtitle: {heading}")

    # Verification
    # A plain string must match the whole title, so use a regex for a partial match
    expect(page).to_have_title(re.compile("Playwright"))
    expect(page.get_by_role("link", name="Get started")).to_be_visible()
    expect(page.get_by_text("Playwright is a")).to_contain_text("reliable")

    browser.close()

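Best practices

Extraction: prefer getByRole combined with textContent() / inputValue().
Verification: use expect() for every check so that Playwright retries automatically.
Test-only attributes: adding data-testid="xxx" to the application under test gives locators that survive copy and layout changes.
Debugging: enable tracing in the config (for example trace: 'on-first-retry') so that a failing test records a trace with DOM snapshots, screenshots, and network logs, then open it with npx playwright show-trace followed by the path to the trace file.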
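
Assertion timeouts and tracing are both configured in playwright.config.ts. Below is a minimal sketch of such a config. The values are illustrative assumptions, not recommendations from the article. A real project would normally wrap the object in defineConfig from @playwright/test to get type checking; a plain object is used here to keep the fragment self-contained.

```typescript
// playwright.config.ts (illustrative values)
const config = {
  // Default timeout for every expect(...) assertion, in milliseconds.
  expect: { timeout: 10_000 },
  // One retry, so that 'on-first-retry' tracing has a retry to record.
  retries: 1,
  use: {
    // Record a trace only when a failed test is retried.
    trace: 'on-first-retry',
    // Keep a screenshot for failed tests.
    screenshot: 'only-on-failure',
  },
};

export default config;
```

With retries left at 0, the 'on-first-retry' mode never records anything, so the two settings are best changed together.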

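With these techniques in hand, you can build reliable E2E tests and scrapers. A suggested next step: write a complete test case that covers login, list-page data extraction, and verification.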

