How to Avoid HTTP 429 Too Many Requests for Good (Hands-On Python, 2025 Edition)

A 429 is the most common "death sentence" in scraping and API work. It means: you are sending requests too fast, and the server has rate-limited you.

Below is a complete anti-429 playbook, from beginner to production grade, that covers the vast majority of real-world scenarios.
1. Common Triggers for a 429 (Know Your Enemy First)

| Trigger | Typical threshold (approximate) | Typical sites |
|---|---|---|
| Requests per second (RPS) | 5~50 | Weibo, Zhihu, Douban |
| Requests per minute | 60~1000 | Baidu, JD, Taobao |
| Many requests from the same IP in a short window | ~100/minute | Most websites |
| Missing User-Agent | Immediate 429, or an outright IP ban | Almost every modern site |
| Missing required cookies | Anti-bot measures triggered | Douyin, Bilibili, WeChat Official Accounts |
| Sudden spike in request rate | Risk-control systems triggered | All major platforms |
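Before picking a countermeasure, it helps to handle a 429 correctly when one arrives. Per RFC 9110, the `Retry-After` header may carry either a number of seconds or an HTTP date; the helper below (the name `parse_retry_after` is my own) normalizes both forms into seconds. A minimal sketch:

```python
from email.utils import parsedate_to_datetime
from datetime import datetime, timezone

def parse_retry_after(value, default=10):
    """Normalize a Retry-After header value to seconds to wait.

    The header may be delta-seconds ("120") or an HTTP date
    ("Wed, 21 Oct 2025 07:28:00 GMT"). Returns `default` when the
    value is missing or unparseable.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():  # delta-seconds form
        return int(value)
    try:  # HTTP-date form
        when = parsedate_to_datetime(value)
        delta = (when - datetime.now(timezone.utc)).total_seconds()
        return max(0, int(delta))
    except (TypeError, ValueError):
        return default
```

Used on a response: `time.sleep(parse_retry_after(resp.headers.get("Retry-After")))`.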
2. The Python Anti-429 Toolkit (8 Techniques, Mix and Match)

★☆☆ Technique 1: Basic sleep

```python
import time

time.sleep(1)  # sleep 1 second after every request; crude but simple
```

Good for: small sites, learning.

★★☆ Technique 2: Random sleep (recommended)

```python
import time, random

time.sleep(random.uniform(0.5, 2.5))  # random 0.5~2.5 s delay
```

Good for: gets roughly 90% of scrapers through.
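Between a fixed random sleep and full retry machinery sits exponential backoff with jitter: after each consecutive failure the maximum wait roughly doubles, and the randomization keeps many clients from retrying in lockstep. A minimal "full jitter" sketch (the function name `backoff_delay` is my own):

```python
import random

def backoff_delay(attempt, base=0.5, cap=60.0):
    """Return a randomized delay for a 0-based retry attempt.

    "Full jitter": pick uniformly between 0 and min(cap, base * 2**attempt),
    so concurrent clients spread their retries out instead of colliding.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

On attempt 0 this yields up to 0.5 s, on attempt 3 up to 4 s, and it never exceeds the 60 s cap.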
★★☆ Technique 3: requests + automatic retry (elegant)

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

s = requests.Session()
retries = Retry(total=5, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
s.mount("http://", HTTPAdapter(max_retries=retries))
s.mount("https://", HTTPAdapter(max_retries=retries))
```

Good for: API calls, lightweight scrapers.

★★★ Technique 4: Limit concurrency (a must for multithreaded/async code)

Thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

# crawl: your fetch function; urls: your list of URLs
with ThreadPoolExecutor(max_workers=5) as pool:  # at most 5 concurrent workers
    pool.map(crawl, urls)
```

Async:
```python
import asyncio
import aiohttp

sem = asyncio.Semaphore(10)  # at most 10 concurrent requests

async def fetch(session, url):
    async with sem:
        async with session.get(url) as resp:
            return await resp.text()
```

Good for: large-scale crawling.

★★★ Technique 5: Dynamically adjust frequency (smart)
```python
import time

class SmartSleep:
    def __init__(self):
        self.last_time = 0
        self.min_interval = 0.2  # minimum gap between requests, in seconds

    def sleep(self):
        now = time.time()
        diff = now - self.last_time
        if diff < self.min_interval:
            time.sleep(self.min_interval - diff)
        self.last_time = time.time()

ss = SmartSleep()
```

Usage: call ss.sleep() after every request.
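SmartSleep enforces a fixed minimum gap between requests. A common refinement, sketched below under my own naming, is a token bucket: it allows short bursts up to `capacity` requests while holding the long-run average at `rate` requests per second.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests while averaging `rate` req/s."""

    def __init__(self, rate=5.0, capacity=10):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def acquire(self):
        """Block until one token is available, then consume it."""
        while True:
            now = time.monotonic()
            # refill tokens in proportion to the time elapsed
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)

bucket = TokenBucket(rate=5.0, capacity=10)
# call bucket.acquire() before every request
```

The burst allowance is the design point: a crawler that mostly idles can fire a few requests back to back without tripping per-second limits, yet cannot exceed the average rate for long.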
Use technique 5 for high-intensity scrapers.

★★★ Technique 6: Distributed crawling + rotating proxy pool (the heavy artillery)

```python
import random
import requests

proxies = [
    "http://1.2.3.4:8888",
    "http://5.6.7.8:9999",
    # ...up to ~1000 proxies
]
proxy = random.choice(proxies)
# url: the page you want to fetch
requests.get(url, proxies={"http": proxy, "https": proxy})
```

Good for: scrapers that must survive bans (paid proxy services suggested in the original: 芝麻, 讯代理, 瞬火).
★★☆ Technique 7: A fully disguised request header

```python
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
}
```

Good for: every site; essential everywhere.
★★☆ Technique 8: Respect robots.txt + crawl politely

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://xxx.com/robots.txt")
rp.read()
if rp.can_fetch("*", url):
    ...  # only crawl when allowed
```

Good for: well-behaved scrapers; avoids getting blacklisted.
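robots.txt can also tell you how fast you may crawl: `RobotFileParser` exposes `crawl_delay()` and `request_rate()`, both returning `None` when the site specifies nothing. The sketch below feeds made-up rules straight to `parse()` so it runs without network access; the rules themselves are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, passed to parse() instead of fetched over HTTP
rules = [
    "User-agent: *",
    "Crawl-delay: 3",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)

delay = rp.crawl_delay("*")  # -> 3 (seconds between requests)
allowed = rp.can_fetch("*", "https://xxx.com/public/page.html")  # -> True
```

Honoring a declared Crawl-delay is the cheapest possible rate limit: the site told you the number, so you never have to guess the threshold.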
3. The Strongest Combination (Production-Grade Anti-429)
```python
import requests
import time
import random
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class Anti429Session:
    def __init__(self):
        self.session = requests.Session()
        # Automatically retry on 429 and transient server errors
        retry = Retry(total=10, backoff_factor=2,
                      status_forcelist=[429, 500, 502, 503, 504])
        adapter = HTTPAdapter(max_retries=retry)
        self.session.mount("http://", adapter)
        self.session.mount("https://", adapter)
        # Random User-Agent
        self.session.headers.update({
            "User-Agent": random.choice([
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/129",
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
                # add a few more
            ])
        })

    def get(self, url, max_attempts=3, **kwargs):
        time.sleep(random.uniform(0.3, 1.2))  # the key ingredient: a random delay
        try:
            resp = self.session.get(url, timeout=10, **kwargs)
            if resp.status_code == 429 and max_attempts > 1:
                # Retry-After may also be an HTTP date; fall back to 10 s then
                raw = resp.headers.get("Retry-After", "10")
                wait = int(raw) if raw.isdigit() else 10
                print(f"Got 429, waiting {wait} s")
                time.sleep(wait)
                return self.get(url, max_attempts=max_attempts - 1, **kwargs)
            return resp
        except requests.RequestException:
            if max_attempts <= 1:
                raise  # cap the retries instead of recursing forever
            time.sleep(5)
            return self.get(url, max_attempts=max_attempts - 1, **kwargs)

# Usage
s = Anti429Session()
r = s.get("https://httpbin.org/status/429")
```
4. Final Advice for 2025 (One Sentence to Remember)

"Slow is fast; disguise is survival."

- Always add a random sleep
- Always use realistic browser headers
- Always keep concurrency under 10
- Always keep a proxy pool on standby
- Always honor the Retry-After header

Follow these five rules and 429s will never torment you again.

If you'd like a complete, runnable anti-429 crawler framework (async + proxy pool + auto-retry + re-crawl on failure), just say the word!