How to Avoid HTTP Error 429 (Too Many Requests) in Python

How to Reliably Avoid HTTP 429 Too Many Requests (Hands-On Python Guide, 2025 Edition)

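429 is the most common "death sentence" in crawling and API work. It means you are sending requests too fast and the server is rate-limiting you (429 Too Many Requests is defined in RFC 6585).
Below is a complete anti-429 toolkit, from beginner level to production grade, that covers the vast majority of scenarios.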

I. Common Causes of 429 (Know Your Enemy First)

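| Trigger | Typical threshold / effect (rough) | Example sites |
|---|---|---|
| Requests per second (RPS) | 5-50 | Weibo, Zhihu, Douban |
| Requests per minute | 60-1000 | Baidu, JD.com, Taobao |
| Many requests from the same IP in a short time | ~100 per minute | Most websites |
| No User-Agent header | Immediate 429 or an outright IP ban | Almost all modern websites |
| Missing required cookies | Triggers anti-scraping checks | Douyin, Bilibili, WeChat Official Accounts |
| Sudden spike in request rate | Triggers risk control | All major platforms |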
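You can enforce the per-minute limits in the table on the client side. That way you stay under a site's threshold instead of discovering it through 429s. Below is a minimal sliding-window sketch. The class name `SlidingWindowLimiter` is hypothetical. The limit you configure is an assumption to tune per site, for example 100 requests per 60 seconds, taken from the rough figure above.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter: at most `max_requests` calls in any `window` seconds.

    Hypothetical helper; the limits you pass in are assumptions per site.
    """

    def __init__(self, max_requests, window, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of recent requests, oldest first

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have slid out of the window
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        # The oldest request in the window decides when a slot frees up
        return self.window - (now - self.sent[0])

    def acquire(self):
        """Block until a request is allowed, then record it."""
        delay = self.wait_time()
        while delay > 0:
            time.sleep(delay)
            delay = self.wait_time()
        self.sent.append(self.clock())
```

Call `limiter.acquire()` right before each request, e.g. with `limiter = SlidingWindowLimiter(100, 60)`. This version is not thread-safe. With a thread pool you would guard `acquire()` with a `threading.Lock`.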

II. The Ultimate Python Anti-429 Toolkit (8 Techniques, Mix and Match)

Each technique below lists its difficulty level (★), the method, copy-ready code, and the scenarios it suits.
★☆☆ 1. Basic sleep

```python
import time

time.sleep(1)  # sleep 1 second after every request: the simplest, bluntest approach
```

Use case: small sites, learning.
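★★☆ 2. Random sleep (recommended)

```python
import random
import time

time.sleep(random.uniform(0.5, 2.5))  # random pause between 0.5 and 2.5 seconds
```

Use case: enough to get most ordinary crawlers through.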
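★★☆ 3. requests + automatic retry (the elegant way)

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

s = requests.Session()
# Retry up to 5 times on 429 and common 5xx errors with exponential backoff.
# By default urllib3 honors a Retry-After header on 429/503 responses and only
# retries idempotent methods such as GET (not POST). When retries run out,
# s.get() raises requests.exceptions.RetryError.
retries = Retry(total=5, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
s.mount("http://", HTTPAdapter(max_retries=retries))
s.mount("https://", HTTPAdapter(max_retries=retries))
```

Use case: API calls, lightweight crawlers.

★★★ 4. Limit concurrency (essential for multithreading/async)

Thread pool: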
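```python
from concurrent.futures import ThreadPoolExecutor
import requests
def crawl(url):  # example task; replace with your own fetch/parse logic
    return requests.get(url, timeout=10).status_code
with ThreadPoolExecutor(max_workers=5) as pool:  # at most 5 requests in flight
    results = list(pool.map(crawl, urls))  # urls: your list of URLs
```

Async (aiohttp):

```python
import asyncio
import aiohttp

async def fetch(session, sem, url):
    async with sem:  # waits here while 10 requests are already in flight
        async with session.get(url) as resp:
            return await resp.text()

async def main(urls):
    sem = asyncio.Semaphore(10)  # at most 10 concurrent requests
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, sem, u) for u in urls))
```

Run it with `asyncio.run(main(urls))`.

Use case: large-scale crawling.

★★★ 5. Dynamic rate control (smart)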
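```python
import time

class SmartSleep:
    """Keep at least `min_interval` seconds between consecutive calls."""

    def __init__(self, min_interval=0.2):
        self.last_time = 0.0
        self.min_interval = min_interval

    def sleep(self):
        diff = time.monotonic() - self.last_time
        if diff < self.min_interval:
            time.sleep(self.min_interval - diff)  # only sleep for the remainder
        self.last_time = time.monotonic()

ss = SmartSleep()
```

Usage: call `ss.sleep()` after each request. As written, the interval is fixed at 0.2 s, so you send at most about 5 requests per second. To make it truly dynamic, raise `min_interval` when you start seeing 429s.

Use case: high-intensity crawling.

★★★ 6. Distributed crawling + IP proxy pool (the ultimate option)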
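```python
import random
import requests

proxies = [
    "http://1.2.3.4:8888",
    "http://5.6.7.8:9999",
    # ...1000 proxies
]
proxy = random.choice(proxies)  # pick a random proxy for each request
requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)  # url: target page
```

Use case: crawlers that are very hard to block (recommended paid proxy providers: 芝麻, 讯代理, 瞬火).

★★☆ 7. Mimic realistic request headers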
```python
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    # requests can only decode "br" responses if brotli (or brotlicffi) is installed
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
}
# Send them with every request: requests.get(url, headers=headers)
```

Use case: essential for every site.

★★☆ 8. Respect robots.txt and crawl politely
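```python
import time
from urllib.robotparser import RobotFileParser

import requests

rp = RobotFileParser()
rp.set_url("https://xxx.com/robots.txt")
rp.read()
if rp.can_fetch("*", url):  # only crawl URLs that robots.txt allows
    time.sleep(rp.crawl_delay("*") or 1)  # honor Crawl-delay if present; 1 s fallback is an assumption
    resp = requests.get(url, timeout=10)
```

Use case: ethical crawlers; avoids getting blacklisted.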
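Method 3 leaves backoff to urllib3, which only helps when you use requests. For other clients, such as the aiohttp code in method 4, you can compute the waits yourself. Below is a minimal sketch of exponential backoff with "full jitter": attempt n waits a random time below min(cap, base * 2**n). The helper names `backoff_delays` and `call_with_backoff` are hypothetical, and the default numbers are assumptions.

```python
import random
import time


def backoff_delays(base=1.0, cap=60.0, attempts=6, rng=random.random):
    """Exponential backoff with full jitter.

    Attempt n (counting from 0) waits a random time in [0, min(cap, base * 2**n)),
    so clients throttled at the same moment do not all retry in lockstep.
    """
    return [rng() * min(cap, base * 2 ** n) for n in range(attempts)]


def call_with_backoff(send, is_throttled, **delay_kwargs):
    """Call send() again after each backoff delay while is_throttled(result) is true.

    `send` and `is_throttled` stand for your own request function and check,
    e.g. lambda r: r.status_code == 429. Returns the last result either way.
    """
    result = send()
    for delay in backoff_delays(**delay_kwargs):
        if not is_throttled(result):
            break
        time.sleep(delay)
        result = send()
    return result
```

For example: `call_with_backoff(lambda: requests.get(url, timeout=10), lambda r: r.status_code == 429)`. In async code, the same loop works with `await asyncio.sleep(delay)` in place of `time.sleep`. The random component spreads retries out, so a batch of throttled workers does not hit the server again at the same instant.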

III. The Strongest Combination (Production-Grade Anti-429 Setup)

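import random
import time

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
    # add a few more
]

class Anti429Session:
    def __init__(self, max_attempts=5):
        self.max_attempts = max_attempts
        self.session = requests.Session()
        # Automatic retries on 429/5xx with exponential backoff (urllib3 also
        # honors Retry-After). raise_on_status=False hands back the final 429
        # response instead of raising, so get() below can still handle it.
        retry = Retry(total=3, backoff_factor=2,
                      status_forcelist=[429, 500, 502, 503, 504],
                      raise_on_status=False)
        adapter = HTTPAdapter(max_retries=retry)
        self.session.mount("http://", adapter)
        self.session.mount("https://", adapter)

    def get(self, url, **kwargs):
        kwargs.setdefault("timeout", 10)
        for attempt in range(1, self.max_attempts + 1):
            time.sleep(random.uniform(0.3, 1.2))  # core idea: random delay
            # Rotate the User-Agent on every request, not just once per session
            self.session.headers["User-Agent"] = random.choice(USER_AGENTS)
            try:
                resp = self.session.get(url, **kwargs)
            except requests.RequestException as e:
                print(f"Request failed: {e} (attempt {attempt}/{self.max_attempts})")
                time.sleep(5)
                continue
            if resp.status_code != 429:
                return resp
            try:
                wait = float(resp.headers.get("Retry-After", 10))
            except ValueError:  # Retry-After may also be an HTTP-date; fall back to 10 s
                wait = 10
            print(f"Got 429, waiting {wait:.0f}s (attempt {attempt}/{self.max_attempts})")
            time.sleep(wait)
        # Bounded attempts: never retry forever
        raise RuntimeError(f"Still failing after {self.max_attempts} attempts: {url}")

# Usage
s = Anti429Session()
# httpbin.org/status/429 always answers 429, so this call exercises the retry
# path and ends in the RuntimeError above; use a real URL in practice.
r = s.get("https://httpbin.org/status/429")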
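The session above waits a fixed random 0.3 to 1.2 s before every request. You may want the delay to react to the server instead. One option is to lengthen it sharply after each 429 and shorten it slowly while requests succeed, in the spirit of AIMD (additive increase, multiplicative decrease) congestion control. Below is a minimal sketch. `AdaptiveThrottle` is a hypothetical name, and all of its tuning numbers are assumptions.

```python
import time


class AdaptiveThrottle:
    """Adjust the pause between requests from server feedback (AIMD-style sketch).

    A 429 doubles the interval (halving the request rate); any other status
    shrinks it by a fixed step. start/floor/ceiling/step are assumptions.
    """

    def __init__(self, start=1.0, floor=0.2, ceiling=60.0, step=0.05):
        self.interval = start
        self.floor = floor
        self.ceiling = ceiling
        self.step = step

    def record(self, status_code):
        """Update the interval from a response status code and return it."""
        if status_code == 429:
            self.interval = min(self.ceiling, self.interval * 2)
        else:
            self.interval = max(self.floor, self.interval - self.step)
        return self.interval

    def pause(self):
        """Sleep for the current interval before the next request."""
        time.sleep(self.interval)
```

After each response, call `throttle.record(resp.status_code)`, then call `throttle.pause()` before the next request. The doubling backs off quickly once the server pushes back. The small step size means the crawler creeps back toward full speed rather than immediately re-triggering the limit.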

IV. Final Advice for 2025 (Remember It in One Line)

“Slow is fast; blending in is survival.”

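  • Always add a random sleep
  • Always send real browser headers
  • Always keep concurrency below 10
  • Always keep a proxy pool on standby
  • Always honor the Retry-After header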

Follow these five rules and 429 errors should rarely bother you again.
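On the last rule: Retry-After can carry either a number of seconds or an HTTP-date, so a plain `int()` on the header fails on the date form. Below is a minimal parser for both forms that uses only the standard library. `retry_after_seconds` is a hypothetical name, and the 10-second fallback is an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, default=10.0, now=None):
    """Convert a Retry-After header value into a delay in seconds.

    The header holds either delay-seconds ("120") or an HTTP-date
    ("Wed, 21 Oct 2015 07:28:00 GMT"). Returns `default` when the header
    is missing or cannot be parsed.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdecimal():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # Python < 3.10 raises TypeError here
        return default
    if when.tzinfo is None:  # a naive result: treat it as UTC
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative sleep
    return max(0.0, (when - now).total_seconds())
```

Typical use: `time.sleep(retry_after_seconds(resp.headers.get("Retry-After")))` after a 429.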

