Hands-On LLMs from Zero to One: A Complete Tutorial on Building an MCP-Based RAG System

Introduction

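Retrieval-Augmented Generation (RAG) is an advanced application pattern for large language models (LLMs). Before generating an answer, the system retrieves relevant passages from an external knowledge base and supplies them to the model. This reduces hallucinations and grounds responses in facts. The Model Context Protocol (MCP) is a standardized protocol for exposing tools and resources to LLMs, so that they can access external data and services in a controlled way. In this tutorial we build an MCP-based RAG system from scratch. The system lets an LLM running in an MCP-capable client (such as Claude Desktop or Cursor) query local documents through an MCP server. This enables Agentic RAG, in which the agent itself decides when to retrieve data.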
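The following minimal, dependency-free sketch makes the retrieve → augment → generate loop concrete. It is an illustration under simplifying assumptions:

- The retriever scores passages by word overlap instead of vector embeddings.
- generate is a hypothetical stand-in for a real LLM call.
- The knowledge-base passages are placeholders.

```python
import re

# Placeholder knowledge base: in the real system these would be chunks of your documents.
KNOWLEDGE_BASE = [
    "Zackerkaky is a fictional character used as test content.",
    "Construction Man is another fictional test topic.",
    "The history file contains historical notes.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k passages sharing the most words with the question (a stand-in for vector search)."""
    q = tokenize(question)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Augment the question with the retrieved context so the model answers from it."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def generate(prompt: str) -> str:
    """Hypothetical LLM call; a real system would send the prompt to a model such as qwen2.5."""
    return f"[LLM answer based on a prompt of {len(prompt)} characters]"


if __name__ == "__main__":
    question = "Who is Zackerkaky?"
    passages = retrieve(question)
    print(passages)
    print(generate(build_prompt(question, passages)))
```

In the real system, the MCP server performs retrieve and build_prompt, and the Ollama model performs generate.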

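This tutorial uses Python. It builds the MCP server with the FastMCP library and implements the RAG pipeline with LangChain. Local Ollama models serve as both the LLM and the embedding model, so everything runs on your own machine. The tutorial has two parts:

- A basic RAG MCP server over local text files.
- An extended version that exposes a SQLite database as MCP resources.

It is aimed at beginners and developers and assumes basic Python knowledge.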

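If you work in a JVM environment (for example with the Embabel framework), MCP servers can be integrated as agent actions. This tutorial, however, focuses on a general-purpose Python implementation.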

Prerequisites

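Hardware/software: a computer that runs Python (Python 3.10+ recommended), with Ollama installed (download from https://ollama.com/) to run local models.
Ollama models: pull nomic-embed-text:latest (for embeddings) and qwen2.5 (as the LLM). Commands:

  ollama pull nomic-embed-text:latest
  ollama pull qwen2.5
  ollama serve  # starts the Ollama API server on the default port 11434 (skip if it is already running)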

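Other tools: Git (optional) and SQLite (for the database version; Python's built-in sqlite3 module is enough for the scripts in this tutorial).
Environment: create a virtual environment to isolate dependencies (uv, used below, creates one automatically).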
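Before continuing, you may want to confirm that the Ollama server is reachable and that both models have been pulled. The sketch below uses only the Python standard library and Ollama's GET /api/tags endpoint, which lists locally available models. The function names are our own. The exact model tags in the response (e.g. qwen2.5:latest) depend on your installation, so the check treats a bare name as matching any tag.

```python
import json
import urllib.request

REQUIRED_MODELS = ["nomic-embed-text:latest", "qwen2.5"]


def list_ollama_models(base_url: str = "http://127.0.0.1:11434", timeout: float = 3.0):
    """Return the model names reported by Ollama's /api/tags, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError):  # connection errors, timeouts, or a non-JSON reply
        return None
    return [m["name"] for m in data.get("models", [])]


def missing_models(available: list[str], required: list[str]) -> list[str]:
    """Return required models not found; a bare name like 'qwen2.5' also matches 'qwen2.5:<tag>'."""
    def present(name: str) -> bool:
        return any(a == name or a.startswith(name + ":") for a in available)
    return [name for name in required if not present(name)]


if __name__ == "__main__":
    models = list_ollama_models()
    if models is None:
        print("Ollama is not reachable on port 11434; start it with: ollama serve")
    else:
        print("Missing models:", missing_models(models, REQUIRED_MODELS) or "none")
```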

Part 1: Basic RAG MCP Server (Local Text Files)

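This version uses local TXT files as the knowledge base and builds a RAG pipeline over them. It then exposes that pipeline through MCP as tools. It suits simple document Q&A.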

Step 1: Set Up the Working Environment

Install uv (a fast Python package and project manager) with pip if you don't have it yet:

   pip install uv

Initialize the project:

   uv init rag-mcp
   cd rag-mcp

Install the dependencies:

   uv add 'mcp[cli]' langchain langchain-community langchain-text-splitters langchain-ollama chromadb

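mcp[cli]: the official MCP Python SDK, which includes the FastMCP server class and the mcp command-line tool.
langchain / langchain-community: document loaders, text splitters, and vector store wrappers for the RAG pipeline.
langchain-ollama: LangChain integration for Ollama LLMs and embedding models.
chromadb: a lightweight vector database for storing the embeddings.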

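Create sample knowledge-base files: in the project root, create dummy.txt, dummy2.txt, and dummy3.txt and fill them with some test content. For example:

dummy.txt: a fictional description of "Zackerkaky".
dummy2.txt: content about "Construction Man".
dummy3.txt: history-related information.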
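If you just want something to test with, you can generate the files with a few lines of standard-library Python. The text written here is placeholder content of our own, not part of the original tutorial. Replace it with real notes.

```python
from pathlib import Path

# Placeholder content for the three knowledge-base files (replace with real text).
SAMPLE_FILES = {
    "dummy.txt": "Zackerkaky: placeholder description of a fictional character.",
    "dummy2.txt": "Construction Man: placeholder notes about this topic.",
    "dummy3.txt": "History: placeholder historical information.",
}


def write_samples(root: Path, overwrite: bool = False) -> list[Path]:
    """Create the sample files under root; existing files are kept unless overwrite is True."""
    written = []
    for name, text in SAMPLE_FILES.items():
        path = root / name
        if overwrite or not path.exists():
            path.write_text(text + "\n", encoding="utf-8")
            written.append(path)
    return written


if __name__ == "__main__":
    for p in write_samples(Path(".")):
        print("created", p)
```

Writing the files as UTF-8 matches how the server script below reads them.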

Step 2: Build the MCP Server Script

Create server.py:

# server.py
from mcp.server.fastmcp import FastMCP
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import CharacterTextSplitter
from langchain_ollama import OllamaEmbeddings, OllamaLLM

OLLAMA_URL = "http://127.0.0.1:11434"

# Create the MCP server and the shared Ollama models
mcp = FastMCP("RAG MCP Server")
embeddings = OllamaEmbeddings(model="nomic-embed-text:latest", base_url=OLLAMA_URL)
model = OllamaLLM(model="qwen2.5", base_url=OLLAMA_URL)
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)


def build_store(path: str, collection_name: str) -> Chroma:
    """Load a text file, split it into chunks, and embed the chunks into a Chroma collection."""
    data = TextLoader(path, encoding="utf-8").load()
    texts = text_splitter.split_documents(data)
    # Give each file its own collection; with the default collection name,
    # in-memory Chroma stores created in one process can end up sharing data.
    return Chroma.from_documents(texts, embeddings, collection_name=collection_name)


def answer(store: Chroma, question: str) -> str:
    """Retrieve the chunks most similar to the question and let the LLM answer from them."""
    docs = store.similarity_search(question, k=4)
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return model.invoke(prompt)


# RAG pipelines: one vector store per knowledge-base file
store_zack = build_store("dummy.txt", "zackerkaky")
store_construction = build_store("dummy2.txt", "construction_man")
store_history = build_store("dummy3.txt", "history")


# Define the MCP tools
@mcp.tool()
def retrieve_info_zack(prompt: str) -> str:
    """Get information about Zackerkaky"""
    return answer(store_zack, prompt)


@mcp.tool()
def retrieve_info_construction(prompt: str) -> str:
    """Get information about Construction Man"""
    return answer(store_construction, prompt)


@mcp.tool()
def retrieve_info_history(prompt: str) -> str:
    """Get history-related information"""
    return answer(store_history, prompt)


if __name__ == "__main__":
    mcp.run()

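Explanation:

Embeddings and LLM: local Ollama models generate the vector embeddings and the answers to queries.
RAG pipeline: each file gets its own pipeline (load the document → split it into chunks → embed the chunks and store them in Chroma). At query time, the chunks most similar to the question are retrieved and passed to the LLM together with the question.
MCP tools: the @mcp.tool() decorator exposes each RAG query as a tool. A tool takes a prompt and returns the RAG-generated answer. Its docstring becomes the tool description that the LLM sees when deciding which tool to call.
Running: mcp.run() starts the MCP server. By default it uses the stdio transport, exchanging JSON-RPC messages with the client over standard input and output instead of listening on a network port. For this reason, the server should not print anything to stdout.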
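The chunking step is worth seeing in isolation. The following dependency-free sketch shows fixed-size character chunking with overlap. It is a simplification, not LangChain's CharacterTextSplitter, which first splits on a separator such as a blank line and then merges pieces up to chunk_size. It does, however, show how chunk_size and chunk_overlap interact.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, which reduces the chance
    that a piece of context is cut in half with neither chunk containing it whole.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # → ['abcd', 'defg', 'ghij']
```

With chunk_overlap=0, as in the server script, each character lands in exactly one chunk. A small overlap trades some duplicated embeddings for better recall at chunk boundaries.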

Step 3: Configure the MCP Client (e.g., Claude or Cursor)

Add the server to the client's MCP configuration (for Claude Desktop, this is the mcpServers section of claude_desktop_config.json):

{
  "mcpServers": {
    "RAG MCP Server": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/rag-mcp", "server.py"]
    }
  }
}

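Replace /path/to/rag-mcp with the absolute path to your project, then restart the client. The tools (such as retrieve_info_zack) will then be available.
Test: in the client, enter something like "Use retrieve_info_zack to tell me about Zackerkaky." The LLM will call the tool and generate a response from its result.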
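When the LLM decides to use a tool, the client sends the server a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a request with the standard library. It frames the request the way the stdio transport expects: one JSON message per line. The request id and question text are illustrative values.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-terminated JSON-RPC line (stdio framing)."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the message stays on a single line.
    return json.dumps(message, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    line = make_tool_call(1, "retrieve_info_zack", {"prompt": "Tell me about Zackerkaky."})
    print(line, end="")
```

A real session first performs the initialize handshake (an initialize request followed by a notifications/initialized notification) before any tools/call. MCP clients such as Claude Desktop handle all of this for you.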

Step 4: Test the System

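Run the server: uv run server.py. Over stdio it starts and then waits silently for a client, so this mainly confirms that it launches without errors. For interactive testing, uv run mcp dev server.py opens the MCP Inspector, where you can call the tools directly.
Query the tools from the client and check that the answers are accurate and based on the documents.
Extensions: add more files, or support PDFs with PyPDFLoader (from langchain_community.document_loaders; it requires the pypdf package).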

Part 2: Extended Version (an MCP RAG System Using Database Resources)

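This version uses a SQLite database as the knowledge base and exposes it through MCP, with "resources" for reading data and "tools" for modifying it. It suits RAG over structured data.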

Step 1: Set Up the SQLite Database

Create create_db.py:

import sqlite3
import os

os.makedirs('db', exist_ok=True)
conn = sqlite3.connect('db/employees.db')
cursor = conn.cursor()

cursor.execute('''
CREATE TABLE IF NOT EXISTS employees (
    id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT UNIQUE,
    department TEXT,
    salary REAL,
    hire_date TEXT
)
''')

employees = [
    (1, 'John', 'Doe', 'john.doe@company.com', 'Engineering', 85000, '2020-01-15'),
    # ... (add the remaining 9 records, as shown in the tutorial)
]

cursor.executemany('''
INSERT OR REPLACE INTO employees VALUES (?, ?, ?, ?, ?, ?, ?)
''', employees)

conn.commit()
conn.close()
print("Database created successfully!")

Run it: python create_db.py.
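To check what create_db.py produced, you can query the database with Python's built-in sqlite3 module. This sketch counts the rows and computes the average salary per department. The function name is our own. Its output depends on whichever records you added, since the full dataset is elided above.

```python
import sqlite3


def summarize(conn: sqlite3.Connection) -> dict:
    """Return the employee count and the average salary per department."""
    total = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
    rows = conn.execute(
        "SELECT department, AVG(salary) FROM employees "
        "GROUP BY department ORDER BY department"
    ).fetchall()
    return {"count": total, "avg_salary_by_department": dict(rows)}


if __name__ == "__main__":
    conn = sqlite3.connect("db/employees.db")
    try:
        print(summarize(conn))
    finally:
        conn.close()
```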

Step 2: Install Dependencies

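pip install fastmcp aiosqlite

Note that this part uses the standalone fastmcp package (FastMCP 2.x), imported with from fastmcp import FastMCP. Part 1 instead used the FastMCP class bundled with the official MCP SDK (mcp.server.fastmcp). The Client class used in Step 4 also comes from the standalone fastmcp package.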

Step 3: Build the MCP Server

Create server.py in the same directory as create_db.py (DB_PATH below is a relative path):

import aiosqlite
from fastmcp import FastMCP
from typing import List, Dict, Optional

mcp = FastMCP("Employee MCP Server")
DB_PATH = "db/employees.db"

@mcp.resource("employees://all")
async def get_all_employees() -> List[Dict]:
    """Return all employee records"""
    async with aiosqlite.connect(DB_PATH) as conn:
        cursor = await conn.execute('SELECT * FROM employees')
        columns = [column[0] for column in cursor.description]
        employees = [dict(zip(columns, row)) async for row in cursor]
        await cursor.close()
    return employees

@mcp.resource("employees://{employee_id}")
async def get_employee_by_id(employee_id: int) -> Optional[Dict]:
    """Return the employee record with the given ID"""
    async with aiosqlite.connect(DB_PATH) as conn:
        cursor = await conn.execute('SELECT * FROM employees WHERE id = ?', (employee_id,))
        row = await cursor.fetchone()
        if row:
            columns = [column[0] for column in cursor.description]
            result = dict(zip(columns, row))
        else:
            result = None
        await cursor.close()
    return result

@mcp.tool()
async def delete_employee(employee_id: int) -> bool:
    """Delete the employee record with the given ID"""
    async with aiosqlite.connect(DB_PATH) as conn:
        try:
            cursor = await conn.execute('DELETE FROM employees WHERE id = ?', (employee_id,))
            await conn.commit()
            success = cursor.rowcount > 0
            await cursor.close()
        except Exception:
            success = False
    return success

if __name__ == "__main__":
    mcp.run(transport="stdio")

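Explanation:

Resources: @mcp.resource() exposes data under a URI. Examples are employees://all (all records) and the template employees://{employee_id}, where the {employee_id} part of the URI is passed to the function as its employee_id argument.
Tools: @mcp.tool() is used for operations that modify data, such as deletion.
Async: aiosqlite performs SQLite I/O without blocking the server's event loop.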
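The following simplified, standard-library sketch shows how a templated URI such as employees://{employee_id} can map onto a function argument. It turns the template into a regular expression and extracts the parameters. It illustrates the idea only and is not FastMCP's actual matching code. The helper names are hypothetical.

```python
import re
from typing import Optional


def template_to_regex(template: str) -> re.Pattern:
    """Compile a URI template like 'employees://{employee_id}' into a regex with named groups."""
    pattern = ""
    # re.split with a capturing group alternates literal text and placeholder names.
    for i, part in enumerate(re.split(r"\{(\w+)\}", template)):
        pattern += f"(?P<{part}>[^/]+)" if i % 2 else re.escape(part)
    return re.compile(f"^{pattern}$")


def match_uri(template: str, uri: str) -> Optional[dict]:
    """Return the extracted parameters (as strings) if uri matches template, else None."""
    m = template_to_regex(template).match(uri)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(match_uri("employees://{employee_id}", "employees://42"))
```

Note that employees://all also matches this template (with employee_id "all"). A server would therefore typically check registered static resources before templates. Also note that the extracted value arrives as a string, so converting it to int is left to the function's parameter handling.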

Step 4: Build and Test the Client

Create client.py:

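import asyncio
from fastmcp import Client

async def main():
    # Passing a .py path makes the client launch server.py and talk to it over stdio
    async with Client("server.py") as client:
        print("Resources:")
        resources = await client.list_resources()
        print(resources)

        # Templated URIs such as employees://{employee_id} are listed separately
        print("\nResource templates:")
        templates = await client.list_resource_templates()
        print(templates)

        print("\nTools:")
        tools = await client.list_tools()
        print(tools)

        # Read one concrete resource (assumes an employee with id 1 exists)
        print("\nemployees://1:")
        print(await client.read_resource("employees://1"))

if __name__ == "__main__":
    asyncio.run(main())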

Run python client.py and check the output.

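In an LLM application, the client reads resources such as employees://1 and adds their contents to the model's context before it generates a response. In MCP, resources are application-controlled, while tools are model-controlled.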
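For structured data, the "augment" step usually means turning records into text the model can read. The following minimal sketch assumes records shaped like the dicts returned by get_employee_by_id. The prompt wording and function names are our own choices.

```python
def records_to_context(records: list[dict]) -> str:
    """Render each record as a block of 'key: value' lines so the LLM can read it as context."""
    blocks = []
    for rec in records:
        blocks.append("\n".join(f"{key}: {value}" for key, value in rec.items()))
    return "\n\n".join(blocks)


def build_prompt(question: str, records: list[dict]) -> str:
    """Combine the retrieved records with the user's question."""
    context = records_to_context(records) if records else "(no matching records)"
    return (
        "Use only the employee records below to answer.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    record = {"id": 1, "first_name": "John", "last_name": "Doe", "department": "Engineering"}
    print(build_prompt("Which department does John Doe work in?", [record]))
```

The explicit "(no matching records)" placeholder gives the model a clear signal to say it does not know, rather than inventing an answer when get_employee_by_id returns None.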

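Advanced Tips and Optimization

Agentic RAG: in an agent framework (such as Embabel or LangGraph), let the agent decide when to call MCP tools or read resources.
Security: limit what the server exposes to the LLM (for example, require user confirmation for destructive tools such as delete_employee), and add authentication.
Extensions: combine vector search with keyword or SQL search (hybrid RAG), or support multimodal data (images/video).
Troubleshooting: if Ollama is not running, start it and make sure port 11434 is reachable. Check the logs when debugging.
Performance: for large documents, use a more capable splitter such as RecursiveCharacterTextSplitter. Persist the Chroma store (persist_directory) so documents are not re-embedded on every server start, and monitor the latency of MCP calls.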
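One common way to combine results in hybrid RAG is reciprocal rank fusion (RRF). Each document's score is the sum of 1/(k + rank) over the ranked lists it appears in. This standard-library sketch assumes you already have ranked lists of document IDs, for example one from keyword or SQL search and one from vector search. The value k = 60 is the one commonly used in the literature, not something prescribed by this tutorial.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs; ranks start at 1, higher fused score ranks first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; ties are broken by document ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    keyword_hits = ["doc3", "doc1", "doc7"]
    vector_hits = ["doc1", "doc4", "doc3"]
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))  # → ['doc1', 'doc3', 'doc4', 'doc7']
```

RRF uses only ranks, not raw scores. This lets it merge results from retrievers whose scores are on incompatible scales, such as BM25 and cosine similarity.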

Conclusion

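In this tutorial you built an MCP-based RAG system from scratch that can answer questions over local files or query a database. MCP makes RAG integrations more standardized and extensible, which is useful for enterprise applications. Experiment with different tools and adapt the system to your own use cases. For a JVM version (Embabel integration), refer to Embabel's documentation on adding an MCP server as an agent action.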

likuolei
