Interacting with the Ollama API

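Calling Ollama models through the API is the main way to integrate them into applications, scripts, and automated tasks. Ollama exposes a REST API, served by default at http://localhost:11434, that supports text generation, chat, embedding generation, and more. This guide covers the endpoints, parameters, examples, and caveats.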

1. API Overview

Starting the server: run the following command to start the Ollama server:

  ollama serve

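The server listens on http://localhost:11434 by default. Make sure it is running before you call the API; otherwise requests fail with a connection error.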
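Because every later call depends on the server being reachable, a quick probe can save debugging time. The sketch below is a minimal check using only the standard library; the function name `is_server_up` is hypothetical, and it assumes the server answers a plain GET on its root URL with status 200.

```python
import urllib.request


def is_server_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, refused connections, and timeouts are all OSError subclasses.
        return False
```

Calling `is_server_up()` before the first request lets a script print a clear "start ollama serve" message instead of a stack trace.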

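Main endpoints:
/api/generate: single-shot text generation, suited to one-off tasks.
/api/chat: multi-turn conversation, suited to interactive chat.
/api/embeddings: generates text embeddings for search or semantic analysis.
/api/ps: lists the models that are currently running.
/api/pull, /api/push, and others: model management.

Features:
Streaming responses (output arrives as it is generated).
An OpenAI-compatible API, which eases integration with frameworks such as LangChain and LlamaIndex.
Multimodal models (such as LLaVA, which accepts images).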

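2. Main API Endpoints in Detail

The usage, parameters, and examples for the core endpoints follow.

2.1. `/api/generate` – Single-Shot Text Generation

Purpose: generates a single response; suited to tasks such as translation, text completion, and code generation.
Method: POST
Request format (the // comments are annotations only; real JSON does not allow comments):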

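  {
    "model": "<model-name>", // model name, e.g. llama3
    "prompt": "<your-prompt>", // the user's input
    "stream": true/false, // whether to stream the output (default: true)
    "options": {
      "temperature": 0.7, // sampling randomness; higher is more varied
      "top_p": 0.9, // nucleus sampling threshold
      "num_predict": 128, // maximum number of tokens to generate
      "stop": ["\n"] // stop sequences
    }
  }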
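The request format above can be assembled programmatically. The sketch below is one way to do it; `build_generate_payload` is a hypothetical helper, not part of any Ollama library, and it only builds the body without sending it.

```python
import json


def build_generate_payload(model, prompt, stream=False, **options):
    """Build the JSON body for POST /api/generate.

    Sampling settings passed as keyword arguments go under "options";
    settings left as None are dropped so the server default applies.
    """
    payload = {"model": model, "prompt": prompt, "stream": stream}
    chosen = {name: value for name, value in options.items() if value is not None}
    if chosen:
        payload["options"] = chosen
    return payload


body = build_generate_payload(
    "llama3",
    "Write a short poem about the stars.",
    temperature=0.7,
    num_predict=128,
    top_p=None,  # left out of the payload
)
print(json.dumps(body, indent=2))
```

Keeping payload construction separate from the HTTP call makes the request easy to inspect and to unit-test without a running server.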

Example (using curl):

  curl http://localhost:11434/api/generate -d '{
    "model": "llama3",
    "prompt": "Write a short poem about the stars.",
    "stream": false
  }'

Response (non-streaming; abridged, since the real response also carries timing and token-count fields):

  {
    "model": "llama3",
    "response": "Twinkling dreams in night's embrace,\nStars above light up the space.",
    "done": true
  }

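Streaming response (stream: true):
Each line of the response body is a separate JSON object carrying a fragment of the output; the final object has "done": true:

  {"response": "Twinkling"}
  {"response": " dreams"}
  ...
  {"done": true}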
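To turn a streamed reply back into a single string, the fragments are parsed line by line and concatenated. The sketch below shows this on hard-coded sample lines; `join_stream` is a hypothetical helper, and in real use the lines would come from the HTTP response body.

```python
import json


def join_stream(lines):
    """Concatenate the "response" fragments of a streamed /api/generate reply."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # final object; nothing meaningful follows it
    return "".join(parts)


sample = [
    '{"response": "Twinkling"}',
    '{"response": " dreams"}',
    '{"done": true}',
]
print(join_stream(sample))  # Twinkling dreams
```

The function accepts either str or bytes lines, so it can be fed directly from an iterator over the response body.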

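2.2. `/api/chat` – Multi-Turn Conversation

Purpose: conversation that carries context across turns, similar to ChatGPT. The context comes from the request itself: each call includes the earlier turns in the messages array.
Method: POST
Request format: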

  {
    "model": "<model-name>",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What's 2+2?"},
      {"role": "assistant", "content": "2+2 equals 4."},
      {"role": "user", "content": "Now multiply that by 3."}
    ],
    "stream": true/false
  }

Example:

  curl http://localhost:11434/api/chat -d '{
    "model": "mistral",
    "messages": [
      {"role": "user", "content": "Tell me a joke."}
    ],
    "stream": false
  }'

Response:

  {
    "model": "mistral",
    "message": {
      "role": "assistant",
      "content": "Why did the scarecrow become a programmer? Because he was outstanding in his field!"
    },
    "done": true
  }
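Since each /api/chat request carries the whole history, a client has to append both the user's message and the assistant's reply after every turn. The sketch below illustrates that bookkeeping; the `Conversation` class and the `send` callable are hypothetical, and a stand-in replaces the real HTTP call so the example runs offline.

```python
class Conversation:
    """Keeps the message history that /api/chat expects on every request."""

    def __init__(self, send, system=None):
        # `send` is a hypothetical callable: it receives the messages list and
        # returns the assistant's reply text (e.g. by POSTing to /api/chat).
        self.send = send
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def ask(self, text):
        self.messages.append({"role": "user", "content": text})
        reply = self.send(self.messages)
        # Store the reply so the next turn carries the full context.
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Stand-in for the real HTTP call so the sketch runs without a server.
def fake_send(messages):
    return f"(reply to message {len(messages)})"


chat = Conversation(fake_send, system="You are a helpful assistant.")
chat.ask("What's 2+2?")
chat.ask("Now multiply that by 3.")
print([m["role"] for m in chat.messages])
```

In a real client, `send` would post `{"model": ..., "messages": messages, "stream": False}` and return `message.content` from the response.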

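2.3. `/api/embeddings` – Generating Text Embeddings

Purpose: converts text into a vector for semantic search, classification, or RAG (retrieval-augmented generation).
Method: POST
Request format: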

  {
    "model": "<model-name>",
    "prompt": "<text>"
  }

Example:

  curl http://localhost:11434/api/embeddings -d '{
    "model": "llama3",
    "prompt": "Artificial intelligence"
  }'

Response:

  {
    "embedding": [0.123, -0.456, ...] // array of floats
  }
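Embeddings become useful once they are compared. A common choice for semantic search is cosine similarity, sketched below on toy vectors; this is a generic technique rather than anything specific to Ollama, and the vectors here are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; real embeddings have far more dimensions.
query = [1.0, 0.0, 0.0]
docs = {"same direction": [2.0, 0.0, 0.0], "orthogonal": [0.0, 3.0, 0.0]}
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranked)
```

In practice the query and document vectors would each come from the "embedding" field of a response, produced by the same model.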

2.4. `/api/ps` – Listing Running Models

Purpose: lists the models currently loaded into memory so you can check resource usage.
Method: GET
Example:

  curl http://localhost:11434/api/ps

Response:

  {
    "models": [
      {
        "name": "llama3:latest",
        "size": 123456789,
        "digest": "abc123..."
      }
    ]
  }
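A response shaped like the one above can be condensed into a readable summary. The sketch below assumes the "size" field is a byte count, which the example response does not state explicitly; `summarize_running` is a hypothetical helper.

```python
def summarize_running(ps_response):
    """Return (name, size in GiB) pairs from an /api/ps response body."""
    summary = []
    for model in ps_response.get("models", []):
        size_gib = model.get("size", 0) / 1024 ** 3  # assumes size is in bytes
        summary.append((model.get("name", "?"), round(size_gib, 2)))
    return summary


sample = {"models": [{"name": "llama3:latest", "size": 123456789, "digest": "abc123..."}]}
for name, gib in summarize_running(sample):
    print(f"{name}: {gib} GiB")
```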

3. Programming Language Integration Examples

The Python examples below call the Ollama API in several scenarios.

Single-shot generation (Python)

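  import json
  import requests

  url = "http://localhost:11434/api/generate"
  data = {
      "model": "llama3",
      "prompt": "Explain quantum physics briefly.",
      "stream": True
  }

  # With stream=True the body is one JSON object per line;
  # print each text fragment as it arrives.
  response = requests.post(url, json=data, stream=True)
  response.raise_for_status()
  for line in response.iter_lines():
      if line:
          chunk = json.loads(line)
          print(chunk.get("response", ""), end="", flush=True)
  print()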

Multi-turn conversation (Python)

  import requests

  url = "http://localhost:11434/api/chat"
  # The full conversation so far is sent with the request.
  data = {
      "model": "mistral",
      "messages": [
          {"role": "user", "content": "What's the weather like today?"},
          {"role": "assistant", "content": "I don't have real-time data, but describe your location!"},
          {"role": "user", "content": "I'm in New York."}
      ],
      "stream": False
  }

  response = requests.post(url, json=data)
  response.raise_for_status()
  print(response.json()['message']['content'])

Embedding generation (Python)

  import requests

  url = "http://localhost:11434/api/embeddings"
  data = {
      "model": "llama3",
      "prompt": "Machine learning"
  }

  response = requests.post(url, json=data)
  response.raise_for_status()
  embedding = response.json()['embedding']
  print(embedding[:5])  # print the first 5 values

4. Multimodal Interaction (Image Support)

Supported models: for example llava or bakllava.
Request example (image description):

  curl http://localhost:11434/api/generate -d '{
    "model": "llava",
    "prompt": "Describe this image",
    "images": ["<base64-encoded-image>"]
  }'

Python example:

  import base64
  import requests

  with open("image.jpg", "rb") as f:
      img_data = base64.b64encode(f.read()).decode('utf-8')

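  # stream defaults to true, which returns one JSON object per line;
  # disable it so response.json() can parse a single object.
  response = requests.post('http://localhost:11434/api/generate', json={
      "model": "llava",
      "prompt": "What's in this image?",
      "images": [img_data],
      "stream": False
  })
  print(response.json()['response'])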
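The encoding step can be isolated into a small function that works on raw bytes, which makes it testable without an image file or a server. This is a sketch; `build_image_payload` is a hypothetical helper and the byte string stands in for real image data.

```python
import base64


def build_image_payload(image_bytes, prompt, model="llava"):
    """Build an /api/generate body with one base64-encoded image attached."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "prompt": prompt,
        "images": [encoded],
        "stream": False,  # ask for a single JSON object instead of a stream
    }


# Placeholder bytes; in real use, read them from an image file opened in "rb" mode.
payload = build_image_payload(b"\x89PNG fake bytes", "What's in this image?")
print(sorted(payload))
```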

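5. Key Parameters

These settings go inside the options object of a request.

temperature: controls output randomness; 0.2 gives more deterministic output, 0.8 more creative output. Values are usually kept between 0 and 1.
top_p (0-1): nucleus sampling; restricts sampling to the smallest set of tokens whose cumulative probability reaches top_p.
num_predict: maximum number of tokens to generate (default 128, -1 for no limit; defaults can change between Ollama releases).
stop: stop sequences, e.g. ["\n", "STOP"].
seed: sets the random seed so that output is reproducible.
num_ctx: context window size (default 2048; the maximum depends on the model, and the default can change between releases).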

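6. Caveats

Server: make sure ollama serve is running before calling the API.
Port conflicts: the default port is 11434. If it is taken, set the OLLAMA_HOST environment variable before starting the server. Use 127.0.0.1 to keep the server local; binding to 0.0.0.0 instead makes it reachable from other machines:

  export OLLAMA_HOST=127.0.0.1:11435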
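Once the server listens somewhere other than the default, client code needs the matching base URL. The sketch below derives it from an OLLAMA_HOST-style value; `client_base_url` is a hypothetical helper, and it assumes the value has the form host:port, optionally with a scheme.

```python
import os


def client_base_url(env=None, default="127.0.0.1:11434"):
    """Derive the URL a client should call from an OLLAMA_HOST-style value."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip() or default
    if "://" not in host:
        host = "http://" + host
    # 0.0.0.0 is a listen address ("all interfaces"), not one to connect to.
    return host.replace("//0.0.0.0", "//127.0.0.1", 1).rstrip("/")


print(client_base_url({"OLLAMA_HOST": "0.0.0.0:11435"}))  # http://127.0.0.1:11435
print(client_base_url({}))                                # http://127.0.0.1:11434
```

Passing a plain dict instead of reading os.environ keeps the function easy to test.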

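Performance:
Large models (such as llama3:70b) need a GPU and plenty of memory; 16 GB is a realistic floor only for mid-sized models, and a 70B model needs considerably more.
Small models (such as phi3) suit low-end devices.
Troubleshooting:
Connection failures: check that the server is running and that the port is correct.
Truncated output: increase num_predict or check the model's context limit.
Security: the server accepts local connections only by default; if you expose it, put it behind a firewall.
Streaming vs. non-streaming: streaming suits real-time applications; non-streaming suits simple tasks.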

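7. Advanced Usage

OpenAI compatibility: Ollama exposes an OpenAI-compatible API under /v1, so the OpenAI client library can be pointed at it directly. The client requires an api_key value, but Ollama does not check it: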

  from openai import OpenAI
  client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
  response = client.chat.completions.create(
      model="llama3",
      messages=[{"role": "user", "content": "Hello!"}]
  )
  print(response.choices[0].message.content)

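Batch processing: call the API in a loop to process large amounts of data.
RAG integration: combine /api/embeddings with a vector database (such as Chroma) to implement retrieval-augmented generation.
Web integration: build a real-time chat interface on top of the streaming API, for example by relaying the streamed chunks to the browser over a WebSocket.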
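For batch processing, the loop is simple, but it helps to record failures per item so one bad input does not abort the whole run. The sketch below shows that pattern; `run_batch` and the `generate` callable are hypothetical, and a stand-in replaces the real API call so the example runs offline.

```python
def run_batch(prompts, generate):
    """Call `generate` once per prompt, recording failures instead of aborting."""
    results = []
    for prompt in prompts:
        try:
            results.append({"prompt": prompt, "output": generate(prompt), "error": None})
        except Exception as exc:  # keep going; one bad item should not stop the batch
            results.append({"prompt": prompt, "output": None, "error": str(exc)})
    return results


# Stand-in for a real API call so the sketch runs without a server.
def fake_generate(prompt):
    if not prompt:
        raise ValueError("empty prompt")
    return prompt.upper()


batch = run_batch(["hello", "", "stars"], fake_generate)
for row in batch:
    print(row)
```

In real use, `generate` would wrap a non-streaming POST to /api/generate and return the "response" field.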


likuolei
