
Getting Started with Spring AI: Calling Ollama from Spring AI

likuolei, January 18, 2026

Getting Started with Spring AI and Ollama
(The latest, most practical approach for 2025-2026)

The most recommended ways to pair Spring AI with Ollama today, in order of preference:

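Rank	Approach	Pros	Cons / limits	Best for	Rating
1	Spring AI ChatClient with the Ollama starter	Least configuration, most natural API, best ecosystem support	Requires Ollama to be running already	Most everyday development	★★★★★
2	Building OllamaChatModel by hand	More flexible, fine-grained control over parameters	Slightly more code	Special parameters or experiments	★★★★
3	LangChain4j AiServices + Ollama	Good fit for tool calling, structured output, and agents	Steeper learning curve; AiServices belongs to LangChain4j, a separate framework, not to Spring AI	Intermediate to advanced use	★★★★
4	Calling Ollama directly (its REST API, or a community Java client such as ollama4j)	No Spring AI dependency, fully standalone	Loses all of Spring AI's higher-level abstractions	Ultra-lightweight or non-Spring projects	★★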
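
To make row 4 concrete, here is a minimal sketch of calling Ollama with nothing but the JDK's `java.net.http.HttpClient`. It assumes Ollama's `/api/chat` endpoint on the default port and a model that has already been pulled. The class and method names (`OllamaRestSketch`, `buildChatBody`, `chat`) are made up for illustration, and the hand-rolled JSON escaping stands in for a real JSON library.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper: talks to Ollama's REST API without Spring AI. */
final class OllamaRestSketch {

    /** Escapes the characters that must not appear raw inside a JSON string. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.toString();
    }

    /** Builds a non-streaming /api/chat request body with a single user message. */
    static String buildChatBody(String model, String userMessage) {
        return "{\"model\":\"" + jsonEscape(model) + "\","
             + "\"messages\":[{\"role\":\"user\",\"content\":\"" + jsonEscape(userMessage) + "\"}],"
             + "\"stream\":false}";
    }

    /** Sends the request and returns the raw JSON response (needs a running Ollama). */
    static String chat(String baseUrl, String model, String userMessage) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildChatBody(model, userMessage)))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

Calling `OllamaRestSketch.chat("http://localhost:11434", "qwen2.5:7b-instruct", "hello")` returns the raw JSON response; with `stream` set to `false`, the reply text sits under `message.content`. Memory, streaming, and prompt handling are then yours to build, which is the trade-off the table describes.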

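The recommended setup (enough for 99% of use cases)

1. Dependency (use the latest stable release)

<dependency>
    <groupId>org.springframework.ai</groupId>
    <!-- Since 1.0.0-M7 the starter is named spring-ai-starter-model-ollama;
         up to 1.0.0-M6 it was spring-ai-ollama-spring-boot-starter -->
    <artifactId>spring-ai-starter-model-ollama</artifactId>
    <!-- 1.0.0 is the GA release; pick the version that matches your Spring Boot version -->
    <version>1.0.0</version>
</dependency>
<!-- If you use a snapshot or milestone build, you may need to add the matching repository -->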

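2. Minimal application.yml

spring:
  ai:
    ollama:
      base-url: http://localhost:11434           # this is the default; you rarely need to change it
      chat:
        options:
          model: qwen2.5:7b-instruct             # ← change this to switch models
          # Commonly recommended models (early 2026):
          # qwen2.5:7b-instruct
          # deepseek-r1:7b
          # llama3.2:3b
          # phi4:14b
          # gemma2:9b
          temperature: 0.75
          top-p: 0.9
          num-predict: 4096                      # Ollama's name for the output token limit; it has no max-tokens option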

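3. The most common code template (ChatClient)

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
@RequestMapping("/ollama")
public class OllamaSimpleController {

    private final ChatClient chatClient;

    // Keeps the last 10 messages of each conversation (in memory only; Spring AI 1.0 builder API)
    private final ChatMemory chatMemory = MessageWindowChatMemory.builder()
            .maxMessages(10)
            .build();

    // Spring AI auto-configures a ChatClient.Builder, not a ChatClient, so build the client here
    public OllamaSimpleController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    // 1. Plain blocking call
    @GetMapping("/chat")
    public String simpleChat(@RequestParam String msg) {
        return chatClient.prompt()
                .user(msg)
                .call()
                .content();
    }

    // 2. Streaming output (typewriter effect in the front end)
    @GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamChat(@RequestParam String msg) {
        return chatClient.prompt()
                .user(msg)
                .stream()
                .content();
    }

    // 3. System prompt + memory (the most practical combination)
    @GetMapping("/memory")
    public String chatWithMemory(
            @RequestParam String sessionId,
            @RequestParam String message) {

        return chatClient.prompt()
                .system("""
                        You are a senior programmer who talks in a humorous, down-to-earth way and likes using emoji
                        Answer in Chinese and use markdown formatting where possible
                        """)
                .user(message)
                // Attach the memory advisor, then tell it which conversation this request belongs to
                .advisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
                .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, sessionId))
                .call()
                .content();
    }
}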
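
The idea behind the window memory in the template is simple: keep only the most recent N messages per session so the prompt does not grow without bound. The sketch below illustrates that idea in plain Java. It is not Spring AI's implementation; `WindowMemorySketch` is a hypothetical name, and messages are reduced to plain strings for brevity (the real class works with typed messages and may treat system messages differently).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration of per-session sliding-window chat memory. */
final class WindowMemorySketch {

    private final int maxMessages;
    private final Map<String, Deque<String>> sessions = new ConcurrentHashMap<>();

    WindowMemorySketch(int maxMessages) {
        if (maxMessages <= 0) {
            throw new IllegalArgumentException("maxMessages must be positive");
        }
        this.maxMessages = maxMessages;
    }

    /** Appends a message to the session and evicts the oldest ones beyond the window. */
    void add(String sessionId, String message) {
        Deque<String> window = sessions.computeIfAbsent(sessionId, id -> new ArrayDeque<>());
        synchronized (window) {
            window.addLast(message);
            while (window.size() > maxMessages) {
                window.removeFirst();
            }
        }
    }

    /** Returns a snapshot of the session's history, oldest message first. */
    List<String> history(String sessionId) {
        Deque<String> window = sessions.get(sessionId);
        if (window == null) {
            return List.of();
        }
        synchronized (window) {
            return new ArrayList<>(window);
        }
    }
}
```

With a window of 10, adding twelve messages to one session leaves messages 3 through 12 in its history, and other sessions are unaffected. Because this kind of memory lives in the JVM heap, it is lost on restart; that is fine for local experiments.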

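Quick-start checklist (do these in order)

Step	Command / action	Expected result
Install and start Ollama	ollama serve	Ollama is listening on http://localhost:11434
Pull a commonly used model	ollama pull qwen2.5:7b-instruct	The model download completes
Quick command-line test	ollama run qwen2.5:7b-instruct	You can chat normally
Start the Spring Boot project	Normal startup	No errors
Open in a browser	/ollama/chat?msg=你好啊	A reply in Chinese
Test streaming	/ollama/stream?msg=讲个程序员笑话	Text appears piece by piece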
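
To check the streaming step from code rather than a browser, a small client can read the `text/event-stream` response line by line. The following is a sketch under assumptions: the application is assumed to run at `http://localhost:8080` (Spring Boot's default port, which this article does not state), and `SseClientSketch` with its methods is a hypothetical helper that only handles `data:` lines.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical client for the /ollama/stream endpoint. */
final class SseClientSketch {

    /** Extracts the payload of an SSE "data:" line; any other line yields empty. */
    static Optional<String> dataPayload(String line) {
        if (!line.startsWith("data:")) {
            return Optional.empty();
        }
        String value = line.substring(5);
        // In the SSE format, one leading space after the colon is not part of the value.
        return Optional.of(value.startsWith(" ") ? value.substring(1) : value);
    }

    /** Streams the endpoint and prints each chunk as it arrives (needs the app running). */
    static void printStream(String appBaseUrl, String msg) throws Exception {
        String url = appBaseUrl + "/ollama/stream?msg="
                + URLEncoder.encode(msg, StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "text/event-stream")
                .build();
        HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofLines())
                .body()
                .forEach(line -> dataPayload(line).ifPresent(System.out::print));
        System.out.println();
    }
}
```

A call such as `SseClientSketch.printStream("http://localhost:8080", "tell me a programmer joke")` prints chunks one after another when streaming works end to end. If the whole answer arrives at once, something between the controller and the client is buffering the response.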

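Troubleshooting quick reference (common pitfalls, early 2026)

Problem	Cause	Fix
Connection timeout / connection refused	Ollama is not running	Run `ollama serve` first
Model not found	The model name is misspelled	Run `ollama list` to see the names of downloaded models
Poor or garbled Chinese answers	The model is weak at Chinese	Switch to qwen2.5, deepseek-r1, glm4, or similar
Streaming output does not work	Usually the application side rather than the model; Ollama streams responses for any model	Check that the endpoint returns `Flux<String>` with `produces = MediaType.TEXT_EVENT_STREAM_VALUE`, and that no proxy or gateway buffers the response
Out of GPU memory / very slow startup	The model is too large for the GPU	Use a 3b/7b-class model, or set the `num_gpu` model option to 0 to run on CPU only (for example `/set parameter num_gpu 0` inside `ollama run`); the `ollama` CLI has no `--num-gpu` flag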
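
The first two rows of the table can be checked programmatically before the application starts. This sketch assumes Ollama's `GET /api/tags` endpoint, which lists locally available models (the same information as `ollama list`). `OllamaPreflightSketch` and its methods are illustrative names, and the regex is a stand-in for a proper JSON parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-flight check for a local Ollama instance. */
final class OllamaPreflightSketch {

    // Matches "name":"..." pairs; a regex keeps the sketch dependency-free.
    private static final Pattern NAME = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]+)\"");

    /** Extracts model names from the JSON returned by /api/tags. */
    static List<String> modelNames(String tagsJson) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher(tagsJson);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** Fetches the raw /api/tags response (needs a running Ollama). */
    static String fetchTags(String baseUrl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(3))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/tags")).build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Returns a human-readable diagnosis for the two most common startup problems. */
    static String diagnose(String baseUrl, String model) {
        try {
            // Exact match, including the tag after the colon.
            return modelNames(fetchTags(baseUrl)).contains(model)
                    ? "OK: " + model + " is available"
                    : "Model not found; run: ollama pull " + model;
        } catch (IOException e) {
            return "Ollama is not reachable at " + baseUrl + "; run: ollama serve";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "Interrupted while contacting Ollama";
        }
    }
}
```

For example, `OllamaPreflightSketch.diagnose("http://localhost:11434", "qwen2.5:7b-instruct")` distinguishes "Ollama is down" from "the model is missing", which otherwise surface as different errors deep inside the first chat request.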

In one sentence, the most comfortable local stack right now (January 2026):

Spring Boot 3.4.x / 3.5.x + Spring AI 1.0.x
+ Ollama + qwen2.5:7b-instruct or deepseek-r1:7b
+ ChatClient streaming + memory + system prompt

Have fun running models locally, and may you soon build your own little AI toy 🚀

