惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Recent Commits to openclaw:main
Recent Commits to openclaw:main
博客园 - 叶小钗
Stack Overflow Blog
Stack Overflow Blog
S
SegmentFault 最新的问题
D
DataBreaches.Net
S
Securelist
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
T
Threatpost
C
Cyber Attacks, Cyber Crime and Cyber Security
The Hacker News
The Hacker News
Jina AI
Jina AI
T
Threat Research - Cisco Blogs
GbyAI
GbyAI
Microsoft Azure Blog
Microsoft Azure Blog
WordPress大学
WordPress大学
Engineering at Meta
Engineering at Meta
T
The Exploit Database - CXSecurity.com
A
Arctic Wolf
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
C
Cisco Blogs
PCI Perspectives
PCI Perspectives
Project Zero
Project Zero
G
Google Developers Blog
宝玉的分享
宝玉的分享
H
Heimdal Security Blog
美团技术团队
Schneier on Security
Schneier on Security
C
CERT Recently Published Vulnerability Notes
Martin Fowler
Martin Fowler
博客园 - 司徒正美
博客园 - 三生石上(FineUI控件)
Help Net Security
Help Net Security
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
Google DeepMind News
Google DeepMind News
C
Check Point Blog
Hacker News: Ask HN
Hacker News: Ask HN
L
LINUX DO - 最新话题
O
OpenAI News
Hacker News - Newest:
Hacker News - Newest: "LLM"
N
Netflix TechBlog - Medium
S
Security Affairs
小众软件
小众软件
MongoDB | Blog
MongoDB | Blog
Blog — PlanetScale
Blog — PlanetScale
V
V2EX - 技术
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
F
Fortinet All Blogs
G
GRAHAM CLULEY
云风的 BLOG
云风的 BLOG
S
Secure Thoughts

Java Code Geeks

PHP in 2026: The Language That Refuses to Die HTTP/3 Comes to the Java HTTP Client Fixing Java ClassCastException for Comparable Objects Elasticsearch keyword vs text Project Panama’s FFM API in Production: Replacing JNI Without Writing C Wrappers NATS vs. Kafka vs. Redis Streams for Java Microservices: When “Simpler” Actually Wins [DEALS] The Premium Learn to Code Certification Bundle (97% off) & Other Deals Up To 98% Off – Offers End Soon! JSpecify vs. Kotlin’s Built-in Null Safety: Can Annotations Ever Match a Type System? Shifting Left on Security: How to Harden CI/CD Pipelines for Payment APIs
Spring AI 1.1 and theModel Context Protocol:Building Production AI AgentsWithout the Python Tax
Eleftheria Drosopoulou · 2026-03-21 · via Java Code Geeks

Spring AI 1.0 shipped in May 2025. The 1.1 follow-up added full MCP integration, 20+ AI model backends, and a structured Advisors API for RAG and conversation memory. For the first time, Java developers can build AI agents that feel genuinely native — no Python sidecar, no LangChain wrapper, no second deployment.

Java developers have been building enterprise backends for decades. Now that AI agents are becoming first-class production components — not experiments, not demos — the question is whether those same developers need to learn Python, adopt LangChain, and maintain a sidecar service just to add intelligence to their applications. Spring AI 1.1 is the most complete answer the JVM has offered: no, you don’t.

Spring AI 1.1 GA shipped on November 12, 2025, after a development cycle that accumulated over 850 improvements across five milestone builds and an RC. The headline features are Model Context Protocol integration, a matured Advisors API, structured output converters, and an expanded set of model backends. Together they add up to something the Java ecosystem hasn’t had before: a full-featured AI agent framework that integrates naturally with Spring Boot, Spring Security, Micrometer, and the rest of the Spring portfolio. Let’s unpack what that means in practice.

1. The Python Tax — and Why It Exists

The phrase “Python tax” captures something real that Java and Kotlin teams deal with daily. Most AI tooling — LangChain, LlamaIndex, AutoGen, CrewAI — was built in Python because the early ML ecosystem lived there. As a result, Java teams adding AI capabilities to their services face a common set of choices: wrap a Python microservice and add an inter-process boundary to every AI call, run a sidecar container that the Java service depends on, or accept a JavaScript SDK that doesn’t fit enterprise architecture patterns.

Each option adds operational complexity, introduces latency at the inter-service boundary, fragments observability across two runtimes, and creates deployment coupling that engineering teams actively try to avoid. Additionally, Spring Security, Spring Data, and the broader Spring ecosystem simply don’t apply to a Python sidecar — so enterprise cross-cutting concerns like authentication, authorisation, audit logging, and distributed tracing have to be re-implemented or bridged.

Python AI stack (LangChain)

  • Separate runtime, deployment, and Docker image
  • HTTP/gRPC boundary between Java service and AI logic
  • Spring Security context does not cross the boundary
  • Two observability stacks (Micrometer + Python agent)
  • Two sets of model configuration and credentials
  • Team must maintain Python dependency tree
  • Cold-start latency on first AI call per container

Spring AI 1.1 (Java/Kotlin)

  • Single JVM process, single Spring Boot app
  • Direct in-process method call to ChatClient
  • @PreAuthorize on MCP tool methods — full Spring Security
  • Native Micrometer integration — one dashboard
  • Single application.yml for all model config
  • Standard Maven/Gradle dependency management
  • Spring Boot startup, no separate warm-up needed

Consequently, the operational argument for Spring AI is at least as strong as the developer experience argument. An AI agent that runs inside a Spring Boot process inherits everything that process already has — including configuration management, health checks, graceful shutdown, and native AOT compilation for GraalVM. That’s not something any Python sidecar can match without substantial extra work.

2. What Spring AI 1.1 Actually Ships

Before diving into code, it helps to understand the architecture. Spring AI is structured around a set of abstraction layers — a portable API that sits above individual model providers, with higher-level features like the Advisors API and ChatMemory built on top of it. The diagram below illustrates how the layers connect.

The most significant new layer in 1.1 is the MCP integration. Spring AI’s MCP Boot Starters, introduced during the 1.1 development cycle, let you expose any Spring bean method as an MCP tool with a single annotation — and consume MCP tools from external servers with a single YAML configuration entry. We’ll look at that in detail shortly. First, it helps to understand what model backends are available.

2.1 Model Provider Coverage

One of Spring AI’s foundational properties is a portable, provider-agnostic API. Switching from OpenAI to Claude to a locally-running Ollama model requires changing only a single application.yml property and swapping a Boot Starter dependency — the ChatClient call in your service class stays completely unchanged. As of 1.1, the following providers are fully supported:

Cross-Cutting ConcernPython SidecarSpring AI (In-Process)
AuthenticationMust re-implement or bridge separatelyNative Spring Security — @PreAuthorize works directly on @Tool methods
AuthorisationNo shared security context across the HTTP boundaryFull RBAC via Spring Security roles and method-level security
Audit LoggingSeparate logging pipeline, no shared MDC or trace contextSpring AOP + Advisors API — audit as a custom CallAdvisor
Distributed TracingSecond trace agent required (e.g. Python OpenTelemetry SDK)Native Micrometer Tracing — one trace spans the full request
Configuration ManagementSeparate config files, env vars, secrets rotationSingle application.yml / Spring Cloud Config for all model + app config
Health ChecksSeparate /health endpoint, separate probe configurationSpring Boot Actuator — AI backends included in health indicators automatically
Data Access (Spring Data)No shared repository, connection pool, or transaction contextDirect @Repository injection into @Tool methods — same transaction scope
Metrics & ObservabilitySeparate Prometheus scrape target, separate dashboardSingle Micrometer registry — token usage, latency, and app metrics unified
Graceful ShutdownSeparate shutdown lifecycle, risk of in-flight AI call lossSpring Boot shutdown hooks cover ChatClient and MCP connections
Dependency ManagementSeparate requirements.txt, Python version, virtual envStandard Maven/Gradle BOM — one dependency graph for the whole application

3. The Model Context Protocol: What It Is and Why Spring Joined Early

The Model Context Protocol (MCP), introduced by Anthropic, standardises how AI applications interact with external tools and resources. Before MCP, every AI framework had its own custom tool-calling format — meaning a tool you wrote for LangChain couldn’t be used by a Claude application without rewriting it. MCP provides a shared client-server protocol so that any MCP client (like a Spring AI application) can use any MCP server (a weather API, a filesystem tool, a database query engine), regardless of which language or framework built it.

“Spring joined the MCP ecosystem early as a key contributor, helping to develop and maintain the official MCP Java SDK that serves as the foundation for Java-based MCP implementations.”— Christian Tzolov, Spring AI Lead, spring.io (September 2025)

Spring AI’s MCP Boot Starters, available since 1.1.0-M1, let you build both sides of this protocol in pure Java. An MCP Server exposes your business logic as tools, resources, and prompt templates. An MCP Client connects to any MCP server — yours or a third party’s — and makes its tools available to the AI model. The model then decides autonomously when and how to call those tools based on the user’s request.

As of 1.1 GA, Spring AI supports MCP Java SDK v0.14, protocol version 2025-06-18, with three transport types: Streamable HTTP (the modern standard), Server-Sent Events (SSE, for backward compatibility), and stdio (for local process tools). The Streamable HTTP transport, in particular, is designed to scale — its stateless mode is explicitly suited for microservice deployments where session state is not maintained between requests.

4. Building an MCP Tool Server and a RAG Agent in Spring AI 1.1

The best way to see how these pieces fit together is to work through a real example. Below, we’ll build a simple but complete agent: an MCP tool server that exposes an inventory lookup tool, and a ChatClient that wires it together with RAG-based product documentation context. All code is runnable with Java 21+ and a valid OpenAI (or Anthropic) API key.

Step 1: Maven Dependencies

pom.xml: Spring AI 1.1 BOM + starters for MCP server, OpenAI, and PGVector

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-bom</artifactId>
      <version>1.1.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <!-- Core Spring Boot -->
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>

  <!-- MCP server (exposes tools) -->
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-server-webmvc</artifactId>
  </dependency>

  <!-- OpenAI chat model -->
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
  </dependency>

  <!-- PGVector for RAG -->
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
  </dependency>
</dependencies>

Step 2: Exposing a Business Method as an MCP Tool

The @Tool annotation is the centrepiece of Spring AI’s MCP server support. Any Spring bean method annotated with it is automatically discovered, serialised into a JSON Schema tool definition, and made available to connected AI models. Spring Security’s @PreAuthorize applies directly to these methods — making MCP tools first-class citizens of your security model.

MCP tool server — exposing inventory lookup to connected AI models

import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.stereotype.Service;

@Service
public class InventoryService {

    @Tool(description = """
        Look up the current stock level and warehouse location
        for a product SKU. Returns null if the SKU is not found.
        """)
    @PreAuthorize("hasRole('AGENT')")  // Spring Security works directly here
    public InventoryRecord getStock(
            @ToolParam(description = "The product SKU code, e.g. PROD-4521")
            String sku) {
        // In production: call your actual inventory repository
        return inventoryRepository.findBySku(sku)
            .map(p -> new InventoryRecord(p.getSku(), p.getQuantity(), p.getWarehouse()))
            .orElse(null);
    }

    // MCP also supports resources (read-only data) and prompt templates
    record InventoryRecord(String sku, int quantity, String warehouseCode) {}
}

Zero registration boilerplate: With Spring AI MCP Boot Starters, you do not need to register your @Tool methods manually. Spring Boot’s auto-configuration discovers all beans with @Tool methods and registers them with the MCP server automatically. The server then exposes them over Streamable HTTP at /mcp by default.

Step 3: The ChatClient — RAG + MCP Tools + Conversation Memory

The real power emerges when you wire together the ChatClient with an Advisors chain. The following example combines conversation memory, RAG (vector-store document retrieval), and MCP tool access in a single, readable builder pattern — the kind of thing that would take hundreds of lines of custom code in a Python framework.

ChatClient with Advisors chain — memory + RAG + MCP tools in one builder

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.mcp.SyncMcpToolCallbackProvider;
import org.springframework.stereotype.Service;

@Service
public class ProductAgent {

    private final ChatClient chatClient;

    public ProductAgent(
            ChatClient.Builder builder,
            VectorStore productDocs,           // PGVector auto-configured
            SyncMcpToolCallbackProvider tools  // MCP tools auto-discovered
    ) {
        this.chatClient = builder
            .defaultSystem("""
                You are a product support agent. Use the inventory tool
                to check live stock levels. Use your document context to
                answer questions about product specifications.
                Always cite your sources.
                """)
            .defaultAdvisors(
                // 1. Conversation memory — persists across turns
                MessageChatMemoryAdvisor.builder(new InMemoryChatMemory())
                    .build(),
                // 2. RAG — retrieves relevant product docs before each call
                QuestionAnswerAdvisor.builder(productDocs)
                    .searchRequest(r -> r.similarityThreshold(0.65).topK(5))
                    .build()
            )
            .defaultTools(tools)               // 3. MCP tools injected here
            .build();
    }

    public String ask(String conversationId, String userMessage) {
        return chatClient.prompt()
            .advisors(a -> a.param(
                MessageChatMemoryAdvisor.CHAT_MEMORY_CONVERSATION_ID_KEY,
                conversationId))
            .user(userMessage)
            .call()
            .content();
    }
}

What just happened in those 40 lines: The agent above: (1) maintains per-user conversation history across HTTP requests using MessageChatMemoryAdvisor, (2) performs a vector similarity search against a PGVector store on every call and injects the top-5 relevant product docs into the model’s context window, and (3) makes all MCP tool methods available so the model can autonomously call live inventory data when needed. In a Python LangChain equivalent, this would involve separate chain construction, callback handlers, and memory store configuration spread across multiple files.

5. The Advisors API: More Than RAG

The Advisors API is one of Spring AI’s most elegant abstractions. An advisor is conceptually similar to a Spring AOP interceptor — it wraps a ChatClient call and can inspect, modify, or short-circuit the request and response. Advisors chain in order, and they participate fully in Spring’s Observability stack, meaning you get Micrometer metrics and distributed traces for every advisor execution automatically.

Built-in AdvisorWhat it doesWhen to use
MessageChatMemoryAdvisorStores and injects per-conversation message history into every promptAny multi-turn conversational interface
QuestionAnswerAdvisorPerforms vector similarity search and injects retrieved docs as context (Naive RAG)Knowledge base Q&A with a single vector store
RetrievalAugmentationAdvisorModular RAG — configurable query transformers, expanders, re-rankers, and multiple retrieversAdvanced RAG: query rewriting, hybrid search, multi-source retrieval
VectorStoreChatMemoryAdvisorRetrieves semantically relevant conversation history from a vector store rather than all historyLong-running sessions where full history is too large for the context window
SafeGuardAdvisorIntercepts requests and blocks generation of harmful or inappropriate contentPublic-facing assistants with safety requirements
Custom (CallAdvisor)Your own intercept logic — A/B testing, prompt versioning, cost tracking, audit loggingAny cross-cutting concern that shouldn’t live in service code

Building a custom advisor is straightforward. The following example shows a cost-tracking advisor that logs estimated token usage per conversation — the kind of production instrumentation that matters when model API costs are a real budget item.

Custom advisor — token usage tracking per conversation ID

import org.springframework.ai.chat.client.advisor.api.*;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.core.Ordered;

// Implements CallAdvisor for non-streaming; StreamAdvisor for reactive Flux paths
public class TokenUsageAdvisor implements CallAdvisor {

    private final TokenUsageRepository usageRepo;

    @Override
    public ChatClientResponse aroundCall(
            ChatClientRequest request, CallAdvisorChain chain) {

        // Let the request pass through the rest of the chain
        ChatClientResponse response = chain.nextAroundCall(request);

        // Inspect the usage metadata after the model responds
        Usage usage = response.chatResponse().getMetadata().getUsage();
        String conversationId = (String) request.context()
            .get(MessageChatMemoryAdvisor.CHAT_MEMORY_CONVERSATION_ID_KEY);

        if (usage != null && conversationId != null) {
            usageRepo.record(conversationId,
                usage.getPromptTokens(),
                usage.getGenerationTokens());
        }

        return response;
    }

    @Override
    public int getOrder() {
        return Ordered.LOWEST_PRECEDENCE;  // Run last — after all other advisors
    }
}

6. Structured Output: Mapping Model Responses to Java Types

One of the more quietly powerful features in Spring AI is structured output. Instead of parsing JSON strings from model responses yourself, you call .entity(YourRecord.class) on the ChatClient and Spring AI handles prompt engineering the model to produce valid JSON and deserialising the result into a type-safe Java object. In 1.1, this is reinforced by native structured output support where providers offer it (OpenAI’s JSON mode, Gemini’s response schema validation).

Structured output — model response mapped directly to a Java record

// Define your output schema as a plain Java record or class
record ProductSummary(
    String name,
    String category,
    List<String> keyFeatures,
    double estimatedPrice,
    String availabilityStatus
) {}

// Ask the model — Spring AI handles prompt engineering + deserialisation
ProductSummary summary = chatClient.prompt()
    .user("Summarise the PROD-4521 product based on the provided documentation.")
    .advisors(QuestionAnswerAdvisor.builder(productDocs).build())
    .call()
    .entity(ProductSummary.class);  // Direct Java type — no JSON parsing

System.out.println(summary.name());            // "ProBook X500 Laptop"
System.out.println(summary.availabilityStatus()); // "In Stock (warehouse: EU-3)"

Production note on structured output: Structured output reliability varies by model. GPT-4o and Claude Sonnet 4 produce valid JSON on nearly every call when native JSON mode is enabled. Smaller local models via Ollama may occasionally produce malformed output. For production paths, always wrap .entity() calls in error handling and consider fallback strategies for parsing failures — Spring AI does not currently retry automatically on deserialization errors.

7. Production Readiness: What’s Actually Ready and What to Watch

Spring AI 1.1 is a GA release, which means the Spring team considers its core APIs stable. However, given the rapidly evolving AI landscape, some areas are more battle-tested than others. The table below gives an honest assessment of what’s production-ready today versus what warrants caution.

Feature Area1.1 StatusProduction ReadinessNotes
ChatClient + ChatModel APIGA StableProduction ReadyCore API stable since 1.0. OpenAI, Anthropic, Bedrock well-tested.
MCP Server (WebMVC)GA StableProduction ReadyStateless server issues fixed in 1.1.0-M2. OAuth2 available.
MCP Client (Streamable HTTP)GA StableProduction ReadyProtocol 2025-06-18 compliant. SSE available for legacy servers.
Advisors API (RAG, Memory)GA StableProduction ReadyQuestionAnswerAdvisor stable. RetrievalAugmentationAdvisor mature.
Structured OutputGA StableUse with error handlingReliability varies by model. Native JSON mode preferred.
Prompt Caching (Anthropic/Bedrock)GA StableProduction ReadyUp to 90% cost reduction on long system prompts. TTL: 5min or 1hr.
ReasoningContent APINew in 1.1Evaluate per modelThinking mode support varies: Anthropic & OpenAI most stable.
GraalVM / AOT compilationMCP AOT in M2+Test your specific stackMCP annotation AOT added in 1.1.0-M2. Not all vector stores tested.
Spring AI Agents frameworkCommunity previewNot yet productionAgentic loop / multi-agent patterns in active development as of late 2025.

Spring AI 1.1 Development Cycle: Improvements per Milestone Build

The Honest Verdict

Spring AI 1.1 is the most significant milestone for AI on the JVM since Java itself added modern concurrency primitives. The combination of a portable model API, MCP integration that works with any language’s MCP server, a composable Advisors API for RAG and memory, and native Spring Security support for agent tools is genuinely novel — and genuinely useful for the teams that build enterprise systems in Java and Kotlin.

The “Python tax” is now a choice, not a mandate. Teams that want the operational simplicity of a single JVM process, the security model they already have, and the observability stack they already understand now have a credible, GA-quality path to production AI agents. The areas that still need care — structured output reliability, agentic loop patterns, and GraalVM compatibility at the edges — are real but limited in scope. For the vast majority of production AI agent workloads, Spring AI 1.1 is ready.

8. What We Have Learned

This article has moved from the structural argument — why the Python tax is a real operational problem, not just a language preference — through the architectural choices Spring AI makes, and into the actual production code you would write today. Several important points emerge from that journey.

First, Spring AI 1.1’s most important property is not any individual feature but its cohesion with the Spring ecosystem. MCP tool methods participate in Spring Security. The Advisors API integrates with Micrometer. Auto-configuration handles most wiring. ChatClient follows the same fluent builder pattern as WebClient. This is not accidental — it reflects a deliberate design philosophy that AI capabilities should integrate naturally into existing Java services, not sit alongside them as separate systems.

Second, the MCP integration changes the economics of tool-building. Because any MCP server can talk to any MCP client regardless of language, the investment in building a Spring AI MCP server is not locked to Spring AI clients. A tool server you expose for your Java application can equally be consumed by a Python AI workflow, a TypeScript agent, or any other MCP-compliant host. That cross-language interoperability is the protocol’s most durable value.

Third, the Advisors API deserves more attention than it typically receives. Most tutorials show RAG and memory as the primary use cases, but the pattern is equally suited to cost tracking, safety filtering, prompt versioning, audit logging, and A/B testing model responses — exactly the kind of cross-cutting concerns that enterprise teams manage daily and that have no clean equivalent in LangChain’s callback architecture.

Finally, the honest note on maturity: Spring AI 1.1 is a GA release with real production users, but the AI tooling landscape is still moving fast. The core ChatClient, MCP, and Advisors APIs are stable. The agentic loop and multi-agent orchestration patterns — where most of the AI framework competition is fiercest — are still evolving in Spring’s community preview Spring AI Agents project. That gap will close, but it’s worth knowing where the current boundaries sit before committing production workloads to the more experimental edges of the framework.