Spring AI 1.1 and the Model Context Protocol: Building Production AI Agents Without the Python Tax

Spring AI 1.0 shipped in May 2025. The 1.1 follow-up added full MCP integration, 20+ AI model backends, and a structured Advisors API for RAG and conversation memory. For the first time, Java developers can build AI agents that feel genuinely native — no Python sidecar, no LangChain wrapper, no second deployment.

Java developers have been building enterprise backends for decades. Now that AI agents are becoming first-class production components — not experiments, not demos — the question is whether those same developers need to learn Python, adopt LangChain, and maintain a sidecar service just to add intelligence to their applications. Spring AI 1.1 is the most complete answer the JVM has offered: no, you don’t.

Spring AI 1.1 GA shipped on November 12, 2025, after a development cycle that accumulated over 850 improvements across five milestone builds and an RC. The headline features are Model Context Protocol integration, a matured Advisors API, structured output converters, and an expanded set of model backends. Together they add up to something the Java ecosystem hasn’t had before: a full-featured AI agent framework that integrates naturally with Spring Boot, Spring Security, Micrometer, and the rest of the Spring portfolio. Let’s unpack what that means in practice.

1. The Python Tax — and Why It Exists

The phrase “Python tax” captures something real that Java and Kotlin teams deal with daily. Most AI tooling — LangChain, LlamaIndex, AutoGen, CrewAI — was built in Python because the early ML ecosystem lived there. As a result, Java teams adding AI capabilities to their services face a common set of choices: wrap a Python microservice and add an inter-process boundary to every AI call, run a sidecar container that the Java service depends on, or accept a JavaScript SDK that doesn’t fit enterprise architecture patterns.

Each option adds operational complexity, introduces latency at the inter-service boundary, fragments observability across two runtimes, and creates deployment coupling that engineering teams actively try to avoid. Additionally, Spring Security, Spring Data, and the broader Spring ecosystem simply don’t apply to a Python sidecar — so enterprise cross-cutting concerns like authentication, authorisation, audit logging, and distributed tracing have to be re-implemented or bridged.

Python AI stack (LangChain)

Separate runtime, deployment, and Docker image
HTTP/gRPC boundary between Java service and AI logic
Spring Security context does not cross the boundary
Two observability stacks (Micrometer + Python agent)
Two sets of model configuration and credentials
Team must maintain Python dependency tree
Cold-start latency on first AI call per container

Spring AI 1.1 (Java/Kotlin)

Single JVM process, single Spring Boot app
Direct in-process method call to ChatClient
@PreAuthorize on MCP tool methods — full Spring Security
Native Micrometer integration — one dashboard
Single application.yml for all model config
Standard Maven/Gradle dependency management
Spring Boot startup, no separate warm-up needed

Consequently, the operational argument for Spring AI is at least as strong as the developer experience argument. An AI agent that runs inside a Spring Boot process inherits everything that process already has — including configuration management, health checks, graceful shutdown, and native AOT compilation for GraalVM. That’s not something any Python sidecar can match without substantial extra work.

2. What Spring AI 1.1 Actually Ships

Before diving into code, it helps to understand the architecture. Spring AI is structured around a set of abstraction layers — a portable API that sits above individual model providers, with higher-level features like the Advisors API and ChatMemory built on top of it. The diagram below illustrates how the layers connect.

The most significant new layer in 1.1 is the MCP integration. Spring AI’s MCP Boot Starters, introduced during the 1.1 development cycle, let you expose any Spring bean method as an MCP tool with a single annotation — and consume MCP tools from external servers with a single YAML configuration entry. We’ll look at that in detail shortly. First, it helps to understand what model backends are available.

2.1 Model Provider Coverage

One of Spring AI’s foundational properties is a portable, provider-agnostic API. Switching from OpenAI to Claude to a locally running Ollama model requires changing only a single application.yml property and swapping a Boot Starter dependency; the ChatClient call in your service class stays completely unchanged. As of 1.1, Spring AI covers more than 20 model backends.

Cross-cutting concerns: Python sidecar vs. in-process Spring AI

Cross-Cutting Concern	Python Sidecar	Spring AI (In-Process)
Authentication	Must re-implement or bridge separately	Native Spring Security — `@PreAuthorize` works directly on `@Tool` methods
Authorisation	No shared security context across the HTTP boundary	Full RBAC via Spring Security roles and method-level security
Audit Logging	Separate logging pipeline, no shared MDC or trace context	Spring AOP + Advisors API — audit as a custom `CallAdvisor`
Distributed Tracing	Second trace agent required (e.g. Python OpenTelemetry SDK)	Native Micrometer Tracing — one trace spans the full request
Configuration Management	Separate config files, env vars, secrets rotation	Single `application.yml` / Spring Cloud Config for all model + app config
Health Checks	Separate `/health` endpoint, separate probe configuration	Spring Boot Actuator — AI backends included in health indicators automatically
Data Access (Spring Data)	No shared repository, connection pool, or transaction context	Direct `@Repository` injection into `@Tool` methods — same transaction scope
Metrics & Observability	Separate Prometheus scrape target, separate dashboard	Single Micrometer registry — token usage, latency, and app metrics unified
Graceful Shutdown	Separate shutdown lifecycle, risk of in-flight AI call loss	Spring Boot shutdown hooks cover ChatClient and MCP connections
Dependency Management	Separate `requirements.txt`, Python version, virtual env	Standard Maven/Gradle BOM — one dependency graph for the whole application

3. The Model Context Protocol: What It Is and Why Spring Joined Early

The Model Context Protocol (MCP), introduced by Anthropic, standardises how AI applications interact with external tools and resources. Before MCP, every AI framework had its own custom tool-calling format — meaning a tool you wrote for LangChain couldn’t be used by a Claude application without rewriting it. MCP provides a shared client-server protocol so that any MCP client (like a Spring AI application) can use any MCP server (a weather API, a filesystem tool, a database query engine), regardless of which language or framework built it.

“Spring joined the MCP ecosystem early as a key contributor, helping to develop and maintain the official MCP Java SDK that serves as the foundation for Java-based MCP implementations.”— Christian Tzolov, Spring AI Lead, spring.io (September 2025)

Spring AI’s MCP Boot Starters, available since 1.1.0-M1, let you build both sides of this protocol in pure Java. An MCP Server exposes your business logic as tools, resources, and prompt templates. An MCP Client connects to any MCP server — yours or a third party’s — and makes its tools available to the AI model. The model then decides autonomously when and how to call those tools based on the user’s request.

As of 1.1 GA, Spring AI supports MCP Java SDK v0.14, protocol version 2025-06-18, with three transport types: Streamable HTTP (the modern standard), Server-Sent Events (SSE, for backward compatibility), and stdio (for local process tools). The Streamable HTTP transport, in particular, is designed to scale — its stateless mode is explicitly suited for microservice deployments where session state is not maintained between requests.
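
To make the client-server exchange more concrete, here is a minimal sketch of the JSON-RPC 2.0 envelope an MCP client sends when the model decides to invoke a tool. It is plain Java with no libraries, and it is an illustration only: the class name McpToolCallSketch, the helper methods, and the tool name getStock are assumptions made for this example, and in a real application the MCP Java SDK builds and transports these messages for you.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: hand-builds the JSON-RPC 2.0 request that an MCP client
// sends to invoke a tool. Real clients rely on the MCP Java SDK for this.
final class McpToolCallSketch {

    // Minimal JSON string escaping, enough for the values used in this sketch.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a "tools/call" request for the named tool with string arguments.
    static String toolsCall(int id, String toolName, Map<String, String> arguments) {
        String args = arguments.entrySet().stream()
            .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
            .collect(Collectors.joining(",", "{", "}"));
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
            + ",\"method\":\"tools/call\""
            + ",\"params\":{\"name\":" + quote(toolName)
            + ",\"arguments\":" + args + "}}";
    }

    public static void main(String[] args) {
        // Hypothetical tool name and argument, mirroring the inventory example
        System.out.println(toolsCall(1, "getStock", Map.of("sku", "PROD-4521")));
    }
}
```

The same envelope shape applies whichever transport carries it (Streamable HTTP, SSE, or stdio), which is what lets a server written in one language serve clients written in another.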

4. Building an MCP Tool Server and a RAG Agent in Spring AI 1.1

The best way to see how these pieces fit together is to work through a real example. Below, we’ll build a simple but complete agent: an MCP tool server that exposes an inventory lookup tool, and a ChatClient that wires it together with RAG-based product documentation context. All code is runnable with Java 21+ and a valid OpenAI (or Anthropic) API key.

Step 1: Maven Dependencies

pom.xml: Spring AI 1.1 BOM + starters for MCP server, OpenAI, and PGVector

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-bom</artifactId>
      <version>1.1.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <!-- Core Spring Boot -->
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>

  <!-- MCP server (exposes tools) -->
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-server-webmvc</artifactId>
  </dependency>

  <!-- OpenAI chat model -->
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
  </dependency>

  <!-- PGVector for RAG -->
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
  </dependency>

  <!-- MCP client (provides SyncMcpToolCallbackProvider used in Step 3) -->
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-client</artifactId>
  </dependency>

  <!-- QuestionAnswerAdvisor lives in the vector-store advisors module -->
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-advisors-vector-store</artifactId>
  </dependency>

  <!-- Spring Security for @PreAuthorize on tool methods -->
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-security</artifactId>
  </dependency>
</dependencies>

Step 2: Exposing a Business Method as an MCP Tool

The @Tool annotation is the centrepiece of Spring AI’s MCP server support. Any Spring bean method annotated with it is automatically discovered, serialised into a JSON Schema tool definition, and made available to connected AI models. Spring Security’s @PreAuthorize applies directly to these methods — making MCP tools first-class citizens of your security model.

MCP tool server — exposing inventory lookup to connected AI models

import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.stereotype.Service;

@Service
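public class InventoryService {

    // InventoryRepository is this example's own Spring Data repository (not shown)
    private final InventoryRepository inventoryRepository;

    public InventoryService(InventoryRepository inventoryRepository) {
        this.inventoryRepository = inventoryRepository;
    }
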
    @Tool(description = """
        Look up the current stock level and warehouse location
        for a product SKU. Returns null if the SKU is not found.
        """)
    @PreAuthorize("hasRole('AGENT')")  // Spring Security works directly here
    public InventoryRecord getStock(
            @ToolParam(description = "The product SKU code, e.g. PROD-4521")
            String sku) {
        // In production: call your actual inventory repository
        return inventoryRepository.findBySku(sku)
            .map(p -> new InventoryRecord(p.getSku(), p.getQuantity(), p.getWarehouse()))
            .orElse(null);
    }

    // MCP also supports resources (read-only data) and prompt templates
    record InventoryRecord(String sku, int quantity, String warehouseCode) {}
}

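Minimal registration boilerplate: With the Spring AI MCP Boot Starters, you do not hand-write MCP tool specifications. Auto-configuration converts registered tool callbacks into MCP tool definitions. Depending on the starter version, @Tool methods may need to be exposed through a ToolCallbackProvider bean (for example, one built with MethodToolCallbackProvider), while methods carrying the MCP-specific @McpTool annotation are picked up by the annotation scanner. When the Streamable HTTP protocol is selected, the server exposes the tools at /mcp by default.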
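
As a rough mental model of what annotation-driven discovery involves, the sketch below scans a bean for annotated methods using plain reflection and records a simple definition for each. It is not Spring AI's implementation: the @SketchTool annotation, the ToolDefinition record, and the scanner class are hypothetical names invented for this illustration, and the real framework additionally generates a JSON Schema for the parameters.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a tool annotation; NOT Spring AI's @Tool.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchTool {
    String description();
}

// What a scanner might record about each discovered tool method.
record ToolDefinition(String name, String description, List<String> parameterTypes) {}

final class ToolScannerSketch {

    // Finds every method carrying @SketchTool on the given bean's class.
    static List<ToolDefinition> scan(Object bean) {
        List<ToolDefinition> tools = new ArrayList<>();
        for (Method m : bean.getClass().getDeclaredMethods()) {
            SketchTool tool = m.getAnnotation(SketchTool.class);
            if (tool == null) {
                continue; // Unannotated methods are never exposed
            }
            List<String> params = Arrays.stream(m.getParameterTypes())
                .map(Class::getSimpleName)
                .toList();
            tools.add(new ToolDefinition(m.getName(), tool.description(), params));
        }
        // getDeclaredMethods() has no guaranteed order, so sort for stable output.
        tools.sort(Comparator.comparing(ToolDefinition::name));
        return tools;
    }
}

// Example bean with one exposed method and one internal method.
final class DemoInventory {
    @SketchTool(description = "Look up stock for a SKU")
    public int getStock(String sku) { return 0; }

    public void internalOnly() {}
}
```

Calling ToolScannerSketch.scan(new DemoInventory()) yields a single definition for getStock; internalOnly is skipped because it carries no annotation, which is the property that keeps unannotated business methods invisible to the model.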

Step 3: The ChatClient — RAG + MCP Tools + Conversation Memory

The real power emerges when you wire together the ChatClient with an Advisors chain. The following example combines conversation memory, RAG (vector-store document retrieval), and MCP tool access in a single, readable builder pattern — the kind of thing that would take hundreds of lines of custom code in a Python framework.

ChatClient with Advisors chain — memory + RAG + MCP tools in one builder

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;
import org.springframework.ai.mcp.SyncMcpToolCallbackProvider;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class ProductAgent {

    private final ChatClient chatClient;

    public ProductAgent(
            ChatClient.Builder builder,
            VectorStore productDocs,           // PGVector auto-configured
            SyncMcpToolCallbackProvider tools  // Supplied by the MCP client starter
    ) {
        // In-memory sliding window of recent messages; history is lost on restart
        ChatMemory chatMemory = MessageWindowChatMemory.builder()
            .maxMessages(20)
            .build();

        this.chatClient = builder
            .defaultSystem("""
                You are a product support agent. Use the inventory tool
                to check live stock levels. Use your document context to
                answer questions about product specifications.
                Always cite your sources.
                """)
            .defaultAdvisors(
                // 1. Conversation memory: replays earlier turns of the conversation
                MessageChatMemoryAdvisor.builder(chatMemory).build(),
                // 2. RAG: retrieves relevant product docs before each call
                QuestionAnswerAdvisor.builder(productDocs)
                    .searchRequest(SearchRequest.builder()
                        .similarityThreshold(0.65)
                        .topK(5)
                        .build())
                    .build()
            )
            .defaultToolCallbacks(tools)       // 3. MCP tools injected here
            .build();
    }

    public String ask(String conversationId, String userMessage) {
        return chatClient.prompt()
            // Selects which conversation's history the memory advisor uses
            .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId))
            .user(userMessage)
            .call()
            .content();
    }
}

What just happened in those 40 lines: the agent above (1) keeps per-conversation history across HTTP requests using MessageChatMemoryAdvisor (with in-memory storage, that history lasts only as long as the process), (2) performs a vector similarity search against a PGVector store on every call and injects up to five relevant product docs scoring at least 0.65 into the model’s context window, and (3) makes all MCP tool methods available so the model can autonomously call live inventory data when needed. In a Python LangChain equivalent, this would typically involve separate chain construction, callback handlers, and memory store configuration.
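
The retrieval step in point (2) can be pictured as a filter-then-rank operation over embedding vectors. The sketch below is a plain-Java approximation under stated assumptions: it uses cosine similarity over small hand-written vectors of equal, non-zero length, and the names SimilaritySearchSketch, Doc, and ScoredDoc are hypothetical. A real vector store such as PGVector performs the equivalent work inside the database, usually with an index.

```java
import java.util.Comparator;
import java.util.List;

// A document id paired with its similarity to the query.
record ScoredDoc(String id, double score) {}

// A stored document with a precomputed embedding (hypothetical shape).
record Doc(String id, double[] embedding) {}

final class SimilaritySearchSketch {

    // Cosine similarity; assumes both vectors have the same length and are non-zero.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Keeps documents at or above the threshold, best first, at most topK of them.
    static List<ScoredDoc> search(double[] query, List<Doc> docs,
                                  double threshold, int topK) {
        return docs.stream()
            .map(d -> new ScoredDoc(d.id(), cosine(query, d.embedding())))
            .filter(s -> s.score() >= threshold)
            .sorted(Comparator.comparingDouble(ScoredDoc::score).reversed())
            .limit(topK)
            .toList();
    }
}
```

The two parameters pull in different directions: the threshold protects the prompt from irrelevant context, while topK caps how much of the context window retrieval may consume, so it is worth tuning them together against real queries.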

5. The Advisors API: More Than RAG

The Advisors API is one of Spring AI’s most elegant abstractions. An advisor is conceptually similar to a Spring AOP interceptor — it wraps a ChatClient call and can inspect, modify, or short-circuit the request and response. Advisors chain in order, and they participate fully in Spring’s Observability stack, meaning you get Micrometer metrics and distributed traces for every advisor execution automatically.

Built-in Advisor	What it does	When to use
`MessageChatMemoryAdvisor`	Stores and injects per-conversation message history into every prompt	Any multi-turn conversational interface
`QuestionAnswerAdvisor`	Performs vector similarity search and injects retrieved docs as context (Naive RAG)	Knowledge base Q&A with a single vector store
`RetrievalAugmentationAdvisor`	Modular RAG — configurable query transformers, expanders, re-rankers, and multiple retrievers	Advanced RAG: query rewriting, hybrid search, multi-source retrieval
`VectorStoreChatMemoryAdvisor`	Retrieves semantically relevant conversation history from a vector store rather than all history	Long-running sessions where full history is too large for the context window
`SafeGuardAdvisor`	Intercepts requests and blocks generation of harmful or inappropriate content	Public-facing assistants with safety requirements
Custom (`CallAdvisor`)	Your own intercept logic — A/B testing, prompt versioning, cost tracking, audit logging	Any cross-cutting concern that shouldn’t live in service code
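
The ordering behaviour is easier to reason about with a stripped-down model of the chain. The following sketch is plain Java and deliberately not the Spring AI API: SketchAdvisor, AdvisorChainSketch, and the tagging helper are hypothetical names, and prompts and replies are reduced to strings. It shows the around-style nesting in which the advisor with the lowest order value sees the request first and the response last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified advisor contract: each advisor may act before and
// after delegating to the rest of the chain.
interface SketchAdvisor {
    int order();
    String advise(String prompt, Function<String, String> next);
}

final class AdvisorChainSketch {

    // Sorts advisors by order (lowest first) and nests them around the model call.
    static String run(List<SketchAdvisor> advisors, String prompt,
                      Function<String, String> model) {
        List<SketchAdvisor> sorted = new ArrayList<>(advisors);
        sorted.sort(Comparator.comparingInt(SketchAdvisor::order));
        Function<String, String> chain = model;
        // Wrap from the innermost advisor (highest order value) outwards.
        for (int i = sorted.size() - 1; i >= 0; i--) {
            SketchAdvisor advisor = sorted.get(i);
            Function<String, String> next = chain;
            chain = p -> advisor.advise(p, next);
        }
        return chain.apply(prompt);
    }

    // Helper: tags the prompt on the way in and the reply on the way out.
    static SketchAdvisor tagging(String name, int order) {
        return new SketchAdvisor() {
            @Override
            public int order() { return order; }

            @Override
            public String advise(String prompt, Function<String, String> next) {
                String reply = next.apply(prompt + " >" + name);
                return reply + " <" + name;
            }
        };
    }
}
```

With a memory advisor at order 1 and a RAG advisor at order 2, the prompt passes through memory and then RAG on the way to the model, and the reply passes through RAG and then memory on the way back. An advisor that must observe the final model response before anyone else modifies it therefore belongs at the high-order (innermost) end.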

Building a custom advisor is straightforward. The following example shows a cost-tracking advisor that logs estimated token usage per conversation — the kind of production instrumentation that matters when model API costs are a real budget item.

Custom advisor — token usage tracking per conversation ID

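import org.springframework.ai.chat.client.ChatClientRequest;
import org.springframework.ai.chat.client.ChatClientResponse;
import org.springframework.ai.chat.client.advisor.api.CallAdvisor;
import org.springframework.ai.chat.client.advisor.api.CallAdvisorChain;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.core.Ordered;

// Implements CallAdvisor for non-streaming; StreamAdvisor for reactive Flux paths
public class TokenUsageAdvisor implements CallAdvisor {

    // TokenUsageRepository is this example's own persistence interface (not shown)
    private final TokenUsageRepository usageRepo;

    public TokenUsageAdvisor(TokenUsageRepository usageRepo) {
        this.usageRepo = usageRepo;
    }

    @Override
    public ChatClientResponse adviseCall(
            ChatClientRequest request, CallAdvisorChain chain) {

        // Let the request pass through the rest of the chain
        ChatClientResponse response = chain.nextCall(request);

        // Look up the conversation ID set via advisor params, if any
        Object conversationId = request.context().get(ChatMemory.CONVERSATION_ID);

        // Inspect the usage metadata after the model responds
        if (response.chatResponse() != null && conversationId != null) {
            Usage usage = response.chatResponse().getMetadata().getUsage();
            if (usage != null) {
                usageRepo.record(conversationId.toString(),
                    usage.getPromptTokens(),
                    usage.getCompletionTokens());
            }
        }

        return response;
    }

    @Override
    public String getName() {
        return "TokenUsageAdvisor";
    }

    @Override
    public int getOrder() {
        // Lowest precedence: innermost advisor, closest to the model call
        return Ordered.LOWEST_PRECEDENCE;
    }
}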

6. Structured Output: Mapping Model Responses to Java Types

One of the more quietly powerful features in Spring AI is structured output. Instead of parsing JSON strings from model responses yourself, you call .entity(YourRecord.class) on the ChatClient. Spring AI then instructs the model to produce JSON that matches the target type and deserialises the result into a type-safe Java object. In 1.1, this is reinforced by native structured output support where providers offer it (OpenAI’s JSON mode, Gemini’s response schema validation).

Structured output — model response mapped directly to a Java record

// Define your output schema as a plain Java record or class
record ProductSummary(
    String name,
    String category,
    List<String> keyFeatures,
    double estimatedPrice,
    String availabilityStatus
) {}

// Ask the model — Spring AI handles prompt engineering + deserialisation
ProductSummary summary = chatClient.prompt()
    .user("Summarise the PROD-4521 product based on the provided documentation.")
    .advisors(QuestionAnswerAdvisor.builder(productDocs).build())
    .call()
    .entity(ProductSummary.class);  // Direct Java type — no JSON parsing

System.out.println(summary.name());            // "ProBook X500 Laptop"
System.out.println(summary.availabilityStatus()); // "In Stock (warehouse: EU-3)"

Production note on structured output: Structured output reliability varies by model. GPT-4o and Claude Sonnet 4 produce valid JSON on nearly every call when native JSON mode is enabled. Smaller local models via Ollama may occasionally produce malformed output. For production paths, always wrap .entity() calls in error handling and consider fallback strategies for parsing failures — Spring AI does not currently retry automatically on deserialization errors.
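
One way to act on that advice is a small retry-with-fallback wrapper around the model call. The sketch below is plain Java and makes several assumptions: the model call is represented by a Supplier<String> and the deserialisation step by a Function, both stand-ins for whatever your application uses, and the class and method names are hypothetical. It is not a Spring AI facility.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Supplier;

final class StructuredOutputRetrySketch {

    // Calls the model up to maxAttempts times and returns the first reply that
    // parses. Returns empty if every attempt fails, so the caller can fall back.
    static <T> Optional<T> callWithRetry(Supplier<String> modelCall,
                                         Function<String, T> parser,
                                         int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String raw = modelCall.get();
            try {
                return Optional.of(parser.apply(raw));
            } catch (RuntimeException parseFailure) {
                // Malformed output: go round the loop and ask the model again.
            }
        }
        return Optional.empty();
    }
}
```

In a Spring AI service the supplier might wrap chatClient.prompt()...call().content() and the parser might delegate to a JSON mapper. Returning Optional.empty() rather than throwing leaves the fallback decision (a default answer, a cheaper model, a human hand-off) with the caller. Each retry is a billable model call, so keep maxAttempts small.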

7. Production Readiness: What’s Actually Ready and What to Watch

Spring AI 1.1 is a GA release, which means the Spring team considers its core APIs stable. However, given the rapidly evolving AI landscape, some areas are more battle-tested than others. The table below gives an honest assessment of what’s production-ready today versus what warrants caution.

Feature Area	1.1 Status	Production Readiness	Notes
ChatClient + ChatModel API	GA Stable	Production Ready	Core API stable since 1.0. OpenAI, Anthropic, Bedrock well-tested.
MCP Server (WebMVC)	GA Stable	Production Ready	Stateless server issues fixed in 1.1.0-M2. OAuth2 available.
MCP Client (Streamable HTTP)	GA Stable	Production Ready	Protocol 2025-06-18 compliant. SSE available for legacy servers.
Advisors API (RAG, Memory)	GA Stable	Production Ready	QuestionAnswerAdvisor stable. RetrievalAugmentationAdvisor mature.
Structured Output	GA Stable	Use with error handling	Reliability varies by model. Native JSON mode preferred.
Prompt Caching (Anthropic/Bedrock)	GA Stable	Production Ready	Up to 90% cost reduction on long system prompts. TTL: 5min or 1hr.
ReasoningContent API	New in 1.1	Evaluate per model	Thinking mode support varies: Anthropic & OpenAI most stable.
GraalVM / AOT compilation	MCP AOT in M2+	Test your specific stack	MCP annotation AOT added in 1.1.0-M2. Not all vector stores tested.
Spring AI Agents framework	Community preview	Not yet production	Agentic loop / multi-agent patterns in active development as of late 2025.

Spring AI 1.1 Development Cycle: Improvements per Milestone Build

The Honest Verdict

Spring AI 1.1 is the most significant milestone for AI on the JVM since Java itself added modern concurrency primitives. The combination of a portable model API, MCP integration that works with any language’s MCP server, a composable Advisors API for RAG and memory, and native Spring Security support for agent tools is genuinely novel — and genuinely useful for the teams that build enterprise systems in Java and Kotlin.

The “Python tax” is now a choice, not a mandate. Teams that want the operational simplicity of a single JVM process, the security model they already have, and the observability stack they already understand now have a credible, GA-quality path to production AI agents. The areas that still need care — structured output reliability, agentic loop patterns, and GraalVM compatibility at the edges — are real but limited in scope. For the vast majority of production AI agent workloads, Spring AI 1.1 is ready.

8. What We Have Learned

This article has moved from the structural argument — why the Python tax is a real operational problem, not just a language preference — through the architectural choices Spring AI makes, and into the actual production code you would write today. Several important points emerge from that journey.

First, Spring AI 1.1’s most important property is not any individual feature but its cohesion with the Spring ecosystem. MCP tool methods participate in Spring Security. The Advisors API integrates with Micrometer. Auto-configuration handles most wiring. ChatClient follows the same fluent builder pattern as WebClient. This is not accidental — it reflects a deliberate design philosophy that AI capabilities should integrate naturally into existing Java services, not sit alongside them as separate systems.

Second, the MCP integration changes the economics of tool-building. Because any MCP server can talk to any MCP client regardless of language, the investment in building a Spring AI MCP server is not locked to Spring AI clients. A tool server you expose for your Java application can equally be consumed by a Python AI workflow, a TypeScript agent, or any other MCP-compliant host. That cross-language interoperability is the protocol’s most durable value.

Third, the Advisors API deserves more attention than it typically receives. Most tutorials show RAG and memory as the primary use cases, but the pattern is equally suited to cost tracking, safety filtering, prompt versioning, audit logging, and A/B testing model responses — exactly the kind of cross-cutting concerns that enterprise teams manage daily and that have no clean equivalent in LangChain’s callback architecture.

Finally, the honest note on maturity: Spring AI 1.1 is a GA release with real production users, but the AI tooling landscape is still moving fast. The core ChatClient, MCP, and Advisors APIs are stable. The agentic loop and multi-agent orchestration patterns — where most of the AI framework competition is fiercest — are still evolving in Spring’s community preview Spring AI Agents project. That gap will close, but it’s worth knowing where the current boundaries sit before committing production workloads to the more experimental edges of the framework.
