Spring AI Masterclass

Unlock the power of Generative AI in Java. A comprehensive guide with 5 production-grade examples ranging from Chatbots to RAG and Function Calling.

Spring Boot 3.2+ · Java 17+ · OpenAI / Ollama · PGVector · Docker

The Rise of Java in the AI Era

For over a decade, the field of Artificial Intelligence and Machine Learning has been dominated by Python. Its rich ecosystem of libraries like NumPy, Pandas, PyTorch, and TensorFlow made it the default choice for data scientists and researchers. However, we are currently witnessing a seismic shift in the industry: the move from Model Training to Generative AI Application Engineering.

In this new era, the focus is less on designing neural network architectures and more on integrating pre-trained Large Language Models (LLMs) into existing enterprise systems. This is where Java, and specifically the Spring ecosystem, shines. Enterprise applications require stability, scalability, security, and type safety—attributes that Java has delivered for 25 years.

Spring AI is an application framework for AI engineering. Its primary goal is to apply Spring ecosystem design principles—such as portability and modular design—to the AI domain. It offers a unified interface for interacting with various AI providers, including OpenAI, Azure OpenAI, Amazon Bedrock, Ollama, Hugging Face, and Mistral AI.

Why Spring AI Matters

Java's enduring promise has been "Write Once, Run Anywhere," and Spring has long applied that same spirit of portability to enterprise integration. Spring AI brings it to GenAI. Instead of coupling your application tightly to the OpenAI API, you program against the `ChatClient` interface.

  • Portability: Switch between GPT-4, Claude 3, and Llama 3 with a simple configuration change in `application.yml` (see the sketch after this list).
  • POJO Mapping: Automatically map unstructured AI text responses to strongly-typed Java Records and Beans.
  • Vector Store Abstraction: A unified API for vector databases like Redis, Pinecone, PGVector, Neo4j, and Chroma.
  • Function Calling: Seamlessly expose Java methods as "tools" that the LLM can execute autonomously.
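
For example, switching the examples in this guide from OpenAI to a locally hosted Llama 3 via Ollama is essentially a dependency and configuration change. A minimal sketch (assuming `spring-ai-ollama-spring-boot-starter` replaces the OpenAI starter; property names follow the Spring AI reference documentation):

application.yml
spring:
  ai:
    ollama:
      base-url: http://localhost:11434 # Default Ollama endpoint
      chat:
        options:
          model: llama3

Your `ChatClient`-based service code stays untouched; only the auto-configured chat model behind it changes.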

Setting Up Your Project

Before diving into the examples, you need to set up a Spring Boot project. We recommend using Spring Boot 3.2.x or later and Java 17+. You should add the Spring AI Bill of Materials (BOM) to your `pom.xml` to manage dependencies effectively.

pom.xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0-SNAPSHOT</version> <!-- Check for latest version -->
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Starter for OpenAI -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
    <!-- For Vector Stores (Example) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
    </dependency>
</dependencies>

This setup ensures that all Spring AI modules (Core, OpenAI, PGVector, etc.) are version-compatible.
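
Note that SNAPSHOT and milestone builds are not published to Maven Central. If you use one, you must also add the Spring artifact repositories to your build (a sketch for the snapshot case):

pom.xml
<repositories>
    <repository>
        <id>spring-snapshots</id>
        <name>Spring Snapshots</name>
        <url>https://repo.spring.io/snapshot</url>
        <releases>
            <enabled>false</enabled>
        </releases>
    </repository>
</repositories>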

Example 1: Context-Aware Chatbot

The most fundamental use case for Large Language Models is the chatbot. However, building a production-grade chatbot is significantly more complex than a simple API call. It involves Prompt Engineering, managing Conversation History, and handling Latency.

Prompt Templates and Variables

Hardcoding strings in Java is a bad practice, and the same applies to prompts. Spring AI introduces `PromptTemplate`, which plays a role similar to `JdbcTemplate` or string interpolation but is purpose-built for prompts: it separates the structure of your prompt from the data that fills it.

Configuration

First, configure your API key in `application.yml`. Never hardcode keys in your source code!

application.yml
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7 # 0.0 for deterministic, 1.0 for creative

The Implementation

We will use the `ChatClient` builder. This fluent API allows you to set default system messages (personas) and default advisors (middleware for RAG or history).

AIService.java
@Service
public class AIService {

    private final ChatClient chatClient;

    public AIService(ChatClient.Builder builder) {
        // Initialize with a default system persona
        // This sets the "System Message" which guides the AI's behavior
        this.chatClient = builder
                .defaultSystem("You are a helpful and witty coding assistant named 'Jules'. You prefer clean code.")
                .build();
    }

    public String generateResponse(String concept, String skillLevel) {
        // Using PromptTemplate for dynamic variable substitution
        String templateString = """
                Explain the concept of {concept} to a developer with {skillLevel} experience.
                Include a code example in Java.
                """;

        PromptTemplate promptTemplate = new PromptTemplate(templateString);
        Prompt prompt = promptTemplate.create(Map.of(
                "concept", concept,
                "skillLevel", skillLevel
        ));

        // The call() method blocks until the full response is received.
        return chatClient.prompt(prompt)
                .call()
                .content();
    }
}

Pro Tip: Streaming Responses

LLMs generate text token by token. Waiting for a full 500-word essay to generate might take 10 seconds, leading to a poor user experience.

Use `.stream()` instead of `.call()`. This returns a `Flux<ChatResponse>` (Reactive Streams). You can then pipe this directly to a Server-Sent Events (SSE) endpoint in your controller, allowing the user to see the text typing out in real-time.
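
A minimal controller sketch (the endpoint path is illustrative):

ChatStreamController.java
@RestController
public class ChatStreamController {

    private final ChatClient chatClient;

    public ChatStreamController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    // produces = TEXT_EVENT_STREAM turns the Flux into a Server-Sent Events stream
    @GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> stream(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .stream()
                .content(); // Flux<String> emitting partial content as it arrives
    }
}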

Example 2: RAG with PGVector

One of the biggest limitations of LLMs is that they are trained on public data up to a specific cut-off date. They do not know about your private company data, your user manuals, or your recent database entries. If you ask GPT-4 about your internal HR policy, it will likely hallucinate an answer.

Retrieval Augmented Generation (RAG) is the standard architectural pattern to solve this. It involves retrieving relevant documents from your own data source and injecting them into the prompt context before sending it to the LLM. To do this efficiently at scale, we use Vector Databases.

The Math of Embeddings

An "Embedding" is a list of floating-point numbers (a vector) that represents the semantic meaning of a piece of text. For example, "King" and "Queen" will have vectors that are numerically close to each other in high-dimensional space. "King" and "Apple" will be far apart. By converting your documents into vectors, you can perform Cosine Similarity searches to find text that is conceptually similar to a user's query, not just keyword matches.

Architecture Flow

  1. Ingestion (ETL): Load documents (PDF, JSON, Text) using Spring AI's `DocumentReader`.
  2. Splitting: LLMs have a context window limit (e.g., 128k tokens). You must break large PDFs into smaller chunks using `TokenTextSplitter`.
  3. Embedding: Convert chunks to vectors using an Embedding Model (e.g., `text-embedding-3-small` or `mxbai-embed-large`).
  4. Storage: Save the vector + text content in a Vector Store like PGVector.
  5. Retrieval: On query, find the top K most similar vectors.
  6. Generation: Pass the retrieved text + user query to the LLM.

RAGService.java
@Service
public class RAGService {

    private final VectorStore vectorStore;
    private final ChatClient chatClient;

    public RAGService(VectorStore vectorStore, ChatClient.Builder builder) {
        this.vectorStore = vectorStore;
        this.chatClient = builder.build();
    }

    // 1. Ingestion Process (Run via a job or admin API)
    public void ingestDocuments(Resource pdfResource) {
        // Tika is a library that can read PDF, Word, HTML, etc.
        TikaDocumentReader reader = new TikaDocumentReader(pdfResource);
        List<Document> documents = reader.get();

        // Split into chunks. The arguments are: target chunk size (tokens),
        // minimum chunk size (chars), minimum length to embed, maximum number
        // of chunks, and whether to keep separators. Tune these for your corpus.
        TokenTextSplitter splitter = new TokenTextSplitter(1000, 400, 10, 5000, true);
        List<Document> chunks = splitter.apply(documents);

        // This call automatically computes embeddings and saves to DB
        vectorStore.add(chunks);
    }

    // 2. Retrieval & Generation
    public String askWithContext(String query) {
        // Retrieve top 3 most similar documents
        // 'SearchRequest' allows filtering by metadata (e.g., only docs from 2024)
        List<Document> similarDocs = vectorStore.similaritySearch(
                SearchRequest.query(query).withTopK(3)
        );

        // Combine the content of the found documents
        String context = similarDocs.stream()
                .map(Document::getContent)
                .collect(Collectors.joining("\n"));

        // Prompt Engineering: Instruct the AI to ONLY use the provided context
        String systemPrompt = """
                You are a helpful documentation assistant.
                You must answer the user's question strictly using the provided context.
                If the answer is not in the context, simply say 'I don't know'.
                Do not make up facts.

                CONTEXT:
                {context}
                """;

        return chatClient.prompt()
                .system(s -> s.text(systemPrompt).param("context", context))
                .user(query)
                .call()
                .content();
    }
}

This example uses `spring-ai-pgvector-store-spring-boot-starter`. You need a PostgreSQL instance with the `vector` extension installed. Spring Boot will automatically configure the `VectorStore` bean if the properties are present. This abstraction is powerful: if you decide to switch to Redis or Neo4j later, you only change your dependencies and config, not your Java code.
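
A minimal configuration sketch for this setup (property names follow the Spring AI PGVector documentation; values are illustrative):

application.yml
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/vectordb
    username: postgres
    password: postgres
  ai:
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1536 # Must match your embedding model's output size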

Example 3: Structured Output Mapping

One of the most frustrating aspects of working with LLMs is that they return unstructured text. If you want to use the output in your code (e.g., to save to a database or display in a UI), you need structured data like JSON.

Historically, developers would prompt the LLM: "Please return valid JSON". But LLMs often fail—they might add markdown code blocks (```json ... ```), trailing commas, or conversational text ("Here is your JSON:"). Spring AI solves this with the `BeanOutputParser`.

Type-Safe AI with Java Records

The `BeanOutputParser` does two things:

  1. It generates a schema based on your Java Record and injects it into the prompt.
  2. It validates the output and deserializes it into your Java object.

MovieRecommender.java
// Define your type-safe data structure
public record MovieRecommendation(
    String title,
    int year,
    String reason,
    List<String> cast
) {}

@RestController
@RequestMapping("/movies")
public class MovieController {

    private final ChatClient chatClient;

    public MovieController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping("/recommend")
    public MovieRecommendation getRecommendation(@RequestParam String genre) {
        // Create the parser for our specific record class
        var outputParser = new BeanOutputParser<>(MovieRecommendation.class);

        String userPrompt = """
                Recommend a classic {genre} movie from the 90s.
                {format_instructions}
                """;

        PromptTemplate template = new PromptTemplate(userPrompt);
        Prompt prompt = template.create(Map.of(
                "genre", genre,
                // Automatically injects the JSON schema instructions
                "format_instructions", outputParser.getFormat()
        ));

        // call() returns a response spec; chatResponse() unwraps the full ChatResponse
        ChatResponse response = chatClient.prompt(prompt).call().chatResponse();

        // Parse result directly to Java object
        return outputParser.parse(response.getResult().getOutput().getContent());
    }
}

Behind the scenes, `outputParser.getFormat()` creates a prompt segment like: "Your response should be in JSON format conforming to the following schema: ...". If the LLM returns broken JSON, you can additionally wrap the call in a `RetryTemplate` and re-prompt the model with the parse error, asking it to fix its own syntax mistake.
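
Recent Spring AI versions also offer a fluent shortcut that performs the schema injection and parsing in a single step (a sketch; availability depends on your version):

// Equivalent approach using the fluent entity() API
MovieRecommendation rec = chatClient.prompt()
        .user(u -> u.text("Recommend a classic {genre} movie from the 90s.")
                .param("genre", genre))
        .call()
        .entity(MovieRecommendation.class);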

Example 4: Multimodal Vision AI

The latest generation of models (GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet) are Multimodal. They can understand text, images, audio, and video. Spring AI supports sending non-text data to these models.

This capability unlocks powerful use cases:
  • Expense Management: Upload a photo of a receipt, and extract the vendor, date, and total amount into a Java Record.
  • Accessibility: Generate alt-text descriptions for images uploaded by users.
  • Medical Analysis: Analyze X-rays or skin conditions (with appropriate disclaimers).

VisionService.java
@Service
public class VisionService {

    private final ChatClient chatClient;

    public VisionService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public String analyzeImage(MultipartFile file) throws IOException {
        // Convert the uploaded file to a Spring Resource
        Resource imageResource = new ByteArrayResource(file.getBytes());

        // Define the user message with both text and media.
        // We assume JPEG here; in practice, derive the MIME type from file.getContentType().
        UserMessage userMessage = new UserMessage(
                "Analyze this image and list all food ingredients visible along with their estimated calorie count.",
                List.of(new Media(MimeTypeUtils.IMAGE_JPEG, imageResource))
        );

        // Send the request; chatResponse() unwraps the full ChatResponse
        ChatResponse response = chatClient.prompt(new Prompt(userMessage))
                .call()
                .chatResponse();

        return response.getResult().getOutput().getContent();
    }
}

Note on costs: Sending images consumes significantly more tokens than text. Most APIs allow you to specify the "detail" level (Low/High). Spring AI allows passing these options via `OpenAiChatOptions` if needed.

Example 5: Function Calling (Tools)

Perhaps the most powerful feature of modern LLMs is Tool Use (or Function Calling). By default, an LLM is a brain in a jar—it cannot access the internet, check the time, or query a database. Function calling bridges this gap.

You can describe a Java method to the LLM (name, description, parameter schema). When the LLM realizes it needs that function to answer a user's question, it pauses generation and requests that the function be executed. Spring AI handles this loop automatically: it executes your Java method and feeds the result back to the LLM.

Step 1: Define the Function

Functions are defined as standard Java `java.util.function.Function` beans. The `@Description` annotation is crucial—it's what the LLM reads to decide when to use the tool.

ToolsConfig.java
@Configuration
public class ToolsConfig {

    // Define Input and Output records (Schemas) for both tools
    public record WeatherRequest(String location, String unit) {}
    public record WeatherResponse(String temp, String condition) {}
    public record FlightRequest(String from, String to, String date) {}
    public record FlightResponse(String status, String confirmation) {}

    @Bean
    @Description("Get the current weather for a specific location. Unit can be C or F.")
    public Function<WeatherRequest, WeatherResponse> currentWeather() {
        return request -> {
            // Logic to call external Weather API (e.g., OpenWeatherMap)
            System.out.println("Calling weather API for " + request.location());

            // In a real app, use RestClient to call an external API here
            return new WeatherResponse("25", "Sunny");
        };
    }

    @Bean
    @Description("Book a flight for the user.")
    public Function<FlightRequest, FlightResponse> bookFlight() {
        return request -> {
           // Execute transactional logic
           return new FlightResponse("CONFIRMED", "Ticket #123");
        };
    }
}

Step 2: Enable the Function in ChatClient

AssistantService.java
@Service
public class AssistantService {

    private final ChatClient chatClient;

    public AssistantService(ChatClient.Builder builder) {
        this.chatClient = builder
                // Register the tool by its bean name
                // You can register multiple tools here
                .defaultFunctions("currentWeather", "bookFlight")
                .build();
    }

    public String chat(String userMessage) {
        // Scenario: User asks "What's the weather in Tokyo?"
        // 1. LLM analyzes prompt. Sees it matches 'currentWeather' description.
        // 2. LLM returns a structured request to call 'currentWeather' with location="Tokyo".
        // 3. Spring AI intercepts this, calls your Java bean, and gets the result.
        // 4. Spring AI sends the result (25 degrees) back to the LLM.
        // 5. LLM generates final response: "It is currently 25 degrees and Sunny in Tokyo."
        return chatClient.prompt(userMessage).call().content();
    }
}

Security Warning

Be extremely careful when exposing functions that modify state (like `deleteUser` or `transferMoney`). The LLM is probabilistic and might call functions unexpectedly. Always implement permission checks inside your functions and consider a "human-in-the-loop" approval step for sensitive actions.
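
A sketch of such a guard (the `ApprovalService` here is a hypothetical application service, not part of Spring AI):

ToolsConfig.java (excerpt)
    public record TransferRequest(String toAccount, double amount) {}
    public record TransferResponse(String status, String detail) {}

    @Bean
    @Description("Transfer money to another account. Requires prior human approval.")
    public Function<TransferRequest, TransferResponse> transferMoney(ApprovalService approvals) {
        return request -> {
            // Guard inside the function: never rely on the LLM's judgment alone
            if (!approvals.isApproved(request.toAccount(), request.amount())) {
                // A refusal result lets the LLM explain the pending approval to the user
                return new TransferResponse("PENDING_APPROVAL",
                        "Transfers require human sign-off before execution.");
            }
            // ... execute the transfer via your payments service ...
            return new TransferResponse("COMPLETED", "Transfer executed");
        };
    }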

Production Readiness

Moving from a prototype to production requires more than just working code. You need observability, testing, and performance optimization.

Evaluation and Testing

How do you test a non-deterministic system? Spring AI provides an Evaluation Framework that allows you to run "evals" against your prompts. You can check for relevance, faithfulness (did it hallucinate?), and toxicity.
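
For example, the built-in `RelevancyEvaluator` uses a second model call to judge whether an answer actually addresses the question (a test sketch; exact constructor and request signatures may vary slightly between versions):

EvaluationTest.java
@SpringBootTest
class EvaluationTest {

    @Autowired
    ChatClient.Builder builder;

    @Test
    void answerIsRelevantToQuestion() {
        var evaluator = new RelevancyEvaluator(builder);

        String question = "What is Spring AI?";
        String answer = builder.build().prompt().user(question).call().content();

        // Empty document list: we only judge question/answer relevance here
        EvaluationResponse result = evaluator.evaluate(
                new EvaluationRequest(question, List.of(), answer));

        assertTrue(result.isPass());
    }
}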

Observability

Spring AI instruments your application with Micrometer traces and metrics automatically.

  • Tracing: Export traces to Zipkin or Grafana Tempo to visualize the entire chain (Ingress -> Controller -> ChatClient -> OpenAI).
  • Metrics: Track token usage (cost), latency, and error rates using Prometheus and Grafana (see the configuration sketch below).
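
A minimal Actuator configuration sketch to expose these signals (standard Spring Boot properties; values are illustrative):

application.yml
management:
  tracing:
    sampling:
      probability: 1.0 # Sample every request; lower this in production
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus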

GraalVM Native Images

AI applications often run as serverless functions (Lambda, Knative) to save costs. Java's cold start time can be an issue here. Spring AI fully supports GraalVM Native Images, allowing your application to start in milliseconds and consume significantly less memory.

Ready to Build?

Spring AI transforms Java from a backend workhorse into a powerhouse for Generative AI engineering. Start small with the Chat Client, then scale up to RAG and Autonomous Agents using Function Calling.
