Unlock the power of Generative AI in Java. A comprehensive guide with 5 production-grade examples ranging from Chatbots to RAG and Function Calling.
For over a decade, the field of Artificial Intelligence and Machine Learning has been dominated by Python. Its rich ecosystem of libraries like NumPy, Pandas, PyTorch, and TensorFlow made it the default choice for data scientists and researchers. However, we are currently witnessing a seismic shift in the industry: the move from Model Training to Generative AI Application Engineering.
In this new era, the focus is less on designing neural network architectures and more on integrating pre-trained Large Language Models (LLMs) into existing enterprise systems. This is where Java, and specifically the Spring ecosystem, shines. Enterprise applications require stability, scalability, security, and type safety—attributes that Java has delivered for nearly three decades.
Spring AI is an application framework for AI engineering. Its primary goal is to apply Spring ecosystem design principles—such as portability and modular design—to the AI domain. It offers a unified interface for interacting with various AI providers, including OpenAI, Azure OpenAI, Amazon Bedrock, Ollama, Hugging Face, and Mistral AI.
Java's original promise was "Write Once, Run Anywhere," and Spring has always applied the same portability mindset to its abstractions. Spring AI brings this to GenAI. Instead of coupling your application tightly to the OpenAI API, you program against the `ChatClient` interface.
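To make this concrete, here is a minimal sketch (assuming the Spring AI 1.x fluent API and an auto-configured `ChatClient.Builder`; the service and method names are invented for illustration):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
class JokeService {

    private final ChatClient chatClient;

    // ChatClient.Builder is auto-configured by whichever starter is on the
    // classpath (OpenAI, Ollama, Bedrock, ...); this class never mentions it.
    JokeService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    String tellJoke(String topic) {
        return chatClient.prompt()
                .user("Tell me a short joke about " + topic)
                .call()
                .content();
    }
}
```

Swap the OpenAI starter for the Ollama starter and this class does not change; only the dependency and configuration do.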
Before diving into the examples, you need to set up a Spring Boot project. We recommend using Spring Boot 3.2.x or later and Java 17+. You should add the Spring AI Bill of Materials (BOM) to your `pom.xml` to manage dependencies effectively.
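A sketch of the relevant `pom.xml` fragments (artifact names follow the pre-1.0 milestone releases referenced in this article; check the Spring AI documentation for the current version, and note that milestone versions require the Spring milestone repository):

```xml
<properties>
    <spring-ai.version>1.0.0-M5</spring-ai.version>
</properties>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <!-- Pick the starter for your provider; the BOM pins its version -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
</dependencies>
```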
This setup ensures that all Spring AI modules (Core, OpenAI, PGVector, etc.) are version-compatible.
The most fundamental use case for Large Language Models is the chatbot. However, building a production-grade chatbot is significantly more complex than a simple API call. It involves Prompt Engineering, managing Conversation History, and handling Latency.
Hardcoding SQL strings in Java is considered bad practice, and the same applies to prompts. Spring AI introduces `PromptTemplate`, which plays a role similar to `JdbcTemplate`: it lets you separate the structure of your prompt from the data that fills it.
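For example (a sketch; the template text and placeholder names are invented for illustration):

```java
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import java.util.Map;

class ReleaseNotesPrompts {

    // The template could equally live in a resource file on the classpath;
    // placeholders use {name} syntax.
    private static final String TEMPLATE = """
            You are a release-notes writer.
            Summarize the following changes for {audience} in at most {limit} words:
            {changes}
            """;

    Prompt build(String audience, String changes) {
        // The structure stays fixed; only the data varies per request
        return new PromptTemplate(TEMPLATE).create(Map.of(
                "audience", audience,
                "limit", "100",
                "changes", changes));
    }
}
```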
First, configure your API key in `application.yml`. Never hardcode keys in your source code!
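A minimal `application.yml`, reading the key from an environment variable (the model name is just an example):

```yaml
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}   # never commit the actual key
      chat:
        options:
          model: gpt-4o
          temperature: 0.7
```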
We will use the `ChatClient` builder. This fluent API allows you to set default system messages (personas) and default advisors (middleware for RAG or history).
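A sketch of such a configuration (assuming the milestone-era advisor classes `MessageChatMemoryAdvisor` and `InMemoryChatMemory`; exact names have shifted slightly across releases):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class ChatClientConfig {

    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                // Persona applied to every request unless overridden
                .defaultSystem("You are a polite support agent for AcmeCorp. "
                        + "Answer concisely and never reveal internal policies.")
                // Advisor that transparently stores and replays conversation history
                .defaultAdvisors(new MessageChatMemoryAdvisor(new InMemoryChatMemory()))
                .build();
    }
}
```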
LLMs generate text token by token. Waiting for a full 500-word essay to generate might take 10 seconds, leading to a poor user experience.
Use `.stream()` instead of `.call()`. This returns a `Flux<ChatResponse>` (Reactive Streams). You can then pipe this directly to a Server-Sent Events (SSE) endpoint in your controller, allowing the user to see the text typing out in real-time.
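A sketch of such an SSE endpoint (assuming Spring WebFlux or a reactive-aware MVC setup; the path and parameter names are illustrative):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
class StreamingChatController {

    private final ChatClient chatClient;

    StreamingChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    // Each chunk is flushed to the browser as a Server-Sent Event, so the
    // user sees the answer "type out" instead of waiting for the full reply.
    @GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    Flux<String> stream(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .stream()
                .content();   // Flux<String> of partial content
    }
}
```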
One of the biggest limitations of LLMs is that they are trained on public data up to a specific cut-off date. They do not know about your private company data, your user manuals, or your recent database entries. If you ask GPT-4 about your internal HR policy, it will hallucinate.
Retrieval Augmented Generation (RAG) is the standard architectural pattern to solve this. It involves retrieving relevant documents from your own data source and injecting them into the prompt context before sending it to the LLM. To do this efficiently at scale, we use Vector Databases.
An "Embedding" is a list of floating-point numbers (a vector) that represents the semantic meaning of a piece of text. For example, "King" and "Queen" will have vectors that are numerically close to each other in high-dimensional space. "King" and "Apple" will be far apart. By converting your documents into vectors, you can perform Cosine Similarity searches to find text that is conceptually similar to a user's query, not just keyword matches.
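The similarity metric itself is plain arithmetic. A self-contained sketch with toy 3-dimensional vectors (real embedding models emit hundreds or thousands of dimensions):

```java
public class CosineSimilarity {

    // Cosine similarity = dot(a, b) / (|a| * |b|); ranges from -1 to 1,
    // where values near 1 mean the vectors point in almost the same direction.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy "embeddings": king and queen are nearby, apple is far away
        double[] king  = {0.9, 0.8, 0.1};
        double[] queen = {0.8, 0.9, 0.1};
        double[] apple = {0.1, 0.2, 0.9};

        // king/queen similarity is close to 1; king/apple is much lower
        System.out.println("king/queen: " + cosine(king, queen));
        System.out.println("king/apple: " + cosine(king, apple));
    }
}
```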
This example uses `spring-ai-pgvector-store-spring-boot-starter`. You need a PostgreSQL instance with the `vector` extension installed. Spring Boot will automatically configure the `VectorStore` bean if the properties are present. This abstraction is powerful: if you decide to switch to Redis or Neo4j later, you only change your dependencies and config, not your Java code.
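A sketch of ingesting and querying documents against the auto-configured `VectorStore` bean (using the milestone-era `SearchRequest.query(...).withTopK(...)` style; the 1.0 GA API uses a builder instead):

```java
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;
import java.util.List;

@Service
class DocumentationService {

    private final VectorStore vectorStore;   // auto-configured PGVector bean

    DocumentationService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    void ingest(List<String> paragraphs) {
        // Each document is embedded and stored as a vector row in Postgres
        vectorStore.add(paragraphs.stream().map(Document::new).toList());
    }

    List<Document> findRelevant(String question) {
        // Cosine-similarity search: conceptually close, not keyword-matched
        return vectorStore.similaritySearch(
                SearchRequest.query(question).withTopK(4));
    }
}
```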
One of the most frustrating aspects of working with LLMs is that they return unstructured text. If you want to use the output in your code (e.g., to save to a database or display in a UI), you need structured data like JSON.
Historically, developers would prompt the LLM: "Please return valid JSON". But LLMs often fail—they might add markdown code blocks (```json ... ```), trailing commas, or conversational text ("Here is your JSON:"). Spring AI solves this with the `BeanOutputParser`.
The `BeanOutputParser` does two things: 1. It generates a schema based on your Java Record and injects it into the prompt. 2. It validates the output and deserializes it into your Java object.
Behind the scenes, `outputParser.getFormat()` creates a prompt segment like: "Your response should be in JSON format conforming to the following schema: ...". If the LLM returns broken JSON, the parser can even automatically retry with a correction prompt (if configured with a RetryTemplate), asking the LLM to fix its own syntax error.
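Putting the pieces together, a sketch (using the pre-1.0 `BeanOutputParser` and `ChatModel` names this article assumes; later releases rename the parser to `BeanOutputConverter`):

```java
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.ai.parser.BeanOutputParser;
import java.util.Map;

// The record doubles as the JSON schema source and the deserialization target
record BookRecommendation(String title, String author, int year) {}

class StructuredOutputExample {

    BookRecommendation recommend(ChatModel chatModel) {
        var parser = new BeanOutputParser<>(BookRecommendation.class);

        // getFormat() renders the schema instructions derived from the record
        PromptTemplate template = new PromptTemplate("""
                Recommend one classic science-fiction novel.
                {format}
                """);
        Prompt prompt = template.create(Map.of("format", parser.getFormat()));

        String raw = chatModel.call(prompt).getResult().getOutput().getContent();
        return parser.parse(raw);   // type-safe Java object, not free text
    }
}
```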
The latest generation of models (GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet) are Multimodal. They can understand text, images, audio, and video. Spring AI supports sending non-text data to these models.
This capability unlocks powerful use cases:
- Expense Management: Upload a photo of a receipt, and extract the vendor, date, and total amount into a Java Record.
- Accessibility: Generate alt-text descriptions for images uploaded by users.
- Medical Analysis: Analyze X-rays or skin conditions (with appropriate disclaimers).
Note on costs: Sending images consumes significantly more tokens than text. Most APIs allow you to specify the "detail" level (Low/High). Spring AI allows passing these options via `OpenAiChatOptions` if needed.
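The receipt use case above can be sketched as follows (the resource path and `Receipt` record are hypothetical; this assumes the fluent API's `media(...)` support and `.entity(...)` structured output):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.core.io.ClassPathResource;
import org.springframework.util.MimeTypeUtils;

class ReceiptReader {

    // Hypothetical record for the fields we want extracted
    record Receipt(String vendor, String date, double total) {}

    Receipt read(ChatClient chatClient) {
        return chatClient.prompt()
                .user(u -> u
                        .text("Extract the vendor, date and total from this receipt.")
                        // Attach the image alongside the text instruction
                        .media(MimeTypeUtils.IMAGE_JPEG,
                               new ClassPathResource("receipts/lunch.jpg")))
                .call()
                .entity(Receipt.class);   // structured output from an image
    }
}
```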
Perhaps the most powerful feature of modern LLMs is Tool Use (or Function Calling). By default, an LLM is a brain in a jar—it cannot access the internet, check the time, or query a database. Function calling bridges this gap.
You can describe a Java method to the LLM (name, description, parameter schema). When the LLM realizes it needs that function to answer a user's question, it pauses generation and requests that the function be executed. Spring AI handles this loop automatically: it executes your Java method and feeds the result back to the LLM.
Functions are defined as standard Java `java.util.function.Function` beans. The `@Description` annotation is crucial—it's what the LLM reads to decide when to use the tool.
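A sketch of such a tool bean (the weather domain, request/response records, and hardcoded reply are illustrative; in production the lambda would call a real API):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Description;
import java.util.function.Function;

@Configuration
class WeatherTools {

    // The records define the parameter schema the LLM sees
    record WeatherRequest(String city) {}
    record WeatherResponse(double temperatureCelsius, String conditions) {}

    // The @Description text is what the model reads when deciding
    // whether this tool can answer the user's question.
    @Bean
    @Description("Get the current weather for a given city")
    Function<WeatherRequest, WeatherResponse> currentWeather() {
        return request -> {
            // Stub result; replace with a call to a real weather service
            return new WeatherResponse(21.5, "Partly cloudy");
        };
    }
}
```

In the milestone API the tool is enabled per request, e.g. `chatClient.prompt().user("Will it rain in Paris?").functions("currentWeather").call()`; Spring AI then runs the request/execute/respond loop for you.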
Be extremely careful when exposing functions that modify state (like `deleteUser` or `transferMoney`). The LLM is probabilistic and might call functions unexpectedly. Always implement permission checks inside your functions and consider a "human-in-the-loop" approval step for sensitive actions.
Moving from a prototype to production requires more than just working code. You need observability, testing, and performance optimization.
How do you test a non-deterministic system? Spring AI provides an Evaluation Framework that allows you to run "evals" against your prompts. You can check for relevance, faithfulness (did it hallucinate?), and toxicity.
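A rough sketch of a relevance check with the built-in `RelevancyEvaluator` (constructor and `EvaluationRequest` shapes have shifted across releases, so treat this strictly as an outline):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.EvaluationResponse;
import org.springframework.ai.evaluation.RelevancyEvaluator;
import java.util.List;

class RagEvaluation {

    // An LLM acts as the judge: given the retrieved context, is the
    // generated answer actually relevant to the question?
    boolean answerIsRelevant(ChatClient.Builder builder, String question,
                             List<Document> retrievedContext, String answer) {
        var evaluator = new RelevancyEvaluator(builder);
        EvaluationResponse response = evaluator.evaluate(
                new EvaluationRequest(question, retrievedContext, answer));
        return response.isPass();
    }
}
```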
Spring AI instruments your application with Micrometer traces and metrics automatically.
AI applications often run as serverless functions (Lambda, Knative) to save costs. Java's cold start time can be an issue here. Spring AI fully supports GraalVM Native Images, allowing your application to start in milliseconds and consume significantly less memory.
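With the native profile that Spring Boot's Maven plugin provides, the build is a one-liner (the binary name depends on your artifactId; `myapp` is a placeholder):

```shell
# Ahead-of-time compile to a native executable (requires a GraalVM JDK)
./mvnw -Pnative native:compile

# Start the resulting binary; startup is typically tens of milliseconds
./target/myapp
```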
Spring AI transforms Java from a backend workhorse into a powerhouse for Generative AI engineering. Start small with the Chat Client, then scale up to RAG and Autonomous Agents using Function Calling.