Building RAG with Spring Boot and Vector Databases

25 min read • Spring Boot, RAG, PgVector

Retrieval-Augmented Generation (RAG) is the architecture of choice for bringing custom knowledge to LLMs. Instead of fine-tuning models, we can inject relevant context dynamically. In this guide, we'll build a production-ready RAG system using Spring Boot, PostgreSQL (PgVector), and Spring AI.

Understanding RAG

LLMs like GPT-4 are frozen in time. They don't know about your private company data, your latest emails, or yesterday's news. RAG bridges this gap by following a three-step process:

  1. Retrieve: Search a knowledge base (Vector DB) for information relevant to the user's query.
  2. Augment: Combine the user's query with the retrieved information to create a context-rich prompt.
  3. Generate: Send the augmented prompt to the LLM to generate an answer.
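Before wiring in Spring, the three steps can be sketched in plain Java with a toy in-memory store and cosine similarity. This is purely illustrative; the two-dimensional "embeddings" below are made up, whereas a real system gets high-dimensional vectors from an embedding model:

```java
import java.util.*;
import java.util.stream.*;

public class RagSketch {
    // Toy document with a made-up embedding; real vectors come from an embedding model.
    public record Doc(String content, double[] vector) {}

    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // 1. Retrieve: rank stored documents by similarity to the query vector.
    public static List<Doc> retrieve(double[] query, List<Doc> store, int topK) {
        return store.stream()
                .sorted(Comparator.comparingDouble((Doc d) -> cosine(query, d.vector())).reversed())
                .limit(topK)
                .toList();
    }

    // 2. Augment: splice the retrieved content into a context-rich prompt.
    public static String augment(String question, List<Doc> docs) {
        String context = docs.stream().map(Doc::content).collect(Collectors.joining("\n"));
        return "Context:\n" + context + "\n\nQuestion: " + question;
    }

    public static void main(String[] args) {
        List<Doc> store = List.of(
                new Doc("Invoices are due within 30 days.", new double[]{0.9, 0.1}),
                new Doc("The cafeteria opens at 8am.", new double[]{0.1, 0.9}));

        double[] queryVector = {0.8, 0.2}; // pretend this is the embedded question
        String prompt = augment("When are invoices due?", retrieve(queryVector, store, 1));
        System.out.println(prompt);
        // 3. Generate: in a real system this prompt is now sent to the LLM.
    }
}
```

The rest of this guide replaces each toy piece with production infrastructure: PgVector for the store, Spring AI for embeddings and chat.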

System Architecture

Our stack consists of:

  • Spring Boot 3.2+: The application framework.
  • Spring AI: Abstraction for embeddings and chat models.
  • PostgreSQL + PgVector: The vector database for storing embeddings.
  • OpenAI `text-embedding-ada-002`: For generating vector embeddings.

Setting up PgVector

First, enable the `vector` extension in your PostgreSQL database.

CREATE EXTENSION IF NOT EXISTS vector;
-- uuid_generate_v4() comes from the uuid-ossp extension
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE vector_store (
	id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
	content text,
	metadata json,
	embedding vector(1536)
);

In `application.yml`, configure the Spring AI vector store to point to this instance.
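A minimal `application.yml` might look like the following. The property names follow the Spring AI PgVector starter; depending on your Spring AI version they may differ slightly, and the connection URL, credentials, and environment variables are placeholders:

```yaml
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/ragdb
    username: rag_user
    password: ${DB_PASSWORD}
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1536   # must match text-embedding-ada-002's output size
```

Note that `dimensions` must agree with the `vector(1536)` column defined above; a mismatch will surface as insert errors at ingestion time.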

Document Ingestion Pipeline

We need a service to read documents (PDFs, text files), split them into chunks, generate embeddings, and store them.

@Service
@RequiredArgsConstructor
public class IngestionService {

    private final VectorStore vectorStore;

    public void ingestFile(Resource file) {
        // 1. Read
        TikaDocumentReader reader = new TikaDocumentReader(file);
        List<Document> documents = reader.get();

        // 2. Split
        // (chunkSize=1000 tokens, minChunkSizeChars=400, minChunkLengthToEmbed=10,
        //  maxNumChunks=10000, keepSeparator=true); ~1000-token chunks are a
        //  reasonable balance for ada-002 embeddings
        TokenTextSplitter splitter = new TokenTextSplitter(1000, 400, 10, 10000, true);
        List<Document> chunks = splitter.apply(documents);

        // 3. Store (Spring AI handles embedding generation automatically here)
        vectorStore.add(chunks);
    }
}
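To exercise the service, one option is an `ApplicationRunner` that ingests everything under a documents folder at startup. This is a sketch: the `classpath:/docs/` location and bean names are assumptions for illustration, not part of the service above.

```java
import org.springframework.boot.ApplicationRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;

@Configuration
public class IngestionConfig {

    // Runs once at startup and pushes every file under classpath:/docs
    // through the read -> split -> embed -> store pipeline.
    @Bean
    ApplicationRunner ingestOnStartup(IngestionService ingestionService) {
        return args -> {
            Resource[] files = new PathMatchingResourcePatternResolver()
                    .getResources("classpath:/docs/*");
            for (Resource file : files) {
                ingestionService.ingestFile(file);
            }
        };
    }
}
```

In production you would more likely trigger ingestion from an upload endpoint or a message queue, and track which documents have already been embedded to avoid duplicates.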

The Retrieval Service

When a user asks a question, we search the vector store and feed the results to the Chat Client.

@RestController
@RequiredArgsConstructor
public class RagController {

    private final VectorStore vectorStore;
    private final ChatClient chatClient;

    @PostMapping("/ask")
    public Map<String, String> ask(@RequestBody String question) {
        // 1. Similarity Search
        List<Document> similarDocs = vectorStore.similaritySearch(
            SearchRequest.query(question).withTopK(4)
        );

        // 2. Build Context
        String context = similarDocs.stream()
            .map(Document::getContent)
            .collect(Collectors.joining("\n\n"));

        // 3. Prompt Template
        SystemPromptTemplate systemPrompt = new SystemPromptTemplate(
            "You are a helpful assistant. Use the following context to answer the user's question. " +
            "If the answer is not in the context, say you don't know.\n\n" +
            "Context:\n{context}"
        );

        Message systemMessage = systemPrompt.createMessage(Map.of("context", context));

        // 4. Generate
        String answer = chatClient.prompt()
            .messages(systemMessage, new UserMessage(question))
            .call()
            .content();

        return Map.of("answer", answer, "source_docs", context);
    }
}
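With the application running, the endpoint can be exercised with curl (the host and port assume a default local Spring Boot setup):

```shell
curl -X POST http://localhost:8080/ask \
  -H "Content-Type: text/plain" \
  -d "What is our refund policy?"
```

Since the controller binds `@RequestBody String`, the question is sent as a plain-text body rather than JSON.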

Security & Data Privacy in RAG

RAG systems expose your internal knowledge base to the AI. This means access control is critical. A common vulnerability is "Context Leaking," where a user asks a question that retrieves documents they shouldn't have access to (e.g., HR salaries).

RAG Security Checklist

  • Document-Level ACLs: Store permissions in the vector metadata. When querying, filter results based on the current user's roles.
  • Prompt Injection in RAG: Malicious documents ingested into your system can contain "poisoned" instructions that override the system prompt when retrieved. Sanitize input documents.
  • Data Sovereignty: Ensure your vector database and the embedding model comply with data residency laws (GDPR, CCPA).
  • Embedding Inversion: While difficult, it is theoretically possible to reconstruct text from embeddings. Treat vectors as sensitive data.

Implementing Metadata Filters for ACLs:

// Searching with Filters
FilterExpressionBuilder b = new FilterExpressionBuilder();
Filter.Expression filter = b.eq("department", currentUser.getDepartment()).build();

List<Document> results = vectorStore.similaritySearch(
    SearchRequest.query(question)
    .withTopK(3)
    .withFilterExpression(filter)
);

By attaching metadata like `department` or `userId` to every document during ingestion, we can enforce strict access controls at the retrieval query level, ensuring users never see context they aren't authorized to view.
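The ingestion side of this scheme is a small change to the pipeline shown earlier: tag each chunk before storing it. A sketch, where the `department` parameter and metadata key are assumptions mirroring the filter above:

```java
public void ingestFile(Resource file, String department) {
    List<Document> documents = new TikaDocumentReader(file).get();
    List<Document> chunks = new TokenTextSplitter().apply(documents);

    // Tag every chunk so retrieval can filter by department-level ACLs.
    for (Document chunk : chunks) {
        chunk.getMetadata().put("department", department);
    }
    vectorStore.add(chunks);
}
```

Because the filter is applied inside the similarity search itself, unauthorized documents never reach the prompt at all, which is safer than filtering after retrieval.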

Conclusion

RAG is a powerful pattern that turns generic LLMs into domain experts. With Spring AI and PgVector, Java developers can build these systems using familiar tools and patterns. Focus on data quality and security—your RAG system is only as good (and safe) as the data you feed it.

Written by the DevMetrix Team • Published December 9, 2025