Event-Driven AI Architectures with Spring Cloud Stream and Kafka
Generative AI operations are slow. A GPT-4 call can take 30 seconds. A complex RAG chain can take minutes. Synchronous HTTP request/response cycles (REST) are ill-suited for this. The solution? Event-Driven Architecture (EDA).
Why Event-Driven for AI?
Blocking a thread for 30 seconds is a recipe for disaster in a high-concurrency Java application. If you have 200 Tomcat worker threads and 200 concurrent users waiting on GPT-4, the pool is exhausted and every subsequent request queues up or times out.
By offloading the work to a Kafka topic, you acknowledge the request instantly ("HTTP 202 Accepted") and process it in the background at your own pace.
System Architecture
- Frontend (Next.js): Sends request, subscribes to WebSocket/SSE for updates.
- API Service (Spring Boot): Validates the request, publishes a `JobEvent` to Kafka.
- Kafka: Buffers requests.
- AI Worker (Spring Boot + Spring AI): Consumes the event, calls the LLM, publishes a `JobResult`.
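Two small payloads flow through Kafka in the examples that follow. Their exact shape is not dictated by Spring Cloud Stream; the types below are a minimal sketch of the fields the producer and worker code assumes.

```java
// Request body for POST /generate
public class PromptRequest {
    private String prompt;
    public String getPrompt() { return prompt; }
    public void setPrompt(String prompt) { this.prompt = prompt; }
}

// Published by the API service when a job is accepted
public record JobEvent(String jobId, String prompt, String userId) {}

// Published by the AI worker once the LLM call completes
public record JobResult(String jobId, String response) {}
```

By default Spring Cloud Stream serializes these as JSON; the security checklist below argues for a stricter schema (Avro/Protobuf) in production.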
The Producer (API Gateway)
```java
@RestController
@RequiredArgsConstructor
public class JobController {

    private final StreamBridge streamBridge;

    @PostMapping("/generate")
    public ResponseEntity<Map<String, String>> submitJob(@RequestBody PromptRequest request) {
        String jobId = UUID.randomUUID().toString();

        // getCurrentUserId() is a helper, e.g. resolved from the Spring Security context
        JobEvent event = new JobEvent(jobId, request.getPrompt(), getCurrentUserId());
        streamBridge.send("ai-jobs-out-0", event);

        // Respond immediately with 202 Accepted; the result is delivered asynchronously
        return ResponseEntity.accepted().body(Map.of("jobId", jobId));
    }
}
```

The Consumer (AI Worker)
```java
@Slf4j
@Component
@RequiredArgsConstructor
public class AIWorker {

    private final ChatClient chatClient;
    private final StreamBridge streamBridge;

    // Functional binding: exposed as processJob-in-0 and wired to a Kafka topic in application config
    @Bean
    public Consumer<JobEvent> processJob() {
        return event -> {
            try {
                // Call the LLM via Spring AI (this is the slow part)
                String response = chatClient.prompt().user(event.prompt()).call().content();

                // Publish the result for downstream delivery
                JobResult result = new JobResult(event.jobId(), response);
                streamBridge.send("ai-results-out-0", result);
            } catch (Exception e) {
                // Log, then rethrow so the binder's retry and DLQ handling can take over
                log.error("Job {} failed", event.jobId(), e);
                throw new IllegalStateException("AI job failed: " + event.jobId(), e);
            }
        };
    }
}
```

Real-time Frontend Updates
The frontend shouldn't hold a single HTTP request open for minutes. Instead, it should subscribe via Server-Sent Events (SSE) or a WebSocket and listen for the completion event.
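One way to close the loop is sketched below: the API service also consumes the results topic and pushes each `JobResult` to an open `SseEmitter`. The class name `JobResultPublisher`, the in-memory emitter registry, the binding name `publishResult-in-0`, and the `/jobs/{jobId}/events` path are illustrative choices, not Spring Cloud Stream or Spring AI APIs; the ownership check from the security section below is only indicated by a comment.

```java
// Sketch: delivering results to the browser over SSE
@RestController
public class JobResultPublisher {

    // jobId -> open SSE connection; a real deployment would also verify ownership and expire entries
    private final Map<String, SseEmitter> emitters = new ConcurrentHashMap<>();

    @GetMapping("/jobs/{jobId}/events")
    public SseEmitter subscribe(@PathVariable String jobId) {
        SseEmitter emitter = new SseEmitter(300_000L); // 5-minute timeout
        emitters.put(jobId, emitter);
        emitter.onCompletion(() -> emitters.remove(jobId));
        emitter.onTimeout(() -> emitters.remove(jobId));
        return emitter;
    }

    // Consumes each JobResult from the results topic and forwards it to the waiting client
    @Bean
    public Consumer<JobResult> publishResult() {
        return result -> {
            SseEmitter emitter = emitters.remove(result.jobId());
            if (emitter == null) {
                return; // client never subscribed or already disconnected
            }
            try {
                emitter.send(SseEmitter.event().name("job-completed").data(result));
                emitter.complete();
            } catch (IOException e) {
                emitter.completeWithError(e);
            }
        };
    }
}
```

Note that the emitter map is local to one instance; in a clustered deployment you would need a shared registry or per-node result routing.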
Security Considerations
Async architectures introduce complexity in identity propagation. When the worker processes the message, the original HTTP request context (and the user's JWT) is gone.
Async Security Checklist
- Context Propagation: Pass the `userId` or `tenantId` in the Kafka message headers or payload (see the sketch after this list). Never rely on ThreadLocal storage in async consumers.
- Rate Limiting by Token Count: Standard request limits aren't enough; limit by total tokens processed to control costs and prevent DoS.
- Poison Pills: Malformed messages that crash the consumer can halt the entire pipeline. Implement strict schema validation (Avro/Protobuf) and Dead Letter Queues (DLQ).
- Result Delivery: When sending results back via WebSocket/SSE, verify that the connection belongs to the user who initiated the job to prevent cross-user data leaks.
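As a concrete example of the first point, the producer can attach identity to the message itself and the worker can read it back from the headers. This is a minimal sketch: the `userId` header name is a convention chosen here, not a framework standard, and the authorization/metering step is only indicated by a comment.

```java
// Producer side: carry identity in the message (payload field and/or header),
// never in a ThreadLocal that won't exist on the consumer thread.
JobEvent event = new JobEvent(jobId, request.getPrompt(), getCurrentUserId());
Message<JobEvent> message = MessageBuilder
        .withPayload(event)
        .setHeader("userId", event.userId())
        .build();
streamBridge.send("ai-jobs-out-0", message);

// Worker side: accept Message<JobEvent> instead of the bare payload so the headers survive the hop.
@Bean
public Consumer<Message<JobEvent>> processJob() {
    return message -> {
        String userId = message.getHeaders().get("userId", String.class);
        JobEvent event = message.getPayload();
        // ... authorize and meter token usage against userId, then call the LLM as before
    };
}
```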
Conclusion
For production AI applications, Event-Driven Architecture is often the most practical way to scale. Spring Cloud Stream makes it straightforward to decouple request ingestion from the heavy lifting of AI processing.
Written by the DevMetrix Team • Published December 12, 2025