feat(dotAI): Dot AI LangChain4J integration (First provider/OpenAI) #35150
ihoffmann-dot wants to merge 52 commits into main
Conversation

Claude finished @ihoffmann-dot's task in 2m 56s — PR Review
The LangChain4J abstraction layer is well-structured and the cache-key strategy (host + SHA-256 of providerConfig) is sound.

Bugs

1. If the LangChain4J streaming model hangs (network drop without a proper error callback, provider bug), the latch waits forever:

```java
// Current — unbounded wait
latch.await();

// Fix — add a timeout matching the model's configured timeout (+ headroom)
if (!latch.await(120, TimeUnit.SECONDS)) {
    throw new DotAIClientConnectException("Streaming timed out after 120s");
}
```

2. The non-streaming builder applies the common config, but the streaming builder never applies `maxRetries`:

```java
// Non-streaming — correct
applyCommonConfig(config, builder::baseUrl, builder::maxRetries, builder::timeout);

// Streaming — missing maxRetries
if (config.endpoint() != null) builder.baseUrl(config.endpoint());
if (config.timeout() != null) builder.timeout(Duration.ofSeconds(config.timeout()));
// config.maxRetries() is never applied
```

3. In `summarizeFromContent`, `resolvedForm` is computed but never used:

```java
final CompletionsForm resolvedForm = resolveForm(request, response, formIn);
return getResponse(
        request, response,
        formIn, // ← passes formIn, not resolvedForm
        ...
```

4. `" model=" + model + "',\n"` produces `model=AIModel{...}'` — the trailing single quote is stray.

Design Issues

5. …

6. Credential redaction is a hardcoded field-name allowlist:

```java
private static final Set<String> CREDENTIAL_FIELDS = Set.of("apiKey", "secretAccessKey", "accessKeyId");
```

Future providers (Vertex AI service account tokens, GCP application credentials, etc.) will silently leak credentials in the …

7. `private static final ObjectMapper MAPPER = new ObjectMapper();` — the rest of the codebase uses …

8. Double sanitization of …
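The bounded-wait fix for the streaming hang can be exercised end to end. This is a minimal sketch under assumptions: `StreamingWait`, `awaitCompletion`, and `StreamingTimeoutException` are hypothetical stand-ins for the PR's actual streaming client and `DotAIClientConnectException`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class StreamingWait {

    // Stand-in for the real dotCMS exception type; name is illustrative.
    static class StreamingTimeoutException extends RuntimeException {
        StreamingTimeoutException(String msg) { super(msg); }
    }

    /**
     * Waits for the streaming model to signal completion, but never forever:
     * the timeout should match the model's configured timeout plus headroom.
     */
    public static void awaitCompletion(CountDownLatch latch, long timeoutSeconds)
            throws InterruptedException {
        if (!latch.await(timeoutSeconds, TimeUnit.SECONDS)) {
            throw new StreamingTimeoutException(
                    "Streaming timed out after " + timeoutSeconds + "s");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // A provider that completes: the wait returns normally.
        CountDownLatch done = new CountDownLatch(1);
        new Thread(done::countDown).start();
        awaitCompletion(done, 5);
        System.out.println("completed");

        // A hung provider: the wait is bounded instead of blocking forever.
        CountDownLatch hung = new CountDownLatch(1);
        try {
            awaitCompletion(hung, 1);
        } catch (StreamingTimeoutException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The key design point is that `latch.await()` with no arguments has no escape hatch; the timed overload turns a silent hang into a diagnosable exception.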
Informational
Pull Request Unsafe to Rollback!!!
```java
host = new SiteDataGen().nextPersisted();
AiTest.aiAppSecretsWithProviderConfig(
        host,
        AiTest.providerConfigJson(AiTest.PORT, "gpt-4o-mini"));
```
Will this aiAppSecretsWithProviderConfig work when you implement new providers? Will you have to call it for each provider?
Yes. This is just a test with an OpenAI model, but aiAppSecretsWithProviderConfig just saves a JSON string and doesn’t know about the provider. It will work with any model. To test other providers we only need to pass a different JSON in the providerConfigJson param.
I mean, this test was implemented to work with an OpenAI model. When you implement the other providers and want to include new tests for them, you will have to refactor it to make it scalable; otherwise, your test class will contain duplicate logic, become huge, and/or become difficult to follow.
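One way to keep the test class from duplicating per-provider setup is a single JSON-building helper parameterized by provider name. This is a hypothetical sketch — `ProviderConfigJsonFactory` and its field layout are illustrative, not the actual `AiTest` helper:

```java
public class ProviderConfigJsonFactory {

    // One JSON builder shared by all provider tests, so adding a provider
    // means adding a call site with a new name, not a new test fixture.
    public static String providerConfigJson(String provider, String model, int port) {
        return "{\n"
                + "  \"" + provider + "\": {\n"
                + "    \"model\": \"" + model + "\",\n"
                + "    \"apiKey\": \"test-key\",\n"
                + "    \"endpoint\": \"http://localhost:" + port + "\"\n"
                + "  }\n"
                + "}";
    }

    public static void main(String[] args) {
        // The same helper covers OpenAI today and future providers later.
        System.out.println(providerConfigJson("openai", "gpt-4o-mini", 8080));
        System.out.println(providerConfigJson("bedrock", "some-model", 8080));
    }
}
```

Because the secret-saving side is provider-agnostic (it just stores a JSON string, as noted above), only the JSON factory needs to know about new providers.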
dotCMS/src/main/java/com/dotcms/ai/client/langchain4j/LangChain4jModelFactory.java
dotcms-integration/src/test/java/com/dotcms/ai/client/AIProxyClientTest.java
```java
final JSONObjectAIRequest request = textRequest(
        invalidModel,
        "gpt-4o-mini",
```
We really should be moving away from all those JSONObject usages; they are very error-prone.
The modern Java stack should prefer immutables bound with Jackson. But this is "fine" since this code was already like that. We should probably modernize this API, though.
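The suggestion above — typed immutables instead of string-keyed `JSONObject` lookups — can be sketched with a Java record. The request shape here is hypothetical, not the actual dotAI request contract:

```java
import java.util.Map;
import java.util.Objects;

public class ImmutableRequestDemo {

    // Hypothetical request shape: a record gives compile-time checked,
    // immutable fields instead of string-keyed lookups that fail at runtime.
    public record AIRequest(String model, String prompt, double temperature) {
        public AIRequest {
            Objects.requireNonNull(model, "model is required");
            Objects.requireNonNull(prompt, "prompt is required");
        }
    }

    public static void main(String[] args) {
        // A typo in a map key compiles fine and surfaces only at runtime...
        Map<String, Object> loose = Map.of("modle", "gpt-4o-mini"); // note the typo
        System.out.println(loose.get("model")); // null — the bug is silent

        // ...whereas the same mistake on a record is a compile error.
        AIRequest req = new AIRequest("gpt-4o-mini", "hello", 0.2);
        System.out.println(req.model());
    }
}
```

With Jackson 2.12+, records like this also bind directly via `ObjectMapper.readValue`, which is the "immutables bound with Jackson" pattern the reviewer describes.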
dotcms-integration/src/test/java/com/dotcms/ai/client/AIProxyClientTest.java
```java
if (config == null || config.getProvider() == null) {
    return null;
}
switch (config.getProvider().toLowerCase()) {
```
Too few statements for a switch, don't you think? Unless this is expected to grow.
It will grow in a few days, Fab. Ivan is adding one provider per PR to avoid huge PRs
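Since the switch is expected to gain one case per provider PR, a switch expression keeps each addition a single line. This is an illustrative sketch — the enum and method names are hypothetical, and the Bedrock/Vertex AI cases reflect the roadmap mentioned in this PR, not shipped code:

```java
public class ProviderSwitchDemo {

    enum ModelKind { OPENAI, BEDROCK, VERTEX_AI, UNKNOWN }

    // Sketch of the factory dispatch that grows one case per provider PR.
    public static ModelKind resolve(String provider) {
        if (provider == null) {
            return ModelKind.UNKNOWN;
        }
        return switch (provider.toLowerCase()) {
            case "openai" -> ModelKind.OPENAI;
            case "bedrock" -> ModelKind.BEDROCK;    // planned, per the PR roadmap
            case "vertexai" -> ModelKind.VERTEX_AI; // planned, per the PR roadmap
            default -> ModelKind.UNKNOWN;
        };
    }

    public static void main(String[] args) {
        System.out.println(resolve("OpenAI")); // case-insensitive match
        System.out.println(resolve(null));     // null-safe: UNKNOWN
    }
}
```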
dotCMS/src/main/java/com/dotcms/ai/client/langchain4j/LangChain4jModelFactory.java
```java
@JsonIgnoreProperties(ignoreUnknown = true)
public class ProviderConfig {

    private String provider;
```
See how we use immutables, e.g. com/dotcms/content/model/Contentlet.java. Not sure, but perhaps at this point we could even use Java records.
Thanks, I decided to use Immutables to stay consistent with the codebase.
dotCMS/src/main/java/com/dotcms/ai/client/langchain4j/ProviderConfig.java
You have used all of your free Bugbot PR reviews. To receive reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
…led, NPE, mapper)
…oid static init issues
…s receive clean JSON
a6bc5c6 to 4aaa0fd
Summary
Replaces the direct OpenAI HTTP client (OpenAIClient) with a LangChain4J abstraction layer, enabling multi-provider AI support in dotCMS. This PR covers Phase 1: OpenAI via LangChain4J.

Changes

- LangChain4jAIClient.java: New AIProxiedClient implementation that delegates chat, embeddings, and image requests to LangChain4J models.
- LangChain4jModelFactory.java: Factory that builds ChatModel, EmbeddingModel, and ImageModel instances from a ProviderConfig. The only place with provider-specific builder logic.
- ProviderConfig.java: Deserializable POJO for the providerConfig JSON secret (per provider section: model, apiKey, endpoint, maxTokens, maxCompletionTokens, etc.).
- AppConfig.java: Replaced legacy individual-field secrets (apiKey, model, etc.) with a single providerConfig JSON string. isEnabled() now only checks this field.
- AIAppValidator.java: Removed the OpenAI /v1/models validation call, which is incompatible with the multi-provider architecture.
- CompletionsResource.java: Updated /api/v1/ai/completions/config to derive model names and config values from AppConfig getters instead of iterating raw AppKeys.
- dotAI.yml: Removed legacy hidden fields; added providerConfig as the single configuration entry point.
- Tests: New unit tests for ProviderConfig, LangChain4jModelFactory, and LangChain4jAIClient; updated the AIProxyClientTest integration test to use providerConfig-based setup.
The previous implementation was tightly coupled to OpenAI's API contract (hardcoded HTTP calls, OpenAI-specific parameters, model validation via /v1/models). LangChain4J provides a provider-agnostic model interface, allowing future phases to add Azure OpenAI, AWS Bedrock, and Vertex AI without touching the core request/response flow.

The providerConfig JSON secret replaces multiple individual secrets with a single structured configuration, supporting per-section (chat/embeddings/image) provider and model settings.
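The per-section structure described above can be sketched with nested records. This is an illustrative shape only — the actual field names and sections of the dotCMS providerConfig secret may differ:

```java
public class ProviderConfigSketch {

    // Hypothetical shape of the single providerConfig secret: each section
    // (chat/embeddings/image) can name its own provider and model.
    public record SectionConfig(String provider, String model, String endpoint) { }

    public record ProviderConfig(SectionConfig chat,
                                 SectionConfig embeddings,
                                 SectionConfig image) { }

    public static void main(String[] args) {
        ProviderConfig cfg = new ProviderConfig(
                new SectionConfig("openai", "gpt-4o-mini", null),
                new SectionConfig("openai", "text-embedding-3-small", null),
                new SectionConfig("openai", "dall-e-3", null));
        // One structured secret replaces the old per-field secrets
        // (apiKey, model, etc.), and each section is independently typed.
        System.out.println(cfg.chat().model());
    }
}
```

The design benefit is that a single JSON secret can later mix providers per section (e.g. chat on one provider, embeddings on another) without adding new secret keys.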
This PR fixes #35183
EPIC: dotAI Multi-Provider Support #33970
Note
High Risk
High risk because it replaces the core AI client/provider path (OpenAI HTTP + model fallback/validation) with a new LangChain4J-backed implementation and a new providerConfig secret format, impacting chat, embeddings, and image generation behavior and configuration compatibility.

Overview

dotAI now routes chat, embeddings, and image requests through a new LangChain4J-backed client (LangChain4jAIClient) and sets LangChain4J as the default provider, replacing the direct OpenAI HTTP client and removing the model-fallback strategy.

Configuration is migrated from many per-field secrets to a single providerConfig JSON (with hashing + per-host model caching), updating AppConfig.isEnabled(), dotAI.yml, and the /v1/ai/completions/config output (including credential redaction). Several OpenAI-specific model management/validation classes and tests are removed, and integration/unit tests are updated/added for the new providerConfig + LangChain4J flow.

Embeddings/image handling is adjusted: embeddings requests now send raw text (with a token-count fallback when encoding is unavailable), the async thread pool key is renamed to AIThreadPool, max-token resolution is made more resilient, and image temp-file creation now supports base64 (b64_json) responses.

Reviewed by Cursor Bugbot for commit 31cb86e. Bugbot is set up for automated code reviews on this repo.