feat: port jargonsdev/ai into ✨jAI module (#181)

babblebey · web-flow · commit 1a4190a410c6 · 2025-09-07T02:16:46.000+01:00
### Description

This pull request introduces the ✨jAI module, an AI-powered assistant for jargons.dev, and integrates Retrieval-Augmented Generation (RAG) capabilities into the platform. It ports the project from its standalone repo at https://github.com/jargonsdev/ai into this project as a module, adding documentation, environment variables, and setup scripts to support semantic search and intelligent explanations of technical terms. As a result, https://github.com/jargonsdev/ai has now been archived (jargonsdev/ai@52d466e).

**AI Module Implementation**

* Added the new `apps/jai` module, including core files for AI prompt templates (`jai-prompt.js`), OpenAI model configuration (`model.js`), message formatting utilities (`utils.js`), and Qdrant vector store integration (`vector-store.js`). The main export interface is provided via `index.js`.
* Introduced the vector store seeding script `dev/seed-vector-store.js` to fetch dictionary content, split it, and populate the Qdrant vector database for semantic search.

**Documentation and Setup**

* Added comprehensive documentation for ✨jAI in `apps/jai/README.md`, detailing its purpose, tech stack, module structure, setup, and integration points.
* Updated `README.md` and `dev/README.md` to explain how to enable, seed, and use ✨jAI, including step-by-step instructions and technical details for the vector store seeding process.

**Configuration and Dependencies**

* Updated `.env.example` to include the environment variables required for OpenAI and Qdrant, making it easier to configure local and production environments for AI features.
* Added new dependencies to `package.json` for LangChain, OpenAI, Qdrant, and supporting libraries, as well as an npm script for seeding the vector store (`seed:jai`).

These changes collectively enable the jargons.dev platform to offer intelligent, AI-powered explanations and semantic search, with clear documentation and streamlined setup for developers.

### Related Issue

- #142
- jargonsdev/roadmap#5
- jargonsdev/roadmap#6
- jargonsdev/ai@52d466e

### Screenshots/Screencasts

NA

### Notes to Reviewer

Added new npm packages:

- langchain
- @langchain/openai
- @langchain/qdrant
- node-fetch
- ai
diff --git a/.env.example b/.env.example
@@ -8,5 +8,14 @@ GITHUB_OAUTH_APP_CLIENT_SECRET="72efcd4eac54cd7d53d9f6a8dcc20cd2c3a464cf"
 
 CRYPTO_SECRET_KEY="secret"
 
-PUBLIC_PROJECT_REPO="user/jargons.dev-test"
-PUBLIC_PROJECT_REPO_BRANCH_REF="refs/heads/main"
+PUBLIC_PROJECT_REPO="<user>/jargons.dev-test"
+PUBLIC_PROJECT_REPO_BRANCH_REF="refs/heads/main"
+
+# LLM and Embedding Model - Optional to run jAI locally - Get keys https://platform.openai.com
+OPENAI_API_KEY=sk-proj-*************************************
+OPENAI_CHAT_MODEL=gpt-4.1
+OPENAI_EMBEDDINGS_MODEL=text-embedding-3-small
+
+# Vector Store - Optional to run jAI locally - Get Keys https://qdrant.tech
+QDRANT_API_KEY=eyJhb*************************************
+QDRANT_URL=https://*************************.****.cloud.qdrant.io
diff --git a/.gitignore b/.gitignore
@@ -156,4 +156,7 @@ pnpm-debug.log*
 .idea/
 
 # Vercel build
-.vercel
+.vercel
+
+# dev/dictionary.json temporary files
+dev/dictionary.json
diff --git a/README.md b/README.md
@@ -68,6 +68,22 @@ To get set-up follow these steps:
 
 6. Open your browser and visit `http://localhost:4321` to view the project.
 
+## AI Features - Powering ✨jAI
+
+<tt>jargons.dev</tt> includes **✨jAI** (jargons.dev AI) - an intelligent assistant that helps users explore and understand software engineering terms through AI-powered interactions.
+
+### Setup ✨jAI
+
+To enable ✨jAI, you need to seed the vector store with dictionary content that ✨jAI uses for its knowledge base:
+
+```sh
+npm run seed:jai
+```
+
+This script prepares the vector store with processed dictionary content, enabling ✨jAI to provide intelligent responses and semantic search capabilities.
+
+**[Learn more about ✨jAI setup](./dev/README.md#seed-vector-store-script)** for detailed configuration and usage instructions.
+
 ## Testing
 
 <tt>jargons.dev</tt> implements comprehensive testing to ensure code quality and reliability.
diff --git a/apps/jai/README.md b/apps/jai/README.md
@@ -0,0 +1,169 @@
+<div align="center" style="margin-top: 12px">
+  <a href="https://www.jargons.dev">
+    <img width="300" alt="jargons.dev AI" src="https://github.com/user-attachments/assets/5459f7e3-2e23-43bf-b52b-2f198c1dd413">
+  </a>
+  <h1><tt>jargons.dev AI (jAI)</tt></h1>
+  <h3>The AI-Powered Assistant for jargons.dev</h3>
+</div>
+
+## About
+
+✨jAI is a Retrieval-Augmented Generation (RAG) application that integrates the `jargons.dev` dictionary as its core knowledge base. This module serves as the AI utilities layer for the main jargons.dev application, providing intelligent assistance and semantic search capabilities throughout the platform.
+
+Unlike standalone AI applications, ✨jAI is deeply integrated into the jargons.dev ecosystem, powering features like:
+
+- Intelligent word explanations and follow-up conversations
+- Semantic search across the dictionary
+- Context-aware responses based on the curated dictionary content
+- Real-time AI assistance for developers exploring technical terms
+
+## Tech Stack
+
+✨jAI is built using the following technologies:
+
+- [OpenAI API](https://openai.com/api/) - Platform for building AI experiences powered by industry-leading models and tools. Powers AI chat responses and generates embeddings for semantic search
+- [Qdrant](https://qdrant.tech/) - Vector database and similarity search engine for AI applications. Stores and searches vector embeddings of dictionary content for context retrieval
+- [LangChain](https://langchain.com/) - Framework for developing applications powered by large language models (LLMs)
+
+## Module Structure
+
+The ✨jAI module is organized into focused utility files:
+
+```
+apps/jai/
+├── index.js              # Main exports and module interface
+└── lib/
+    ├── jai-prompt.js      # AI personality and prompt templates
+    ├── model.js           # OpenAI model configuration
+    ├── utils.js           # Utility functions for message formatting
+    └── vector-store.js    # Qdrant vector store integration
+```
+
+### Core Components
+
+#### `index.js`
+
+Main module interface that exports all ✨jAI utilities:
+
+```javascript
+export { jAIPrompt, formatMessage, model, vectorStore };
+```
+
+#### `lib/jai-prompt.js`
+
+Defines ✨jAI's personality and conversation templates. The AI assistant is designed to:
+
+- Explain technical jargon clearly and concisely
+- Use relatable analogies and developer-friendly examples
+- Maintain a friendly, witty personality
+- Encourage follow-up questions and deeper exploration
+
+#### `lib/model.js`
+
+Configures the OpenAI chat model (LangChain's `ChatOpenAI`) with settings suited to technical explanations:
+
+- Streaming responses for real-time interaction
+- Low temperature (`0.2`) for consistent, focused responses
+- Output capped at 1024 tokens (`maxTokens`) to keep explanations concise
+
+#### `lib/vector-store.js`
+
+Manages the Qdrant vector database integration:
+
+- Semantic search across dictionary content
+- OpenAI embeddings for high-quality similarity matching
+- Connects to the existing `dictionary` collection via `QdrantVectorStore.fromExistingCollection`
+
+#### `lib/utils.js`
+
+Utility functions for message processing and formatting.
+
+## Environment Variables
+
+✨jAI requires the following environment variables:
+
+```bash
+# OpenAI Configuration
+OPENAI_API_KEY=your_openai_api_key
+OPENAI_CHAT_MODEL=gpt-4-turbo-preview  # or your preferred model
+OPENAI_EMBEDDINGS_MODEL=text-embedding-3-small
+
+# Qdrant Vector Database
+QDRANT_URL=your_qdrant_instance_url
+QDRANT_API_KEY=your_qdrant_api_key
+```
+
+## Setup and Usage
+
+### 1. Prerequisites
+
+Ensure you have the required environment variables configured in your `.env` file at the project root.
+
+### 2. Seed the Vector Store
+
+Before using ✨jAI, you need to populate the vector store with dictionary content:
+
+```bash
+npm run seed:jai
+```
+
+This command processes all dictionary entries and creates embeddings for semantic search.
+
+## Architecture Integration
+
+✨jAI is designed as a utility module that integrates seamlessly with the main jargons.dev application. The module is consumed in two primary areas:
+
+### 1. Vector Store Seeding (`dev/seed-vector-store.js`)
+
+Uses the `vectorStore` utility to populate the database with dictionary content. The script fetches dictionary entries from the jargons.dev API, processes them into document chunks, and creates vector embeddings for semantic search capabilities.
+
+### 2. API Endpoint (`src/pages/api/jai/follow-up-chat.js`)
+
+Imports all four core utilities (`jAIPrompt`, `model`, `formatMessage`, `vectorStore`) for real-time AI interactions. Powers the follow-up chat feature with semantic search for relevant context, conversation history management, and streaming AI response generation.
+
+### Integration Flow
+
+1. **Data Preparation**: `seed-vector-store.js` populates the vector database with dictionary content
+2. **Runtime Processing**: API endpoints use ✨jAI utilities for semantic search and AI response generation
+3. **Real-time Interaction**: Streaming responses provide immediate feedback to users
+4. **Context Awareness**: Vector search ensures AI responses are grounded in dictionary content
+
+## Development
+
+### Local Development
+
+✨jAI runs as part of the main jargons.dev development environment:
+
+```bash
+npm start  # Starts the development server with ✨jAI enabled
+```
+
+### Testing
+
+AI functionality is tested as part of the main project's test suite:
+
+```bash
+npm run test          # Run all tests including AI utilities
+npm run test:coverage # Generate coverage report
+```
+
+## Contributing
+
+Contributions to ✨jAI are welcome! Please refer to the main project's [Contribution Guide](../../CONTRIBUTING.md) for guidelines.
+
+When contributing to ✨jAI specifically:
+
+- Follow the modular structure for new utilities
+- Maintain the friendly, developer-focused AI personality
+- Test AI responses for accuracy and helpfulness
+- Document any new environment variables or setup steps
+
+## Support
+
+✨jAI is part of the open-source jargons.dev project. Do leave the project a star ⭐️
+
+For ✨jAI-specific issues or questions, please use the main project's issue tracker with the `✨jai` label.
+
+---
+
+**[Back to main jargons.dev project](../../README.md)**
diff --git a/apps/jai/index.js b/apps/jai/index.js
@@ -0,0 +1,6 @@
+import model from "./lib/model.js";
+import { formatMessage } from "./lib/utils.js";
+import { jAIPrompt } from "./lib/jai-prompt.js";
+import vectorStore from "./lib/vector-store.js";
+
+export { jAIPrompt, formatMessage, model, vectorStore };
diff --git a/apps/jai/lib/jai-prompt.js b/apps/jai/lib/jai-prompt.js
@@ -0,0 +1,36 @@
+import { PromptTemplate } from "@langchain/core/prompts";
+
+const TEMPLATE = `You are jAI, an AI-powered assistant for jargons.dev, a dictionary for developers and tech enthusiasts. 
+Your job is to explain technical jargon in a clear, concise, and engaging way. You have a friendly, slightly witty personality, 
+and you relate to developers by using analogies, code examples, and real-world comparisons.
+
+Your tone should be knowledgeable yet casual—think of yourself as a coding buddy who can break down complex terms without being overly technical.
+
+Follow these guidelines when responding:
+1. **Explain concisely**: Keep it short, clear, and to the point.
+2. **Use relatable analogies**: Compare tech concepts to real-world scenarios when possible.
+3. **Inject light humor**: A sprinkle of wit is welcome but keep it professional and helpful.
+4. **Encourage follow-up questions**: Suggest deeper dives where relevant.
+5. **Provide developer-centric examples**: Preferably in JavaScript, unless another language is more appropriate.
+6. **Vary your responses**: Avoid repetitive explanations—offer multiple phrasings when possible.
+7. **Use friendly but smart language**: Sound like an experienced dev friend, not a rigid encyclopedia.
+
+Examples of your style:
+- Instead of just saying "An API is a way for two systems to communicate," say:
+  _"An API is like a restaurant menu—you see what’s available and place an order. The kitchen (server) then prepares your dish (response). No peeking inside!"_
+- Instead of saying "Metadata is data about data," say:
+  _"Metadata is like a README file—it doesn’t change the code, but it tells you what’s inside."_
+- Instead of a generic error message, say:
+  _"Oops! Looks like I just ran out of memory. Try again?"_
+
+Now, answer the user's question based only on the following context. If the answer is not in the context, go ahead and provide an answer using your own knowledge; but lightly mention that the information was not available in the context.
+
+------------------------------
+Context: {context}
+------------------------------
+Current conversation: {chat_history}
+
+User: {question}
+jAI:`;
+
+export const jAIPrompt = PromptTemplate.fromTemplate(TEMPLATE);
diff --git a/apps/jai/lib/model.js b/apps/jai/lib/model.js
@@ -0,0 +1,16 @@
+import { ChatOpenAI } from "@langchain/openai";
+
+// Create the model
+const model = new ChatOpenAI({
+  apiKey: process.env.OPENAI_API_KEY || import.meta.env.OPENAI_API_KEY,
+  model: process.env.OPENAI_CHAT_MODEL || import.meta.env.OPENAI_CHAT_MODEL,
+  temperature: 0.2,
+  maxTokens: 1024,
+  topP: 0.95,
+  frequencyPenalty: 0,
+  presencePenalty: 0,
+  streaming: true,
+  verbose: process.env.NODE_ENV !== "production",
+});
+
+export { model as default };
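Note on the `process.env.X || import.meta.env.X` fallback used above: plain Node.js does not define `import.meta.env` (it is injected by bundlers such as Vite), so when the `process.env` value is unset, the right-hand side reads a property of `undefined` and throws a `TypeError`. A minimal sketch of a more tolerant lookup follows. `readEnv` is a hypothetical helper, not part of this PR, and it takes the env sources as arguments so that it runs in any module system.

```javascript
// Hypothetical helper (not part of this PR): return the first non-empty value
// for `name` from the given env-like objects, skipping sources that are
// null or undefined instead of throwing.
function readEnv(name, ...sources) {
  for (const source of sources) {
    const value = source?.[name];
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}

// In an ES module this could be called as:
//   readEnv("OPENAI_API_KEY", process.env, import.meta.env)
// Here plain objects stand in for the two sources.
const unsetProcessEnv = {};        // key missing, like an unset process.env entry
const missingMetaEnv = undefined;  // like import.meta.env under plain Node.js

console.log(readEnv("OPENAI_API_KEY", unsetProcessEnv, { OPENAI_API_KEY: "sk-example" }));
console.log(readEnv("OPENAI_API_KEY", unsetProcessEnv, missingMetaEnv));
```

The second call returns `undefined` rather than throwing, which leaves the decision about a missing key to the caller.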
diff --git a/apps/jai/lib/utils.js b/apps/jai/lib/utils.js
@@ -0,0 +1,8 @@
+/**
+ * Formats a message into a string
+ * @param {import("ai").Message} message The message to format
+ * @returns The formatted message
+ */
+export const formatMessage = (message) => {
+  return `${message.role}: ${message.content}`;
+};
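A short usage sketch for `formatMessage`: the prompt in `jai-prompt.js` takes `{chat_history}` and `{question}` inputs, so one plausible way to derive them from a message list is shown below. `toPromptInputs` is a hypothetical helper, and the endpoint in `src/pages/api/jai/follow-up-chat.js` (not shown in this diff) may do this differently.

```javascript
// Same behavior as the `formatMessage` added in apps/jai/lib/utils.js.
const formatMessage = (message) => `${message.role}: ${message.content}`;

// Hypothetical helper (not part of this PR): treat the last message as the
// current question and every earlier message as conversation history.
function toPromptInputs(messages) {
  if (messages.length === 0) {
    throw new Error("expected at least one message");
  }
  const previous = messages.slice(0, -1).map(formatMessage);
  const latest = messages[messages.length - 1];
  return {
    chat_history: previous.join("\n"), // one "role: content" line per message
    question: latest.content,
  };
}

const inputs = toPromptInputs([
  { role: "user", content: "What is an API?" },
  { role: "assistant", content: "Think of a restaurant menu." },
  { role: "user", content: "And a REST API?" },
]);

console.log(inputs.chat_history);
console.log(inputs.question);
```

The returned keys match the placeholder names in the template, so the object could be merged with a `context` value and passed to the prompt.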
diff --git a/apps/jai/lib/vector-store.js b/apps/jai/lib/vector-store.js
@@ -0,0 +1,24 @@
+/**
+ * @todo Nice to have: Setup vectorStore for local development and production, and use based on the environment
+ * @todo ...use `MemoryVectorStore` for local development and `QdrantVectorStore` for production
+ */
+
+import { OpenAIEmbeddings } from "@langchain/openai";
+import { QdrantVectorStore } from "@langchain/qdrant";
+
+// Initialize the OpenAI embeddings
+const embeddings = new OpenAIEmbeddings({
+  model:
+    process.env.OPENAI_EMBEDDINGS_MODEL ||
+    import.meta.env.OPENAI_EMBEDDINGS_MODEL,
+  apiKey: process.env.OPENAI_API_KEY || import.meta.env.OPENAI_API_KEY,
+});
+
+// Load vector store collection
+const vectorStore = await QdrantVectorStore.fromExistingCollection(embeddings, {
+  url: process.env.QDRANT_URL || import.meta.env.QDRANT_URL,
+  apiKey: process.env.QDRANT_API_KEY || import.meta.env.QDRANT_API_KEY,
+  collectionName: "dictionary",
+});
+
+export { vectorStore as default };
diff --git a/dev/README.md b/dev/README.md
@@ -22,6 +22,82 @@ This script streamlines the process of creating a GitHub App required to run jar
 
 This script simplifies the setup process for running <tt>jargons.dev</tt> locally and ensures that your GitHub App is configured correctly. If you encounter any issues during setup, please reach out or create an issue.
 
+## Seed Vector Store Script
+
+This script prepares the knowledge base for **✨jAI** (jargons.dev AI) by populating the vector store with dictionary content. jAI uses this processed data to provide intelligent responses and semantic understanding of software engineering terms.
+
+### When to Use
+
+Run this script when you need to:
+- Initialize ✨jAI's knowledge base for the first time
+- Update ✨jAI with the latest dictionary content
+- Rebuild ✨jAI's vector store after making changes to the AI system
+- Prepare ✨jAI for development or testing of AI-powered features
+
+### Prerequisites
+
+Before running this script, ensure you have:
+- All dependencies installed (`npm ci`)
+- OPENAI and QDRANT environment variables properly configured in your `.env` file
+- Network access to fetch from jargons.dev API
+- Sufficient disk space for temporary dictionary file
+
+### Usage
+
+```bash
+npm run seed:jai
+```
+
+### How It Works
+
+The script performs these steps to prepare ✨jAI's knowledge base:
+
+1. **Data Fetching**: Downloads the complete dictionary from `https://jargons.dev/api/v1/browse`
+2. **File Processing**: Saves data locally and loads it using LangChain's JSONLoader
+3. **Document Splitting**: Breaks content into optimally-sized chunks (1000 chars with 200 overlap)
+4. **Vector Store Population**: Adds processed documents to ✨jAI's vector store in batches of 100
+5. **Cleanup**: Removes temporary files and provides completion summary
+
+### Technical Implementation
+
+The script leverages several key technologies:
+
+- **LangChain JSONLoader**: Extracts title and content fields from dictionary entries
+- **RecursiveCharacterTextSplitter**: Intelligently splits text while preserving context
+- **Batch Processing**: Prevents memory issues and provides progress feedback
+- **File System Operations**: Handles temporary file creation and cleanup
+
+### Configuration Options
+
+Key parameters that can be adjusted:
+
+- **Chunk Size**: Currently 1000 characters (optimal for most search queries)
+- **Chunk Overlap**: 200 characters (ensures context preservation)
+- **Batch Size**: 100 documents per batch (balances performance and memory usage)
+
+### Error Handling
+
+The script includes robust error handling for:
+- Network connectivity issues during API calls
+- File system errors during temporary file operations
+- Vector store connection problems
+- Memory management during large batch processing
+
+### Example Output
+
+```
+Saved the dictionary file to /path/to/dev/dictionary.json
+Loaded 500 documents
+Split 1250 documents
+Added batch 1 of 13 (100 documents) to the vector store
+Added batch 2 of 13 (100 documents) to the vector store
+...
+Added 1250 splits to the vector store
+Cleaned up the dictionary file at /path/to/dev/dictionary.json
+```
+
+Once completed, ✨jAI will have access to the processed dictionary content and can provide intelligent responses about software engineering terms.
+
 ## Format-Staged Script
 
 This script provides a cross-platform solution for formatting only the files that are staged in Git, making it perfect for pre-commit workflows without requiring external dependencies like Husky or lint-staged.
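The batch step documented in the Seed Vector Store section above (batches of 100) can be sketched as follows. The `dev/seed-vector-store.js` diff is not shown here, so `toBatches` and `addInBatches` are hypothetical names and the `addDocuments` callback is a stand-in for the vector store call; the log line mirrors the documented example output.

```javascript
// Hypothetical helper: split an array into consecutive batches of at most
// `batchSize` items. The last batch may be smaller.
function toBatches(items, batchSize = 100) {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

// Hypothetical driver: send each batch to `addDocuments` in order, waiting for
// one batch to finish before starting the next, and report progress.
async function addInBatches(docs, addDocuments, batchSize = 100) {
  const batches = toBatches(docs, batchSize);
  for (const [index, batch] of batches.entries()) {
    await addDocuments(batch);
    console.log(
      `Added batch ${index + 1} of ${batches.length} (${batch.length} documents) to the vector store`,
    );
  }
  return batches.length;
}

// Demo with an in-memory stand-in for the vector store.
const docs = Array.from({ length: 1250 }, (_, i) => ({ pageContent: `chunk ${i}` }));
const stored = [];
addInBatches(docs, async (batch) => {
  stored.push(...batch);
}).then((count) => {
  console.log(`Added ${stored.length} splits in ${count} batches`);
});
```

With a LangChain vector store, the callback could be `(batch) => vectorStore.addDocuments(batch)`. Processing batches sequentially keeps memory use bounded and makes the progress output deterministic, at the cost of some throughput.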
diff --git a/dev/seed-vector-store.js b/dev/seed-vector-store.js
diff --git a/package-lock.json b/package-lock.json
diff --git a/package.json b/package.json